View
0
Download
0
Category
Preview:
Citation preview
ANY OPINIONS EXPRESSED ARE THOSE OF THE AUTHOR(S) AND NOT NECESSARILY THOSE OF THE SCHOOL OF ECONOMICS & SOCIAL SCIENCES, SMU
O
The WTO Trade Effect
Pao-li Chang, Myoung-jae Lee September 2007
Paper No. 06-2007
The WTO Trade Effect
Pao-Li Chang∗
School of Economics
Singapore Management University
Singapore 178903
Myoung-Jae Lee†
Department of Economics
Korea University
Seoul, Korea
September 23, 2007
Abstract
Rose (2004) showed that the WTO or its predecessor, the GATT, did not promote
trade, based on conventional econometric analysis of gravity-type equations of trade. We
argue that conclusions regarding the GATT/WTO trade effect based on gravity-type
equations are arbitrary and subject to parametric misspecifications. We propose us-
ing nonparametric matching methods to estimate the ‘treatment effect’ of GATT/WTO
membership, and permutation-based inferential procedures for assessing statistical sig-
nificance of the estimated effects. A sensitivity analysis following Rosenbaum (2002)
is then used to evaluate the sensitivity of our estimation results to potential selection
biases. Contrary to Rose (2004), we find the effect of GATT/WTO membership econom-
ically and statistically significant, and far greater than that of the Generalized System of
Preferences (GSP).
Keywords: GATT/WTO; GSP; treatment effect; matching; permutation test; signed-
rank test; sensitivity analysis
JEL classification: F13; F14; C14; C21; C23
∗Tel: +65-6828-0830; fax: +65-6828-0833. E-mail address: plchang@smu.edu.sg (P.-L. Chang).†Tel: +85-2-3163-4300; E-mail address: myoungjae@korea.ac.kr
1
1. INTRODUCTION
In the post war era, world merchandize trade volume grew at an exponential rate and at a rate
much higher than world merchandize output: while exports volume grew at an annual rate
of 6.2% during 1950-2005, production grew at only 3.8% (WTO, 2006, p. 27). This trend of
expanded trade concurred with the development of the multilateral trade institution. Since
the coming into effect of the General Agreement on Tariffs and Trade (GATT) in 1947,
the framework of trade agreements regulating trade policies has grown in coverage and in
depth over the decades, culminating with the establishment of the World Trade Organization
(WTO) in 1995. The number of countries choosing to join the GATT/WTO also saw a steady
increase from the original 23 contracting parties to the current 151 members. By contrasting
the post-war phenomenon with the pre-war experience in the late 1920’s to 1930’s during
which the world trade volume fell into a downward spiral with uncontrolled tariff retaliations
among countries, it is natural for one to conclude that the GATT/WTO likely has spurred
the integration and expansion of world trade.
On theoretical grounds, several economic theories have also been proposed that explain
the mechanism of multilateral trade agreements in promoting trade. Major explanations
include the terms-of-trade argument and the political-commitment argument. On one hand,
multilateral trade agreements may help coordinate countries’ trade policies and prevent them
from engaging in the terms-of-trade-driven tariff retaliations which decrease trade volumes
(Johnson, 1953–1954; Bagwell and Staiger, 1999, 2001). On the other hand, multilateral
trade agreements may also help national governments to commit to liberalizing trade policies,
given the retaliation threat from other countries if the commitment is not carried out. This
enhances policy credibility with respect to domestic private sectors and brings about efficient
production and trade structure (Staiger and Tabellini, 1987, 1989, 1999). Other potential
mechanisms of GATT/WTO in encouraging trade often cited by economists or policymakers
also include the reduced uncertainty of member countries’ trade policies and overall business
environment, facilitated by binding tariffs or the GATT/WTO dispute settlement mechanism
(Evenett and Braga, 2005).
In view of the empirical observations and theoretical validation, it is thus surprising that
some recent studies showed that the GATT/WTO did not appear to have promoted trade.
Perhaps the most important and widely cited study is Rose (2004). The study uses a large
2
panel of data on bilateral trade for 178 trading entities during year 1948 to 1999, and conducts
conventional econometric analysis based on the gravity model of trade (which postulates that
bilateral trade depends log-linearly on the distance between two countries and their GDP’s,
and other determinants possibly affecting bilateral trade). The mechanism of GATT/WTO
in influencing trade is incorporated in the framework by two dummy variables, which indicate
respectively whether or not both countries are GATT/WTO members and whether or not
only one country is a GATT/WTO member. The coefficients on both dummy variables were
found in Rose (2004) to be insignificantly different from zeros, in the benchmark model with
ordinary-least-squares (OLS) regression, as well as in most subsequent sensitivity analysis
with variations in estimation methods and in the data.
Gravity-type models of trade have in recent years become popular in empirical studies
of bilateral trade pattern. Although the basic formulation can be justified by theories as in
Anderson (1979), Helpman and Krugman (1985), Deardorff (1998), and Anderson and van
Wincoop (2003), the augmented version of gravity equation adopted in Rose (2004) (and
in many other empirical studies) is often not. Various ad hoc regressors in addition to the
basic gravity variables (the distance and the GDP’s) are often added to the equation to
suit the purpose of the study without rigorous theoretical justifications. For example, the
survey by Oguledo and MacPhee (1994) cited 49 explanatory variables that had been used
in gravity-type empirical studies. Although the extra variables typically used to measure the
degree of trade frictions across countries (such as language barrier and colonial relationship)
might well have an impact on bilateral trade flows, it is not clear in what functional form
they would enter the basic gravity equation and how they would interact with one another.
To illustrate this point, we show in the main text that the main gravity equation of Rose
(2004) is misspecified by omitting higher-order interaction terms among regressors. Given
this, it is possible to construct richer parametric forms of specifications and find favorable
evidences of GATT/WTO’s trade-creating effect. We provide one such example in the main
text. However, without theoretical guidance, an investigation of this kind looks endless and
any conclusion drawn would seem arbitrary.
In this paper, we propose using nonparametric ‘matching’ methods to estimate the trade
effect of GATT/WTO membership. Nonparametric estimation does not assume knowledge
of the underlying true model and avoids potential biases stemming from parametric misspec-
ifications. If the gravity equation specified in Rose (2004) is indeed the correct underlying
3
model, using a nonparametric approach should lead to similar estimation results. Among
many possible nonparametric approaches to evaluating the ‘treatment’ effect of a policy or
program, matching method is relatively straightforward to implement. Matching is widely
used in the fields of labor and health economics. See, for example, Heckman et al. (1997),
Imbens (2004), and Lee (2005), and applications in Heckman et al. (1998), Lechner (2000),
Lee et al. (2007), and Lu et al. (2001).
Applying the terminology in the matching literature to our current context, the state of
both countries being GATT/WTO members is a ‘treatment’, and the country-couple in the
state is a ‘treated’ subject, in contrast with a ‘control’ subject where the two countries are
nonmembers. In essence, a pair-matching method compares the ‘responses’ (trade flows) of a
treated subject and a control subject who differ in their treatments but are otherwise similar
in every aspect likely to influence the bilateral trade flows. The difference in their bilateral
trade flows is then attributed to the treatment effect of both countries being GATT/WTO
members. Similar procedures apply if we want to estimate the effect of other treatments
such as only one country in a country-couple being a GATT/WTO member versus both of
them nonmembers. In finding the best match from the control group for a treated subject,
a list of covariates need to be identified in terms of which the similarity between any two
subjects is measured. For this, we use the same set of covariates as in Rose (2004). By using
the matching method, however, we do not assume any particular functional form that relates
these covariates to bilateral trade.
To assess the statistical significance of the treatment effect estimated by the matching
method, we propose using the permutation test. Although matching estimators are popular
in practice, asymptotic theories on their large sample properties are not fully established for
most data generating processes. See Abadie and Imbens (2006) for a large sample theory of
the matching estimator for independent observations. In practice, a standard t-statistic or a
bootstrap procedure is often used to derive the p-value or confidence intervals. Standard t-
statistic is straightforward but without theoretical justifications; bootstrap is computationally
demanding and argued by Abadie and Imbens (2006) to be invalid. In contrast with the above
asymptotics-based tests, the permutation test is an exact inferential procedure conditional on
the observed data. Under the null hypothesis of no treatment effect, two subjects in a matched
pair are ‘exchangeable’ in the labeling of their treatment status (treated or untreated). By
obtaining all possible permutations of the treatment labels in all pairs, the exact p-value of
4
the observed matching estimator can be computed by placing it in the distribution of the
permutations. See Pesarin (2001) and Good (2004) for an introduction to the concept, and
Ernst (2004) for a survey of the permutation method in general contexts. Recently, Imbens
and Rosenbaum (2005) showed that confidence intervals constructed by the permutation
method are far more reliable than other confidence intervals.
In applying the matching estimator and the permutation test, there are two potential
shortcomings. First, the matching estimator is susceptible to outliers (as OLS is). Second,
the permutation test is only ‘conditionally distribution-free’ — conditional on the sample at
hand; thus its finding is only valid within the sample. To provide more robust findings, we
also conduct the Wilcoxon (1945) signed-rank test version of the matching estimator and the
permutation test. Instead of the actual numerical differences, the ranks of the differences are
used in determining the distribution of the signed-rank test statistic across all permutations.
In addition to being robust to outliers, the signed-rank test turns out to be distribution-free,
so that findings based on this test are also valid out of sample.
Some country-couples may be more likely to join the GATT/WTO and also to experience
higher bilateral trade volumes, leading to a potential upward bias in the treatment effect
estimator. This selection-bias problem is relevant in both the parametric setting of Rose
(2004) and the nonparametric setting of ours. Selection on observable variables does not pose
a selection-bias problem. If, however, the selection is made based on unobservables that affect
both trade flows and the decision to join the GATT/WTO, then there is a selection problem.
We conduct a sensitivity analysis following Rosenbaum (2002) to evaluate the sensitivity of
our conclusions obtained under the assumption of no selection bias. This method examines
how severe the unobserved selection problem must be to overturn the original finding. A
finding is deemed robust if it takes a substantial difference between the treated and the
untreated in their odds of taking treatment (after controlling for the observable variables) to
overturn the finding. The use of similar sensitivity analysis is slowly increasing in numbers.
See, for example, Aakvik (2001), Imbens (2003), Hujer et al. (2004), Altonji et al. (2005),
and Lee et al. (2007), among others.
As another way to deal with the selection problem, we also apply the matching procedure
introduced above in restricted manners, where the potential matches for a treated subject is
limited to a certain subset of controls. They are: matching restricted to the observations of
the same country-couple, matching restricted to the observations in the same year, matching
5
restricted to the observations in the same GATT/WTO period, and matching restricted to
the observations of the same income-class combination. For example, there may be some
unobserved country-couple specific characteristics that affect both their GATT/WTO mem-
berships and their bilateral trade flows, and the characteristics may be systematically different
across country-couples. In this case, matching without restriction leads to a biased estimate
that contains the true treatment effect and the effect arising from the difference in unobserved
country-couple characteristics if the pair in a match are of different country-couples. The
bias is eliminated by restricting matching to the observations of the same country-couple.
Our nonparametric matching method and permutation-based inference procedures led to
findings opposite to Rose’s (2004): in short, we found economically and statistically significant
trade-promoting effect of the multilateral trade institution. The bilateral trade between
two countries is higher by 193% for country-couples who are both GATT/WTO members,
relative to their counterparts who are both nonmembers. The effect is similar if estimation
is restricted to cross-section variation (in the case of matching within the same year or
the same GATT/WTO period). On the other hand, when estimation is restricted to time-
series variation (in the case of matching within the same country-couple), the bilateral trade
increases by a factor of 126% when a country-couple are both GATT/WTO members, relative
to when they both are not. Both estimates are robust to a reasonable degree of potential
selection biases. In his findings, Rose (2004) also suggested that the Generalized System of
Preferences (GSP) had a significant and larger effect than GATT/WTO. In contrast, GSP is
found in our analysis to raise bilateral trade volume, but by a factor smaller than the effect
of ‘both in GATT/WTO’, regardless of the matching criteria. Based on matching restricted
to country-couples of the same income-class combination, bilateral trade is estimated to be
higher by a factor of 73% for a trading relationship that extends the GSP scheme, relative
to its counterpart that does not.
The findings of Rose (2004) were also challenged by two other studies. On one hand,
Tomz et al. (2005) highlight the effective participation of nonmember participants in the
GATT/WTO system. These include colonies, de facto members, and provisional members,
who enjoy to a large extent the same set of rights and obligations under the agreement as
formal members. By simply re-classifying these countries as de facto GATT/WTO members
without changing the estimation framework of Rose (2004), Tomz et al. (2005) find that
GATT/WTO substantially increases trade and that its effects are relatively stable across
6
countries and over time. Subramanian and Wei (2007), on the other hand, emphasize asym-
metries of GATT/WTO trade effects across different sub-samples—developed versus develop-
ing countries, old versus new developing countries, and generally unprotected sectors versus
protected sectors. They find that the GATT/WTO institution promotes trade when and
where the multilateral trade agreements are in full force. In contrast with these two papers,
we do not attempt to refine the data of Rose (2004) or to differentiate the trade effects among
sub-samples. Instead, our paper focuses on the improvement of estimation and inferential
methodologies as introduced above.
The rest of this paper is organized as follows. In Section 2, we highlight possible mis-
specifications of the Rose (2004) gravity equation and arbitrariness of conclusions derived
from the parametric framework regarding the GATT/WTO effect. We then introduce the
nonparametric matching methodology and permutation-based inference procedures in Sec-
tion 3. Section 4 reviews the data set of Rose (2004). Our estimation results and findings
are reported in Section 5. Section 6 concludes.
2. MISSPECIFICATION OF THE GRAVITY EQUATION
In this section, we illustrate the potential misspecification of the main gravity equation used
in Rose (2004) and the arbitrariness of conclusions derived based on variations of the model.
The benchmark result of Rose (2004) based on an OLS regression of a gravity-type equation
is replicated in Column (I) of Table 1.1 As shown, the coefficients of the two GATT/WTO
dummy variables (Both in and One in) are insignificantly different from zeros.
In Column (II), we run a misspecification test of the gravity equation (I). For this, we
apply the regression equation specification error test (RESET) of Ramsey (1969), which
tests whether non-linear combinations of the regressors have any power in explaining the
regressand. If yes, then the original model is misspecified. The test is done by augmenting
the original regression equation with the square or higher powers of the predicted value of the
regressand from the original regression and testing the significance of the coefficients of these
augmented terms. As indicated in Column (II), the coefficient of the squared predicted value
y2 of the log real trade derived from the gravity equation (I) is significant. This indicates that
the original gravity equation is misspecified by omitting interaction terms among regressors1The data set used is Rose’s (2004) downloaded from his website. More details on the data set are discussed
in Section 4.
7
or squares of regressors, because y2 is a sum of these terms.
Given the finding of the RESET test, we experiment with augmenting the gravity equa-
tion (I) with interaction terms among the two GATT/WTO dummy variables and the other
regressors, and rerun the regression. The results in Column (III) show that many interaction
terms are significant, illustrating clearly that the benchmark model of Rose (2004) is mis-
specified.2 With the interaction terms in, the effects of GATT/WTO membership for trading
partners i and j in year t are,
{Both in effect}ijt = β{Both in} + β{Both in x GSP}{GSP}ijt + . . .
+ β{Both in x Ever colony}{Ever colony}ijt;
{One in effect}ijt = β{One in} + β{One in x GSP}{GSP}ijt + . . .
+ β{One in x Common colonizer}{Common colonizer}ijt
+ β{One in x Ever colony}{Ever colony}ijt,
where β is the coefficient of the corresponding regressor or interaction term. These effects
vary across trading partners and years ijt, and Table 2 shows some quantiles of these effects.
As shown in Table 2, the mean effect of GATT/WTO membership for the whole sample based
on the specification in (III), obtained by replacing the regressors with the sample means and
β with the estimate, is significant and positive.
Of course, the specification in (III) is just one of many possibilities to allow higher-
order parametric specifications of bilateral trade. One can conjecture that possibly richer
parametric forms of specifications may lead to even more favorable evidence of GATT/WTO
effect. However, without theoretical guidance, an investigation of this kind looks endless and
any conclusion drawn would seem arbitrary.
In the following, we resort to the nonparametric matching approach to the estimation of
the GATT/WTO trade effect. The approach requires identification of the covariates that are
likely to have affected bilateral trade, for which we will adopt the same set of covariates as
in Rose (2004). However, beyond that, no previous knowledge about the underlying model
structure is required: estimation and inference are derived without having to specify the2The null hypothesis that all interaction terms in parametric specification (III) are irrelevant can be easily
rejected by the χ2-test: [(R2a−R2
0)/(1−R2a)]∗(N−k), where R2
a (respectively R20) is the R2 under specification
(III) (respectively (I)), N is the sample size, and k is the number of regressors under specification (III). Thetiny difference in R2 between parametric specifications (I) and (III) is magnified by N − k; as the currentsample size is huge, we get significant interaction terms in spite of little change in the goodness-of-fit.
8
functional form that relates the bilateral trade pattern to the covariates; thus, all forms
of higher-order or interaction terms of the covariates are accommodated. This avoids the
often large variation in estimates across different functional forms and parametric modeling
assumptions, and as a result, the biases stemming from model misspecifications.
3. METHODOLOGY
Suppose there are N countries in the world. With the countries indexed by i = 1, ..., N , let
yijt be the logarithm of the real bilateral trade volume between countries i and j in year
t. Define xijt as a vector of covariates, which includes mainly the characteristics of country
couple (i, j) in year t. For country couple (i, j), two effects are of interest:
• the effect when both countries are in the GATT/WTO, relative to when none is in.
• the effect when only one country is in the GATT/WTO, relative to when none is in.
We will also investigate the effect of GSP on bilateral trade, which is:
• the effect when one country is a GSP beneficiary of the other country or vice versa,
relative to the scenario of no GSP.
To facilitate exposition, focus on both-in versus none-in, and define
dijt = 1 if both in and 0 if none in.
The analysis for the effect of one-in or GSP is analogous. Thus, the data consists of
zijt = (dijt, x′ijt, yijt)′, i = 1, . . . , N , j = i + 1, . . . , N , and t = 1, . . . , T . In the terminology of
the matching literature, yijt is a response variable, the observations with dijt = 1 constitute
the treatment group, and those with dijt = 0 the control group. Note that we will use the
word ‘couple’ (an observation unit which comprises two countries between which trade volume
is concerned) and ‘pair’ (two observation units which are considered a match) separately to
avoid confusion with the matching literature.
9
3.1 Mean Effects and Matching
Define two ‘potential’ response variables:
y1ijt : potential treated response for couple (i, j) in year t,
y0ijt : potential untreated response for couple (i, j) in year t.
We will denote ‘couple (i, j) in year t’ simply ’subject ijt’ from now on. The subject-ijt
treatment effect is y1ijt − y0
ijt, which is, however, not identified. Instead, usually one tries
to estimate the mean effect E(y1ijt − y0
ijt). Omitting the subscripts to simplify exposition,
E(y1 − y0) is identified by the group mean difference
E(y|d = 1)− E(y|d = 0) = E(y1|d = 1)−E(y0|d = 0) = E(y1 − y0) if (y0, y1)q d
where ‘q’ stands for statistical independence. If we have only y0 q d, then
E(y|d = 1)− E(y|d = 0) = E(y1|d = 1)− E(y0|d = 0) = E(y1|d = 1)−E(y0|d = 1)
= E(y1 − y0|d = 1) which is the ‘effect on the treated ’.
Analogously, if y1 q d, then
E(y|d = 1)− E(y|d = 0) = E(y1 − y0|d = 0), which is the ‘effect on the untreated ’.
Since y0 is the control response, in general, y0 q d is more plausible than y1 q d.
For the GATT/WTO effect, both the effect on the treated (i.e., those who chose to join
GATT/WTO) as well as the effect on the untreated (i.e., those who chose not to join) are of
interest. Since
E(y1 − y0) = E(y1 − y0|d = 0)P (d = 0) + E(y1 − y0|d = 1)P (d = 1),
if the effect on the treated is the same as the effect on the untreated, then both equal
E(y1−y0). Otherwise, the representative effect E(y1−y0) is a weighted average of the effect
on the treated and the effect on the untreated.
In observational data, the treatment and control groups usually differ in observed (xijt) or
10
unobserved variables (εijt). Matching removes the observed difference (and the unobserved
difference to the extent the two differences are related). In terms of equations, matching is
expressed by including x in the conditioning set in the preceding equations. The above group
mean difference equation becomes
E(y|d = 1, x)−E(y|d = 0, x) = E(y1|d = 1, x)−E(y0|d = 0, x) = E(y1−y0|x) if (y0, y1)qd|x,
and the E(y1 − y0)-decomposition equation becomes
E(y1 − y0|x) = E(y1 − y0|d = 0, x)P (d = 0|x) + E(y1 − y0|d = 1, x)P (d = 1|x).
Once the x-conditional effect is found, x can be integrated out to yield a marginal effect;
here, the integration can be done with different integrators (i.e., weighting functions). For
the effect on the treated, the distribution of x|d = 1 is typically used to render
E(y1 − y0|d = 1) =∫
E(y1 − y0|d = 1, x)dF (x|d = 1)
where F (·|d = 1) denotes the distribution of x|d = 1.
The assumption (y0, y1)qd|x states that y0 and y1 may be related to d but only through
x. This is nothing but a ‘no-selection bias’ assumption, which may be called ‘randomization
of d given x’. In the current context of bilateral trade, this assumption implies that a
treated subject and an untreated subject be comparable in terms of their potential trade
performance and likelihood of joining the GATT/WTO, if they exhibit the same observable
characteristics x. Thus, the observed untreated response of the untreated subject can be used
as a proxy for the counterfactual potential untreated response of the treated subject, and the
observed treated response of the treated subject a proxy for the counterfactual potential
treated response of the untreated subject.
If we are only concerned with the effect on the treated, E(y1 − y0|d = 1), the required
assumption, y0 q d|x, is weaker and requires only that a treated subject and an untreated
subject be comparable in terms of their potential untreated trade performance and likelihood
of joining the GATT/WTO, if they exhibit the same observable characteristics x. Again,
this implies that the observed untreated response of the untreated subject can be used as a
proxy for the counterfactual potential untreated response of the treated subject. The mean
11
effect on the treated can then be estimated based on the subset of the sample who receive the
treatment. The opposite is true if we are only concerned with the effect on the untreated.
Thus, the assumption for matching to work in essence requires that there be no systematic
unobserved difference across the treatment and control group that affects their potential
(treated or untreated or both) trade volumes and GATT/WTO member status. If there is
an unobserved variable ε that affects both d and (y0, y1), then there is a selection problem
and ε causes a bias—called ‘hidden bias’. Exactly for the unknown nature of the treatment
effect and its unknown dependence on the unobserved characteristics, in general, it is more
plausible for the assumption y0 q d|x to hold than y1 q d|x. Thus, in the empirical section,
we will place more emphasis on the estimation result of the effect on the treated than either
the effect on the untreated or the mean effect for the whole sample. While any effect of x
on (y0, y1) and d is dealt with by matching, there is no good way to deal with ε, for it is
unobserved. Nevertheless, a sensitivity analysis can be conducted to account for the presence
of ε and the extent it influences d and (y0, y1) as to be shown later.
So far, we introduced the basics of the treatment effect framework and matching concept.
In practice, matching for the effect on the treated proceeds in the following steps (the effect
on the untreated can be handled analogously). First, a treated unit, say unit ijt, is selected.
Second, control units are selected who are the closest to the treated unit ijt in terms of x.
If only one unit is selected, we get a ‘pair-matching’; otherwise, multiple-matching, in which
the number of matched controls may be allowed to be random or fixed. Third, assuming pair-
matching, suppose there are M pairs (where M corresponds to the number of the successfully
matched treated subjects), and ym1 and ym2 are the responses of the two subjects in pair
m ordered such that ym1 > ym2 without loss of generality; drop any pair with ym1 = ym2.
Then, defining
sm = 1 if the first subject in pair m is treated and − 1 otherwise,
the effect on the treated can be estimated with a pair-matching estimator
D ≡ 1M
M∑
m=1
sm(ym1 − ym2) →p E(y1 − y0|d = 1) under y0 q d|x,
which is simply the average of the difference between the treated response and the untreated
12
response across the M pairs of match.
Some remarks are in order. First, the matching scheme can be reversed to result in an
estimator for the effect on the untreated: a control unit is selected first and then a matched
unit from the treatment group later. Second, the distance ‖xijt − xi′j′t′‖ between two units
ijt and i′j′t′ in terms of x should be chosen; e.g., a scale-normalized distance between xijt
and xi′j′t′ , or alternatively, (xijt − xi′j′t′)′X−1
N (xijt − xi′j′t′) can be used, where XN is the
sample variance matrix in the pooled sample. Third, for a treated unit, if there is no good
matched control, then the unit may be passed over; i.e., a ‘caliper’ c may be set such that a
treated unit with
mini′j′t′∈C
‖xijt − xi′j′t′‖ > c, where ‘i′j′t′ ∈ C’ means unit i′j′t′ in the control group C,
is discarded. The number of matched pairs M used in the matching estimator D decreases
accordingly. For more discussions on treatment effect and matching in general, see, for
example, Rosenbaum (2002) and Lee (2005).
3.2 Permutation Test for Matched Pairs
Suppose that matching has been done resulting in M pairs, and that the two subjects are
exchangeable under the null hypothesis H0 of no effect: the joint distribution of the two
responses in each matched pair does not change when the two responses are switched. That is,
exchangeability is taken as the definition of ‘no effect’, which is weaker than the subject-wise
no-effect concept y1ijt−y0
ijt = 0, but stronger than the mean zero-effect concept E(y1ijt−y0
ijt) =
0. Exchangeability implies that, given the data, all 2M possibilities to permute two responses
in each pair are all equally likely with probability 2−M in the ‘permutation sample space’
with the 2M elements (equivalently, if they are not equally likely, then exchangeability does
not hold). Here, permuting two responses is equivalent to assigning one response to the
treatment group and the other to the control group.
Permutation inference is an exact inferential procedure conditional on all observed data.
Whether H0 is rejected or not depends on how extreme D is—i.e., the p-value of D. In
theory, this can be computed exactly as
12M
2M∑
k=1
1[Dk > D] if the H0-rejection region is in the upper tail
13
where Dk is a “D-like” effect estimator under permutation k. In practice, however, finding
the p-value this way tends to be too time-consuming. Instead, we will apply the following
sequence of approximations.
Generate M iid random variables wm, m = 1, ..., M , with P (wm = 1) = P (wm = −1) =
0.5. If wm = −1, the two responses in pair m are switched (the treated response assigned to
the control group, and the untreated response to the treatment group); otherwise, the two
responses in pair m are not switched. For pseudo sample k obtained this way, obtain the
pseudo effect estimator
D′k ≡
1M
M∑
m=1
wmsm(ym1 − ym2)
where only wm’s are random. The p-value of D is then
1K
K∑
k=1
1[D′k > D] if the H0-rejection region is in the upper tail,
where K is a number much smaller than 2M , say K = 1000. In principle, a pseudo sample
should not be repeated: ‘sampling from the 2M possibilities without replacement’ is required.
In practice, the chance of a pseudo sample being repeated in K (= 1000) pseudo samples is
negligible and can be ignored.3
A further simplification is possible by applying the central limit theorem (CLT) to wm’s.
Observe
E(D′k) = 0 and V (D′
k) = E(D′2k ) =
1M2
M∑
m=1
E{w2ms2
m(ym1−ym2)2} =∑M
m=1(ym1 − ym2)2
M2.
When M is large, the normally approximated p-value of D is
P (D′k > D) = P{ D′
k
{∑Mm=1(ym1 − ym2)2/M2}1/2
>D
{∑Mm=1(ym1 − ym2)2/M2}1/2
}
= P{N(0, 1) >D
{∑Mm=1(ym1 − ym2)2/M2}1/2
},
if the H0-rejection region is in the upper tail. Using this normal approximation, one does not
even have to simulate wm’s.
As is well known, we can obtain a confidence interval (CI) by “inverting” the test pro-3The proof is available from the authors upon request.
14
cedure. For instance, suppose that the treatment effect is the same for all pairs: the effect
increases the treated response by a constant, say β. In this case, replace ym1 with ym1 − β
when sm = 1 or ym2 with ym2 − β when sm = −1 to obtain
Dβ ≡ 1M
M∑
m=1
sm(ym1 − smβ − ym2)
and restore the no-effect situation. Define accordingly
D′β ≡
1M
M∑
m=1
wmsm(ym1 − smβ − ym2)
to observe
E(D′β) = 0 and V (D′
β) = E(D′2β ) =
∑Mm=1(ym1 − smβ − ym2)2
M2.
Now conduct level-α tests with
Dβ
{∑Mm=1(ym1 − smβ − ym2)2/M2}1/2
as β varies. The collection of β values that are not rejected is the (1 − α)100% confidence
interval for β.
See Martiz (1995), Hollander and Wolfe (1999), and Lehmann and Romano (2005) among
many others, for more on permutation (or randomization) tests in general. The use of
permutation-based tests, in stead of asymptotic tests, is especially convenient in the cur-
rent context with a panel of bilateral trade data, which possibly have a complicated data
structure with serial and spatial dependence, rendering the derivation of the asymptotic dis-
tribution of the matching estimator difficult. Based on the principle of exchangeability, the
permutation test requires the joint distribution of the treated and untreated responses to
remain the same despite the permutation. If the joint distribution is normal, this holds if
the x-conditional variances of the two responses are the same under the null of no effect.
This allows for any form of correlation between a treated response and an untreated response
in any matched pair and across matched pairs, and any form of heteroskedasticity across
matched pairs, thus accommodating a wide range of data structure.
15
3.3 Signed-Rank Test for Matched Pairs
The preceding permutation test has two potential shortcomings. One is its susceptibility to
outliers in x and y. The other is that the test is only ‘conditionally distribution-free’—the
permutation distribution under the null hypothesis is conditional on (x′, y, d), which enters
the variance V (D′k). Due to this conditioning, any finding from the test is applicable only to
the sample at hand. That is, the finding has ‘internal validity’, but not ‘external validity’.
If a test is distribution-free, then the finding holds unconditionally as well, which accords
external validity. A well known test that is robust and distribution-free is the Wilcoxon
(1945) signed-rank test. Applying this test to the current matching context, rank |ym1−ym2|,m = 1, ...,M , to denote the resulting ranks as r1, ..., rM . For instance, when M = 3,
|y11 − y12| = 0.3, |y21 − y22| = 0.09, |y31 − y32| = 0.21 =⇒ r1 = 3, r2 = 1, r3 = 2.
The signed-rank test statistic is the sum of the ranks for the pairs where the treated subject
has the higher response:
R ≡M∑
m=1
rm1[sm(ym1 − ym2) > 0] =M∑
m=1
rm1[sm = 1].
Thus, instead of actual numerical differences, the ranks of the differences are used in con-
structing the signed-rank R statistic, rendering it more robust to outliers than the estimator
D.
The inference based on the signed-rank R statistic can similarly follow the permutation
inferential procedure. The permutated version R′ for R is
R′ ≡M∑
m=1
rm1[wmsm(ym1 − ym2) > 0] =M∑
m=1
rm1[wmsm > 0]
=M∑
m=1
rm(1[wm = 1, sm = 1] + 1[wm = −1, sm = −1]).
Observe
E(1[wm = 1, sm = 1] + 1[wm = −1, sm = −1]) =1[sm = 1]
2+
1[sm = −1]2
=12.
V (1[wm = 1, sm = 1] + 1[wm = −1, sm = −1]) =14.
16
Hence, because rm’s are fixed, under the H0,
E(R′) =M∑
m=1
rm12
=12
M∑
m=1
rm =M(M + 1)
4,
V (R′) =M∑
m=1
r2m
14
=14
M∑
m=1
r2m =
M(M + 1)(2M + 1)24
.
When M is large, the null distribution of {R′ − E(R′)}/SD(R′) can be approximated by
N(0, 1). The normally approximated p-value for R is
P{N(0, 1) >R − M(M + 1)/4
{M(M + 1)(2M + 1)/24}1/2}.
This test is distribution-free as (x′, y, d) does not appear in the mean or variance of the null
distribution.
The CI’s for the treatment effect can be similarly obtained by inverting the test. Conduct
level-α tests with different values of β using
Rβ − M(M + 1)/4{M(M + 1)(2M + 1)/24}1/2
, where Rβ ≡M∑
m=1
rmβ1[sm(ym1 − smβ − ym2) > 0]
and rmβ is the rank of |ym1 − smβ − ym2|, m = 1, ..., M .
In contrast to the confidence interval obtained based on Dβ with its point effect estimator
D, there is no point effect estimator available in the signed-rank test. As a point estimator,
one may take the Hodges and Lehmann (1963) estimator, which is obtained by solving for β
such that
Rβ =M(M + 1)
4{= E(R′)}.
That is, after transforming the data with the effect estimate β, one obtains a transformed
signed-rank statistic Rβ that coincides with the mean of the statistic under the null.
3.4 Sensitivity Analysis with Signed-Rank Test
So far the unobserved differences between two subjects in a matched pair have been ignored.
Rosenbaum (2002) presents a sensitivity analysis to account for an unobserved confounder
ε that might affect d. Since our data set is observational where countries self-select the
treatment, there may be unobserved differences across the treatment and control groups,
17
causing a hidden bias (or selection bias). Besides, not all country-couples are observed at all
years; this is a missing variable problem, which is also a selection problem. Hence it matters
to allow for selection problems.
When two subjects in a given pair differ in ε that influences d, their relative probabilities
of receiving the treatment are no longer the same. Rosenbaum (2002) assumes
1Γ≤ odds of one subject being treated
odds of the other being treated≤ Γ, for some constant Γ ≥ 1 ∀ pair.
For instance, if the first subject’s probability of receiving the treatment is 0.6 (0.6) and the
second subject’s probability is 0.5 (0.4), then the odds ratio is
0.6/0.40.5/0.5
= 1.5 (0.6/0.40.4/0.6
= 2.25).
The sensitivity analysis goes as follows. Initially, one proceeds under the assumption of no
unobserved difference (Γ = 1). Then Γ is increased from 1 to see how the initial conclusion is
affected. If it takes a large value of Γ to reverse the initial finding, i.e., if only a strong presence
of ε can overturn the initial conclusion, then the initial conclusion is deemed insensitive to ε.
Otherwise, if it takes only a small value of Γ, then the initial finding is deemed sensitive.
The question is then how “large” is large for Γ. Suppose Γ = 2.25 and imagine that ε
were observed. By observing ε, one would be able to tell who is more likely to be treated
between the two subjects and ask whether the probability difference results in as much as an
odds ratio of 2.25? If the answer is no, Γ = 2.25 is a large value. Thus how large is large
for Γ depends on what is included in x. If most relevant variables are in x and thus if it is
hard to think of any important omitted variable in ε, then even a small value of Γ may be
regarded as large.
An easy-to-implement sensitivity analysis is available for the signed-rank test. Define
p+ ≡ Γ1 + Γ
≥ 0.5 and p− ≡ 11 + Γ
≤ 0.5.
Further define R+ (R−) as the sum of M -many independent random variables where the mth
variable takes rm with probability p+ (p−) and 0 with probability 1− p+ (1− p−). Writing
18
R+ as∑M
m=1 rmum, where P (um = 1) = p+ and P (um = 0) = 1− p+, we get
E(R+) =M∑
m=1
rmE(um) = p+M∑
m=1
rm =p+M(M + 1)
2
V (R+) =M∑
m=1
r2mV (um) = p+(1− p+)
M∑
m=1
r2m =
p+(1− p+)M(M + 1)(2M + 1)6
.
Doing analogously, we obtain
E(R−) =p−M(M + 1)
2and V (R−) =
p−(1− p−)M(M + 1)(2M + 1)6
.
The means and variances with p+ and p− include E(R′) and V (R′) as a special case when
p+ = p− = 1/2.
Suppose that the H0-rejection interval is in the upper tail. Rosenbaum (2002, p.111)
shows that
P (R+ ≥ a) ≥ P (R′ ≥ a) ≥ P (R− ≥ a).
Using these bounds, the p-value obtained under no hidden bias can be bounded by P (R+ ≥ a)
in case of rejection.4 Specifically, suppose that the H0-rejection interval is in the upper tail,
and the no-hidden-bias p-value is
P{N(0, 1) ≥ R− E(R′)SD(R′)
} = 0.001,
leading to the rejection of H0 at level α > 0.001. Rewrite P (R+ ≥ a) ≥ P (R′ ≥ a) as
P{R+ −E(R+)SD(R+)
≥ a− E(R+)SD(R+)
} ≥ P{R′ −E(R′)SD(R′)
≥ a− E(R′)SD(R′)
}
' P{N(0, 1) ≥ a− E(R+)SD(R+)
} ≥ P{N(0, 1) ≥ a− E(R′)SD(R′)
}.
Replacing a in the above equation with the realized R gives the p-value of R on the right
hand side and its bound on the left hand side. The left-hand side can be obtained for different4In case of acceptance, P (R− ≥ a) shows the possibility of rejection when ε is taken into account. For a
two-sided test, the p-value gets multiplied by 2. For a lower-tail test, subtracting the last display from 1 toget
1− P (R+ ≥ a) ≤ 1− P (R′ ≥ a) ≤ 1− P (R− ≥ a) =⇒ P (R+ < a) ≤ P (R′ < a) ≤ P (R− < a),
which can be used for bounds.
19
values of Γ. Then find, at which values of Γ, the upper bound crosses the level α. If this
happens at, say Γ = 2, then check whether or not Γ = 2 is a large value. If Γ = 2 is deemed
large, then the initial rejection is insensitive to the unobserved difference.
The relevant distribution (R+ or R−) to use for calculating the bound in the sensitivity
analysis indicates the possible direction of selection bias that could undermine an initial
significant finding of treatment effect. If the finding is a significantly positive effect, then
we only need to worry about the ‘positive’ selection problem, where a subject with a higher
potential treatment effect is also more likely to be treated; thus, the relevant distribution
is R+ that embodies selection bias in this direction. On the other hand, if the finding is a
significantly negative effect, then a reverse ‘negative’ selection problem, where a subject with
a lower potential treatment effect is also more likely to be treated, can weaken the original
finding, so the sensitivity analysis in the direction of R− is applicable.
4. DATA
Since our focus is on improving the estimation methodology in Rose (2004), we use the
original data set of Rose (2004) without further modifications or amendments.5 Rose (2004)
provides a detailed account of the construction of the data set.
The response variable yijt in our matching framework corresponds to the regressand in
the Rose’s (2004) gravity equation, which measures (the natural logarithm of) the average
value of real bilateral trade between countries i and j in year t. The treatment dummy
variable dijt corresponds to one of the three binary variables (Bothinijt, Oneinijt, GSPijt).
They measure, respectively, whether both countries i and j are GATT/WTO members in
year t, whether only one of the two countries (i, j) is a GATT/WTO member in year t, and
whether country i is a GSP beneficiary of country j or vice versa in year t.
The conditioning variables or covariates xijt in our matching framework correspond to the
complete list of regressors in Rose’s (2004) benchmark gravity equation. This includes: (1)
(the natural logarithm of) the distance between countries i and j; (2) (the natural logarithm
of) the product of real GDP’s of the country couple (i, j) in year t; (3) (the natural logarithm
of) the product of real per capita GDP’s of the country couple (i, j) in year t; (4) a binary
variable which indicates whether the country couple (i, j) share a common language; (5) a5The data set is available from Rose’s Web site (http://faculty.haas.berkeley.edu/arose/GATTdataStata.zip).
20
binary variable which indicates whether the country couple (i, j) share a land border; (6) a
discrete variable which counts the number of landlocked countries in the country couple (i, j);
(7) a discrete variable which counts the number of island nations in the country couple (i, j);
(8) (the natural logarithm of) the product of land areas of the country couple (i, j); (9) a
binary variable which indicates whether the country couple (i, j) were ever colonies after 1945
with the same colonizer; (10) a binary variable which indicates whether country i is a colony
of country j in year t or vice versa; (11) a binary variable which indicates whether country
i ever colonized country j or vice versa; (12) a binary variable which indicates whether the
country couple (i, j) remained part of the same nation during the sample; (13) a binary
variable which indicates whether the country couple (i, j) use the same currency in year t;
(14) a binary variable which indicates whether the country couple (i, j) belong to the same
regional trade agreement, (15) a list of year dummies for t = 1948, . . . , 1999; and (16) two
of the three binary variables (Bothin, Onein, GSP ), with the one whose treatment effect is
being investigated suitably excluded from the list.
The complete sample contains the above variables observed for 178 IMF trading entities
between 1948 and 1999 (with gaps). This amounts to a total of 234,597 observations.
5. ESTIMATION AND RESULTS
5.1 Matching Criteria and Procedures
We begin with the baseline methodology described in Section 3. This approach, by matching
in terms of the conditioning variables x, controls for the difference in bilateral trade arising
from differences in observable characteristics across country-couples and years (such as geo-
graphical proximity and output levels). Although the set of conditioning variables x used for
matching is quite extensive, it is possible that some systematic differences in unobservable
characteristics ε remain which lead to selection bias. The sensitivity analysis introduced in
Section 3.4 helps assess the sensitivity of estimation results to whatever selection bias may
remain; it, however, does not address the source of selection bias. We experiment with dif-
ferent matching criteria to address various potential selection biases. They are: matching
restricted to the observations of the same country-couple, matching restricted to the obser-
vations in the same year, matching restricted to the observations in the same GATT/WTO
period, and matching restricted to the observations of the same income-class combination
21
(details of the above criteria to be elaborated later). By restricting the potential match to
the observations of the specified criterion, we are accounting for the possibility that system-
atic unobservable differences exist (across country-couples, across years, across GATT/WTO
periods, or across income-class combinations) which influence bilateral trade patterns and
decisions to join the GATT/WTO or decisions to extend GSP. Tables 3–7 report the results
for each set of matching criterion.
For each set of matching criterion, we estimate the treatment effect of: ‘Both in GATT/WTO’,
‘One in GATT/WTO’, and ‘GSP’, respectively. For the GATT/WTO effect, both the effect
on the treated (i.e., those who chose to join the GATT/WTO) as well as the effect on the
untreated (i.e., those who chose not to join) are of interest. We report both effects and
the weighted average of them as the representative effect for all. For the GSP effect, we
report only the effect on the treated. We do not report the effect on the untreated, as GSP’s
are unilateral trade preferences extended only from the rich to the developing country. It
does not make much sense to propose a GSP arrangement between two rich countries, for
example, and to investigate the potential trade effect. On the other hand, the GSP effect
on the treated that we will report below should also be taken with a grain of salt, except
in the last matching exercise where matching is restricted to the observations of the same
income-class combination. This is because the decision to extend GSP’s and the bilateral
trade pattern may be both dependent on the relative income level of trading partners, which
poses potential selection bias problems. The last matching exercise, with matching restricted
to the observations of the same income-class combination, is least subject to this caveat.
When the treatment effect of ‘Both in GATT/WTO’ is investigated, observations with
only one country in the GATT/WTO are dropped. This leaves a remaining sample with
only observations where the two countries are either both in the GATT/WTO (114,750
observations) or both are outside the GATT/WTO (21,037 observations). Similarly, when the
treatment effect of ‘One in GATT/WTO’ is investigated, observations with both countries in
the GATT/WTO are dropped. The remaining sample thus includes only observations where
either only one of the two countries is in the GATT/WTO (98,810) or both are outside the
GATT/WTO (21,037). For the treatment effect of ‘GSP’, a small proportion (54,285) of the
whole sample (234,597) has GSP arrangements.
In all matching exercises, we restrict our attention to the case of pair-matching, in which
only one matched unit is selected. Since our conditioning variables x contain continuous
22
variables, the likelihood of two matched units having equal distance in terms of x from the
target is negligible and the case of multiple-matching can be safely ignored. In order to
measure the distance ‖xijt − xi′j′t′‖, we use the simple scale-normalized distance.
In Section 3.2, we introduced both simulation and normal approximation approaches to
obtaining the p-value of the treatment effect estimator D based on the permutation test.
Although not mentioned explicitly in Section 3.3, in addition to the normal approximation
approach, it is also possible to calculate the p-value for the signed-rank R statistic based on
simulated permutation samples. We carried out both approaches and found the normal p-
value to approximate the simulated p-value extremely well (which is expected as the sample
size is large). Thus, we present only the normal approximation result in the reports that
follow.
In any given matching exercise, we experiment with three caliper choices: the caliper is
set such that only 100%, 80%, or 60% of matched pairs are qualified for the estimation of the
treatment effect. For example, for the caliper choice of 60%, matched pairs with a distance
(in terms of x) exceeding the upper 60 percentile of all matched pairs are discarded. The
caliper choice of 100% is equivalent to using all available matched pairs.
5.2 Matching Results
In Tables 3–7, the first column ‘permutation test’ reports the results on permutation tests.
The ‘effect’ sub-column presents the treatment effect estimate based on the D statistic, the p-
value is obtained for the observed D statistic using the permutation test based on the normal
approximation approach, and the CI is obtained by inverting the test procedure. The second
column ‘signed-rank test’ reports the results on signed-rank tests. The ‘effect’ sub-column
presents the treatment effect estimate based on the Hodges and Lehmann (1963) estimator,
the p-value is obtained for the observed R statistic using the signed-rank test, and the CI is
obtained by inverting the test procedure. The third column ‘sensitivity analysis’ reports the
results on sensitivity analysis. The sensitivity analysis is conducted for the signed-rank test
based on a significance level of α = 0.05 in a one-sided or two-sided test. R+ or R− (as a
function of Γ) indicates the relevant distribution on which the bound for the p-value of the
signed-rank test is based. Γ∗ indicates the critical value of Γ at which the conclusion of the
signed-rank test reverses.
We discuss Tables 3–7 now. Table 3 reports the results of the baseline methodology,
23
labeled ‘unrestricted matching’. In this case, the pool of potential matches for an observation
are the observations with the opposite treatment; no further restriction is imposed. The
numbers of matched pairs obtained (M1 for the case of the effect on the treated, and M0 for
the untreated) are specified in the parentheses. We note that the permutation test and the
signed-rank test produce similar results. As the signed-rank test is considered more robust to
outliers and is also directly related to the sensitivity analysis, we cite the signed-rank test’s
figures below for illustrative purpose.
First, we note that the treatment effect of ‘Both in GATT/WTO’ on the treated is
large and significant. For example, with the 80% caliper, the point estimate indicates that
the GATT/WTO membership raises bilateral trade volume by 193% (= e1.075 − 1) if both
trading partners are GATT/WTO members. The corresponding effect is smaller at 76%
(= e0.568 − 1) if only one joins the GATT/WTO. Nevertheless, the effect is still positive and
significant. Thus, our estimation results suggest that the trade-creating effect of GATT/WTO
membership does not come at the cost of diverting trade from nonmembers. Relative to the
GATT/WTO membership, the preferential GSP scheme also promotes bilateral trade, by
a factor of 101% (= e0.696 − 1) with the 80% caliper, for trading relationships that apply
the scheme. It is important to note that the positive and stronger trade effect of ‘Both in
GATT/WTO’ is shared by a larger number of bilateral trading relationships (114, 750), than
the GSP scheme (54, 285). Thus, either on the average or in the aggregate, our estimation
results suggest that the realized trade-creating effect of GATT/WTO membership exceeds
that of GSP by a great extent.
How about the potential trade effect of GATT/WTO for trading relationships that are
outside the system? The treatment effect estimates on the untreated (with the 80% caliper)
suggest that the bilateral trade volume would have increased by 22% (= e0.200 − 1) if both
trading partners were to join the GATT/WTO or 9% (= e0.089−1) if only one of them did so.
The effects are smaller compared to the effect on the treated but they are still significantly
positive. As discussed in Section 3.1, the required assumption for applying the matching
estimator to the treated, y0 q d|x, is more likely to hold than to the untreated, y1 q d|x.
Thus, we will place less emphasis on the effect on the untreated in the following discussions,
although their results will continue to be reported in the tables.
Table 4 reports the results for ‘matching within country-couple’. In this case, the pool
of potential matches for an observation are restricted to the observations with the oppo-
24
site treatment and of the same country-couple. Some country-couples may be both in
the GATT/WTO, be both outside the GATT/WTO, or have only one of them in the
GATT/WTO, throughout the sampling years (1948 to 1999). Alternatively, some country-
couples may have only one of them in the GATT/WTO for some period of the sampling years
and then be both in the GATT/WTO throughout the rest of the sampling years. In these
cases, these observations do not have qualified matched controls (or matched treatment),
and are discarded from the sample. This explains the much smaller sample size shown in
Table 4. The estimated treatment effects on the treated for the three types of treatments
have the same ranking as in the case of unrestricted matching: Bothin effect > GSP effect
> Onein effect (with the 80% caliper). While the current estimates suggest overall smaller
treatment effects on the treated, the trade-creating effect if both trading partners are in the
GATT/WTO continues to be economically (and statistically) significant: for example, based
on the 80% caliper, bilateral trade increases by as much as 126% (= e0.814 − 1).
In Table 5, labeled ‘matching within year’, the pool of potential matches for an observation
are restricted to the observations with the opposite treatment and in the same year. In this
case, the year dummies in x are redundant and so are dropped from x. The estimation results
differ slightly from those of the ‘unrestricted matching’ in terms of the permutation test, but
they come across as almost the same as those in Table 3 in terms of the signed-rank test. This
indicates that in the case of ‘unrestricted matching’, the matched pairs often occur among
cross-section observations of the same year, and thus the estimates pick up mostly the cross-
section variations, whereas the estimates derived from ‘matching within country-couple’ as
discussed in the previous paragraph measure only time-series variations. Both cross-section
and time-series variations indicate that there are significant gains in trade by joining the
GATT/WTO.
Table 6 presents the results for ‘matching within period’. The pool of potential matches for
an observation are restricted to the observations with the opposite treatment and in the same
period, where the periods correspond to different eras of the GATT/WTO history. They are:
1948 (Before Annecy round), 1949-1951 (Annecy to Torquay round), 1952-1956 (Torquay to
Geneva round), 1957-1961 (Geneva to Dillon round), 1962-1967 (Dillon to Kennedy round),
1968-1979 (Kennedy to Tokyo round), 1980-1994 (Tokyo to Uruguay round), 1995-(After
Uruguay round). Given our earlier observations that in the case of ‘unrestricted matching’,
the matched pairs often occur among cross-section observations of the same year, it is no
25
surprise to see that the current estimation results again appear to be almost the same as in
the case of ‘unrestricted matching’, as the criterion of matching within the same period in
effect does not impose much extra restriction.
Table 7 reports the final set of results, for ‘matching within income-class combination’.
In this matching exercise, the pool of potential matches for an observation are restricted to
the observations with the opposite treatment and of the same income-class combination. The
income-class combinations are: ‘low income-low income’ country-couples, ‘low income-middle
income’ country couples, ‘low income-high income’ country couples, ‘middle income-middle
income’ country couples, ‘middle income-high income’ country couples, and ‘high income-
high income’ country couples. Observations without a qualified match are discarded. The
estimated treatment effects on the treated are overall smaller for GATT/WTO membership as
well as for GSP, although their relative ranking remains the same: Bothin effect > GSP effect
> Onein effect (with the 80% caliper), as with other matching criteria. The current estimates
suggest that the GATT/WTO membership promotes bilateral trade, by an economically and
statistically significant factor of: 108% (= e0.734 − 1) if both trading partners are in the
GATT/WTO and 58% (= e0.460−1) if only one of them is in the GATT/WTO. As discussed
earlier, the estimated GSP treatment effect with matching within the same income-class
combination is likely to be less subject to the selection bias problem. Based on this setup,
the GSP treatment effect is estimated to raise bilateral trade by a factor of 73% (= e0.551−1).
5.3 Sensitivity Analysis Results
How robust are the results we have referred to so far? By the sensitivity analysis, Table 3
indicates that in the case of ‘unrestricted matching’, the significant treatment effects of ‘Both
in GATT/WTO’ and ‘GSP’ are robust to any positive selection bias that leads to as much
as an odds ratio of 2.081, and 2.117 respectively, in receiving treatment between the treated
and untreated subject in any matched pair (by the 80% caliper and the two-sided test). On
the other hand, the robustness of the significant treatment effect of ‘One in GATT/WTO’
to potential positive selection bias is relatively weaker (measured by Γ∗ at 1.521). The same
degrees of robustness of the three treatments reappear in Tables 5 and 6, when the matching
is restricted to the observations in the same year, or in the same period.
Are these figures large enough? In using similar sensitivity analysis, Aakvik (2001) seems
to regard Γ = 1.5 ∼ 2 as large, while Hujer et al. (2004) appear to base their discussions
26
on lower numbers of Γ = 1.25 ∼ 1.5. Given that the list of conditioning variables x used
in our framework is extensive and includes most of the important variables likely to affect
bilateral trade flows, we feel that Γ∗ > 2 is sufficiently large. In other words, we judge that
any important omitted variables in ε are not likely to result in as much as an odds ratio of 2
in receiving treatment between the treated and untreated subject in any matched pair. Thus,
we may accept these estimated treatment effects of ‘Both in GATT/WTO’ and ‘GSP’ with
a reasonable degree of confidence.
When the matching criterion becomes more stringent such that further imaginable sources
of selection bias are minimized, one may accept an even lower magnitude of Γ∗, as the
remaining possibility of selection bias is lower. Both the ‘matching within country-couple’
and the ‘matching within income-class combination’ impose effective extra constraints relative
to the ‘unrestricted matching’ benchmark, as discussed earlier and to some extent indicated
by their fewer effective matched pairs. In the latter case of ‘matching within income-class
combination’, the robustness of the treatment effect estimate is lower in general than the
benchmark. The tolerance threshold (Γ∗) for positive selection bias now stands at 1.601,
1.807, and 1.391 (by the 80% caliper and the two-sided test) for the estimated treatment
effect of ‘Both in GATT/WTO’, ‘GSP’, and ‘One in GATT/WTO’, respectively. It will take
further investigations to make a fine judgement of whether the much lower thresholds of 1.601
and 1.391 are acceptable given the stricter criterion of matching within the same income-class
combination. However, the treatment effect estimate of GSP may be regarded as reasonably
robust with a tolerance level of Γ∗ = 1.807 in the current stringent matching setup.
On the other hand, with the more stringent criterion of ‘matching within country-couple’,
the robustness of the estimated treatment effects actually strengthens (as illustrated in Ta-
ble 4). It now takes a strong presence of positive selection bias that leads to as much as
an odds ratio of 2.543 (by the 80% caliper and the two-sided test), in receiving treatment
between the treated and untreated subject in any matched pair, to nullify the original signifi-
cance finding of ‘Both in GATT/WTO’ treatment effect. The corresponding figures are 2.494
and 1.772 for the ‘GSP’ and ‘One in GATT/WTO’ treatment, respectively. Thus, relative
to the ‘unrestricted matching’ benchmark, it is even more comfortable for us to accept the
results on the Bothin and the GSP treatment effects, as the degree of tolerance level for
selection bias is higher despite that the possibility of remaining selection bias is lower with
the extra matching criterion.
27
5.4 Overall Results
Overall, we conclude with two sets of relatively robust estimates for the trade effect of
GATT/WTO membership, represented by the Bothin treatment effect obtained by ‘un-
restricted matching’ and by ‘matching within country-couple’. As discussed earlier, the re-
sults of ‘unrestricted matching’ are almost identical to those of ‘matching within year’ and
‘matching within period’ and capture most likely the cross-sectional treatment effect. Our
intermediate estimate (by the 80% caliper) suggests that the GATT/WTO membership in-
creases bilateral trade volume by 193% (= e1.075 − 1) for trading partners that are both
GATT/WTO members, relative to trading partners that are both outside the GATT/WTO.
On the other hand, the results of ‘matching within country-couple’ capture the time-series
variation of treatment effect. In this case, our intermediate estimate (by the 80% caliper)
suggests that the GATT/WTO membership increases bilateral trade volume by a smaller
factor of 126% (= e0.814− 1) when trading partners both join GATT/WTO, relative to when
they both do not.
Regarding GSP, we are more comfortable with the results obtained based on ‘matching
within income-class combination’, as the decision to extend GSP is very likely to be dependent
on trading partners’ relative income levels. The intermediate estimate (by the 80% caliper)
suggests that the GSP scheme raises bilateral trade volume by a factor of 73% (= e0.551− 1).
Note again that the GATT/WTO effect has in the past applied to a much larger number
of trading relationships (114,750) than the GSP effect (54,285). Thus, either on the average
or in the aggregate, the realized trade promoting effect of GATT/WTO far exceeds that of
GSP.
6. CONCLUSION
As the multilateral trade institution evolved from the simple trade treaty, the GATT, to the
overarching organization, the WTO, it is timely to review whether the institution lives up
to its objective of promoting international trade. In spite of the common affirmative im-
pression, Rose (2004) argues that little evidence is found that the GATT/WTO has had an
impact on trade. In this paper, we readdress this important policy question, arguing against
the parametric gravity-equation approach used by Rose (2004), and propose nonparametric
approaches to estimating the GATT/WTO trade effect. This involves using matching meth-
28
ods to estimate the ‘treatment effect’ of GATT/WTO membership and permutation-based
inferential procedures to assess statistical significance of the estimated effects. The problem
of potential selection bias is addressed by the Rosenbaum (2002) sensitivity analysis and by
restricting matching within sub-samples.
Contrary to Rose (2004), we find consistent and large positive effects on trade when
countries participate in the GATT/WTO system. The effect is larger when two countries in
a bilateral trade relationship are both members than when only one is a member, but the
effect is nonetheless significantly positive even when only one is a member, confirming an
overall trade-creating effect of the multilateral institution. The positive effect of multilateral
participation in the GATT/WTO system also exceeds that of the preferential GSP scheme by
a great extent, either on the average or in the aggregate, contrasting Rose’s (2004) result that
GSP is more effective in promoting trade than the GATT/WTO. Overall, our finding confirms
the common sense that preferential liberalizations (via GSP or unilateral participation in the
GATT/WTO) may be good, but multilateral liberalizations (via reciprocal participation in
the GATT/WTO) are most effective at promoting trade.
We conclude with some qualifications of our findings. First, the data set of Rose (2004)
that we use includes only observations with positive bilateral trade flows. A recent study by
Felbermayr and Kohler (2007) suggests that by taking into account trading relationships with
zero trade leads to a bigger trade effect of GATT/WTO membership than indicated by Rose
(2004). This finding does not undermine our foregoing conclusions, if we accept the notion
proposed in Felbermayr and Kohler (2007) that non-existent trading relationships are also
more likely those of non-participants in the GATT/WTO. It implies that if we already find
significantly positive trade effect of GATT/WTO conditional on observations with positive
trade flows, then the ultimate trade effect of GATT/WTO incorporating the extensive margin
would only be larger. Second, using the data set of Rose (2004), which compiles the average
of export and import flows, implies that we can not differentiate the direction of trade flows
and thus control for exporter or importer specific effects. However, just as in Rose (2004),
who controls for country-couple or dyad specific effect in some analysis with fixed-effect
estimator, we accomplish the same by restricting matching to between observations of the
same country-couple in one of our analysis. It may be helpful to further verify our findings
by using directional trade data. We leave this for future research.
29
REFERENCES
Aakvik, A., 2001. Bounding a matching estimator: the case of a Norwegian training program.
Oxford Bulletin of Economics and Statistics 63, 115–143.
Abadie, A., Imbens, G. W., 2006. Large sample properties of matching estimators for average
treatment effects. Econometrica 74 (1), 235–267.
Altonji, J. G., Elder, T. E., Taber, C. R., 2005. Selection on observed and unobserved vari-
ables: assessing the effectiveness of Catholic schools. Journal of Political Economy 113,
151–184.
Anderson, J. E., 1979. A theoretical foundation for the gravity equation. American Economic
Review 69 (1), 106–16.
Anderson, J. E., van Wincoop, E., 2003. Gravity with gravitas: A solution to the border
puzzle. American Economic Review 93 (1), 170–92.
Bagwell, K., Staiger, R. W., 1999. An economic theory of GATT. American Economic Review
89 (1), 215–248.
Bagwell, K., Staiger, R. W., 2001. Reciprocity, non-discrimination and preferential agree-
ments in the multilateral trading system. European Journal of Political Economy 17, 281–
325.
Deardorff, A. V., 1998. Determinants of bilateral trade: Does gravity work in a neoclassical
world? In: Frankel, J. A. (Ed.), The Regionalization of the World Economy. University of
Chicago Press, Chicago, pp. 7–22.
Ernst, M. D., 2004. Permutation methods: A basis for exact inference. Statistical Science
19 (4), 676–685.
Evenett, S. J., Braga, C. A. P., 2005. WTO accession: Lessons from experience. World Bank
Trade Note 22.
Felbermayr, G., Kohler, W., 2007. Does WTO membership make a difference at the extensive
margin of world trade? CESifo Working Paper No. 1898.
30
Good, P. I., 2004. Permutation, Parametric, and Bootstrap Tests of Hypotheses, 3rd Edition.
Springer Series in Statistics. Springer.
Heckman, J., Ichimura, H., Smith, J., Todd, P., 1998. Characterizing selection bias using
experimental data. Econometrica 66 (5), 1017–1098.
Heckman, J. J., Ichimura, H., Todd, P. E., 1997. Matching as an econometric evaluation
estimator: Evidence from evaluating a job training programme. The Review of Economic
Studies 64 (4), 605–654.
Helpman, E., Krugman, P., 1985. Market Structure and Foreign Trade: Increasing Returns,
Imperfect Competition, and the International Economy. The MIT Press, Cambridge, MA.
Hodges, J., Lehmann, E., 1963. Estimates of location based on rank tests. Annals of Mathe-
matical Statistics 34, 598–611.
Hollander, M., Wolfe, D. A., 1999. Nonparametric statistical methods, 2nd Edition. Wiley.
Hujer, R., Caliendo, M., Thomsen, S. L., 2004. New evidence on the effects of job creation
schemes in Germany – a matching approach with threefold heterogeneity. Research in
Economics 58, 257–302.
Imbens, G. W., 2003. Sensitivity to exogeneity assumptions in program evaluation. American
Economic Review (Papers and Proceedings) 93, 126–132.
Imbens, G. W., 2004. Nonparametric estimation of average treatment effects under exogene-
ity: A review. The Review of Economics and Statistics 86 (1), 4–29.
Imbens, G. W., Rosenbaum, P. R., 2005. Robust, accurate confidence intervals with a weak
instrument: quarter of birth and education. Journal of the Royal Statistical Society (Series
A) 168, 109–126.
Johnson, H. G., 1953–1954. Optimum tariffs and retaliation. Review of Economic Studies
21 (2), 142–153.
Lechner, M., 2000. An evaluation of public-sector-sponsored continuous vocational training
programs in east germany. The Journal of Human Resources 35 (2), 347–375.
31
Lee, M.-J., 2005. Micro-econometrics for policy, program, and treatment effects. Oxford Uni-
versity Press.
Lee, M.-J., Hakkinen, U., Rosenqvist, G., 2007. Finding the best treatment under heavy
censoring and hidden bias. Journal of the Royal Statistical Society (Series A) 170, 133–
147.
Lehmann, E. L., Romano, J. P., 2005. Testing statistical hypotheses. Springer.
Lu, B., Zanutto, E., Hornik, R., Rosenbaum, P. R., 2001. Matching with doses in an observa-
tional study of a media campaign against drug abuse. Journal of the American Statistical
Association 96 (456), 1245–1253.
Martiz, J., 1995. Distribution-free statistical methods, 2nd Edition. Chapman&Hall.
Oguledo, V. I., MacPhee, C. R., 1994. Gravity models: A reformulation and an application
to discriminatory trade arrangements. Applied Economics 26 (2), 107–120.
Pesarin, F., 2001. Multivariate Permutation Tests: With Applications in Biostatistics. Wiley.
Ramsey, J. B., 1969. Tests for specification errors in classical linear least-squares regression
analysis. Journal of the Royal Statistical Society (Series B) 31 (2), 350–371.
Rose, A. K., 2004. Do we really know that the WTO increases trade? American Economic
Review 94, 98–114.
Rosenbaum, P. R., 2002. Observational studies, 2nd Edition. Springer.
Staiger, R. W., Tabellini, G., 1987. Discretionary trade policy and excessive protection.
American Economic Review 77 (5), 823–837.
Staiger, R. W., Tabellini, G., 1989. Rules and discretion in trade policy. European Economic
Review 33 (6), 1265–1277.
Staiger, R. W., Tabellini, G., 1999. Do GATT rules help governments make domestic com-
mitments? Economics & Politics 11 (2), 109–144.
Subramanian, A., Wei, S.-J., 2007. The WTO promotes trade, strongly but unevenly. Journal
of International Economics 72 (1), 151–175.
32
Tomz, M., Goldstein, J., Rivers, D., 2005. Membership has its privileges: The impact of
GATT on international trade. Stanford Center For International Development Working
Paper No. 250.
Wilcoxon, F., 1945. Individual comparisons by ranking methods. Biometrics 1, 80–83.
WTO, 2006. International Trade Statistics. World Trade Organization, Geneva.
33
Table 1: misspecification test of the gravity equation(I) Rose (II) RESET (III) with interaction terms
Both in GATT/WTO -0.04 (0.05) -0.04 (0.05) -10.72 (1.10)One in GATT/WTO -0.06 (0.05) -0.07 (0.05) -7.61 (1.08)GSP 0.86 (0.03) 0.99 (0.04) 0.56 (0.26)Log distance -1.12 (0.02) -1.33 (0.05) -1.10 (0.06)Log product real GDP 0.92 (0.01) 1.08 (0.03) 0.86 (0.03)Log product real GDP p/c 0.32 (0.01) 0.38 (0.02) 0.05 (0.04)Regional FTA 1.20 (0.11) 1.55 (0.11) 0.58 (0.39)Currency union 1.12 (0.12) 1.30 (0.13) 0.04 (0.32)Common Language 0.31 (0.04) 0.35 (0.04) 0.09 (0.11)Land border 0.53 (0.11) 0.67 (0.11) 0.56 (0.19)Number landlocked -0.27 (0.03) -0.31 (0.03) -0.17 (0.09)Number islands 0.04 (0.04) 0.06 (0.04) 0.11 (0.12)Log product land area -0.10 (0.01) -0.11 (0.01) -0.17 (0.02)Common colonizer 0.58 (0.07) 0.69 (0.07) 1.08 (0.16)Currently colonized 1.08 (0.23) 1.28 (0.23) 4.81 (0.57)Ever colony 1.16 (0.12) 1.45 (0.13) -0.53 (0.21)Common country -0.02 (1.08) -0.08 (1.07) 0.05 (1.03)
( Log real trade)2 -0.01 (0.002)Both in x GSP 0.11 (0.26)Both in x Log distance -0.03 (0.07)Both in x Log product real GDP 0.07 (0.03)Both in x Log product real GDP p/c 0.33 (0.05)Both in x Regional FTA 0.27 (0.41)Both in x Currency union 1.37 (0.35)Both in x Common Language 0.30 (0.12)Both in x Land border 0.02 (0.25)Both in x Number landlocked -0.13 (0.09)Both in x Number islands -0.10 (0.12)Both in x Log product land area 0.10 (0.02)Both in x Common colonizer -0.61 (0.18)Both in x Currently colonized -3.91 (0.62)Both in x Ever colony 1.69 (0.25)Both in x Common country (dropped)One in x GSP 0.48 (0.26)One in x Log distance -0.01 (0.06)One in x Log product real GDP 0.05 (0.03)One in x Log product real GDP p/c 0.25 (0.05)One in x Regional FTA 1.17 (0.41)One in x Currency union 0.67 (0.37)One in x Common Language 0.27 (0.12)One in x Land border -0.08 (0.23)One in x Number landlocked -0.10 (0.09)One in x Number islands -0.10 (0.12)One in x Log product land area 0.06 (0.02)One in x Common colonizer -0.58 (0.17)One in x Currently colonized (dropped)One in x Ever colony 1.71 (0.24)One in x Common country (dropped)
Observations 234,597 234,597 234,597R2 0.6480 0.6486 0.6525RMSE 1.9796 1.9777 1.9669
Note:Regressand: log real trade. OLS with year effects (intercepts not reported). Robust standard errors (clustering bycountry-couples) are in parentheses.
34
Table 2: trade effect based on gravity equation with interaction termsBoth in Effect One in Effect
PercentilesSmallest -3.2530 -2.551725% -0.2579 -0.209650% 0.2552 0.209375% 0.7786 0.7118Largest 4.8556 4.8237
Mean 0.2724 0.2723Standard Error of Mean 0.0017 0.0015Observations 234,597 234,597
Standard Deviation 0.8428 0.7442Variance 0.7103 0.5538Skewness 0.3219 0.6496Kurtosis 3.9576 4.4601
Note:Calculation based on regression (III) of Table 1. For tradingpartners i and j in year t,(i) {Both in effect}ijt = β{Both in} + β{Both in x GSP}{GSP}ijt + . . . + β{Both in x Ever colony} {Ever colony}ijt;(ii) {One in effect}ijt = β{One in} + β{One in x GSP} {GSP}ijt
+ . . . + β{One in x Common colonizer} {Common colonizer}ijt +β{One in x Ever colony} {Ever colony}ijt.
35
Table 3: treatment effect – unrestricted matchingpermutation test signed-rank test sensitivity analysis
one-sided test two-sided testcaliper effect p-value 95% CI effect p-value 95% CI Γ∗ as in Γ∗ as in
Both in GATT/WTO treatment effecton the treated (M1 = 114, 750):100% 1.328 0.000 [1.307, 1.349] 1.332 0.000 [1.312, 1.351] 2.434 R+ 2.428 R+
80% 1.075 0.000 [1.052, 1.098] 1.075 0.000 [1.053, 1.096] 2.086 R+ 2.081 R+
60% 0.836 0.000 [0.810, 0.862] 0.835 0.000 [0.810, 0.859] 1.780 R+ 1.775 R+
on the untreated (M0 = 21, 037):100% 0.337 0.000 [0.296, 0.379] 0.303 0.000 [0.266, 0.342] 1.250 R+ 1.243 R+
80% 0.239 0.000 [0.192, 0.286] 0.200 0.000 [0.157, 0.241] 1.144 R+ 1.138 R+
60% 0.185 0.000 [0.131, 0.239] 0.138 0.000 [0.090, 0.187] 1.084 R+ 1.077 R+
on all (M1 + M0 = 135, 787):100% 1.175 0.000 [1.156, 1.193] 1.161 0.000 [1.143, 1.179] 2.209 R+ 2.205 R+
80% 0.899 0.000 [0.878, 0.919] 0.883 0.000 [0.863, 0.902] 1.858 R+ 1.854 R+
60% 0.636 0.000 [0.613, 0.659] 0.619 0.000 [0.597, 0.640] 1.559 R+ 1.555 R+
One in GATT/WTO treatment effecton the treated (M1 = 98, 810):100% 0.767 0.000 [0.746, 0.789] 0.773 0.000 [0.753, 0.792] 1.759 R+ 1.755 R+
80% 0.564 0.000 [0.540, 0.588] 0.568 0.000 [0.547, 0.589] 1.525 R+ 1.521 R+
60% 0.422 0.000 [0.396, 0.449] 0.428 0.000 [0.405, 0.451] 1.397 R+ 1.393 R+
on the untreated (M0 = 21, 037):100% 0.030 0.068 [-0.009, 0.069] 0.034 0.022 [0.000, 0.068] 1.006 R+ 1.001 R+
80% 0.092 0.000 [0.048, 0.135] 0.089 0.000 [0.052, 0.126] 1.057 R+ 1.051 R+
60% 0.078 0.001 [0.028, 0.129] 0.084 0.000 [0.041, 0.127] 1.046 R+ 1.039 R+
on all (M1 + M0 = 119, 847):100% 0.638 0.000 [0.619, 0.657] 0.632 0.000 [0.615, 0.649] 1.610 R+ 1.607 R+
80% 0.443 0.000 [0.422, 0.464] 0.437 0.000 [0.418, 0.455] 1.401 R+ 1.397 R+
60% 0.324 0.000 [0.301, 0.347] 0.321 0.000 [0.301, 0.340] 1.297 R+ 1.293 R+
GSP treatment effecton the treated (M1 = 54, 285):100% 0.851 0.000 [0.831, 0.871] 0.792 0.000 [0.774, 0.811] 2.277 R+ 2.269 R+
80% 0.757 0.000 [0.736, 0.778] 0.696 0.000 [0.676, 0.716] 2.125 R+ 2.117 R+
60% 0.693 0.000 [0.668, 0.717] 0.627 0.000 [0.604, 0.649] 1.998 R+ 1.990 R+
Note:1. The pool of potential matches for an observation are observations with the opposite treatment; no further restrictionis imposed. The numbers of matched pairs obtained (M1 for the case of the effect on the treated, and M0 for theuntreated) are specified in the parentheses.2. The caliper choice is set such that only 100%, 80%, or 60% of matched pairs are qualified for the estimation of thetreatment effect. For example, for the case of 60%, matched pairs with a distance in terms of x exceeding the upper 60percentile of all matched pairs are discarded.3. The regressors in the benchmark gravity equation of Rose (2004) are used as the conditioning variables x for matching,with the treatment dummy variable being investigated (Bothin, Onein, or GSP ) suitably excluded from the list.4. In the column ‘permutation test’, the ‘effect’ sub-column presents the treatment effect estimate based on theD statistic; the p-value is obtained for the observed D statistic using the permutation test based on the normalapproximation approach; the CI is obtained by inverting the test procedure as discussed in the main text.5. In the column ‘signed-rank test’, the ‘effect’ sub-column presents the treatment effect estimate based on the Hodgesand Lehmann (1963) estimator; the p-value is obtained for the observed R statistic using the signed-rank test based onthe normal approximation approach; the CI is obtained by inverting the test procedure as discussed in the main text.6. In the column ‘sensitivity analysis’, the sensitivity analysis is conducted for the above signed-rank test based on asignificance level of α = 0.05 in a one-sided or two-sided test. R+ or R− (as a function of Γ) indicates the relevantdistribution on which the bound for the p-value of the signed-rank test is based. Γ∗ indicates the critical value of Γ atwhich the conclusion of the signed-rank test reverses.
36
Table 4: treatment effect – matching within country-couplepermutation test signed-rank test sensitivity analysis
one-sided test two-sided testcaliper effect p-value 95% CI effect p-value 95% CI Γ∗ as in Γ∗ as in
Both in GATT/WTO treatment effecton the treated (M1 = 19, 760):100% 0.941 0.000 [0.912, 0.970] 0.996 0.000 [0.970, 1.023] 3.189 R+ 3.170 R+
80% 0.760 0.000 [0.727, 0.792] 0.814 0.000 [0.785, 0.844] 2.559 R+ 2.543 R+
60% 0.833 0.000 [0.797, 0.870] 0.876 0.000 [0.843, 0.910] 2.792 R+ 2.771 R+
on the untreated (M0 = 9, 510):100% 1.300 0.000 [1.255, 1.345] 1.359 0.000 [1.317, 1.401] 4.168 R+ 4.129 R+
80% 1.117 0.000 [1.067, 1.167] 1.161 0.000 [1.115, 1.207] 3.475 R+ 3.440 R+
60% 0.989 0.000 [0.931, 1.048] 1.019 0.000 [0.967, 1.071] 3.017 R+ 2.983 R+
on all (M1 + M0 = 29, 270):100% 1.058 0.000 [1.033, 1.082] 1.110 0.000 [1.087, 1.132] 3.515 R+ 3.496 R+
80% 0.895 0.000 [0.868, 0.923] 0.943 0.000 [0.918, 0.967] 2.927 R+ 2.911 R+
60% 0.935 0.000 [0.903, 0.967] 0.969 0.000 [0.940, 0.998] 3.021 R+ 3.002 R+
One in GATT/WTO treatment effecton the treated (M1 = 23, 463):100% 0.464 0.000 [0.438, 0.489] 0.532 0.000 [0.510, 0.555] 1.940 R+ 1.931 R+
80% 0.403 0.000 [0.374, 0.432] 0.470 0.000 [0.444, 0.495] 1.782 R+ 1.772 R+
60% 0.371 0.000 [0.335, 0.407] 0.460 0.000 [0.428, 0.491] 1.666 R+ 1.656 R+
on the untreated (M0 = 15, 182):100% 0.579 0.000 [0.544, 0.613] 0.641 0.000 [0.611, 0.671] 2.110 R+ 2.097 R+
80% 0.463 0.000 [0.423, 0.503] 0.525 0.000 [0.492, 0.559] 1.818 R+ 1.805 R+
60% 0.386 0.000 [0.339, 0.432] 0.430 0.000 [0.390, 0.469] 1.610 R+ 1.597 R+
on all (M1 + M0 = 38, 645):100% 0.509 0.000 [0.488, 0.529] 0.574 0.000 [0.556, 0.592] 2.023 R+ 2.016 R+
80% 0.428 0.000 [0.404, 0.452] 0.492 0.000 [0.472, 0.513] 1.816 R+ 1.808 R+
60% 0.403 0.000 [0.374, 0.432] 0.478 0.000 [0.453, 0.503] 1.706 R+ 1.698 R+
GSP treatment effecton the treated (M1 = 52, 025):100% 0.487 0.000 [0.476, 0.499] 0.478 0.000 [0.468, 0.488] 2.579 R+ 2.570 R+
80% 0.492 0.000 [0.479, 0.506] 0.489 0.000 [0.478, 0.501] 2.504 R+ 2.494 R+
60% 0.379 0.000 [0.363, 0.395] 0.371 0.000 [0.357, 0.385] 1.945 R+ 1.937 R+
Note:1. The pool of potential matches for an observation are restricted to observations with the opposite treatment and ofthe same country-couple; observations without a match are discarded. The numbers of matched pairs obtained (M1 forthe case of the effect on the treated, and M0 for the untreated) are specified in the parentheses.2. The caliper choice is set such that only 100%, 80%, or 60% of matched pairs are qualified for the estimation of thetreatment effect. For example, for the case of 60%, matched pairs with a distance in terms of x exceeding the upper 60percentile of all matched pairs are discarded.3. The regressors in the benchmark gravity equation of Rose (2004) are used as the conditioning variables x for matching,with the treatment dummy variable being investigated (Bothin, Onein, or GSP ) suitably excluded from the list.4. In the column ‘permutation test’, the ‘effect’ sub-column presents the treatment effect estimate based on theD statistic; the p-value is obtained for the observed D statistic using the permutation test based on the normalapproximation approach; the CI is obtained by inverting the test procedure as discussed in the main text.5. In the column ‘signed-rank test’, the ‘effect’ sub-column presents the treatment effect estimate based on the Hodgesand Lehmann (1963) estimator; the p-value is obtained for the observed R statistic using the signed-rank test based onthe normal approximation approach; the CI is obtained by inverting the test procedure as discussed in the main text.6. In the column ‘sensitivity analysis’, the sensitivity analysis is conducted for the above signed-rank test based on asignificance level of α = 0.05 in a one-sided or two-sided test. R+ or R− (as a function of Γ) indicates the relevantdistribution on which the bound for the p-value of the signed-rank test is based. Γ∗ indicates the critical value of Γ atwhich the conclusion of the signed-rank test reverses.
37
Table 5: treatment effect – matching within yearpermutation test signed-rank test sensitivity analysis
one-sided test two-sided testcaliper effect p-value 95% CI effect p-value 95% CI Γ∗ as in Γ∗ as in
Both in GATT/WTO treatment effecton the treated (M1 = 114, 750):100% 1.329 0.000 [1.308, 1.350] 1.334 0.000 [1.314, 1.354] 2.433 R+ 2.427 R+
80% 1.075 0.000 [1.052, 1.098] 1.075 0.000 [1.053, 1.096] 2.086 R+ 2.081 R+
60% 0.836 0.000 [0.810, 0.862] 0.835 0.000 [0.810, 0.859] 1.780 R+ 1.775 R+
on the untreated (M0 = 21, 037):100% 0.340 0.000 [0.298, 0.381] 0.305 0.000 [0.267, 0.344] 1.251 R+ 1.245 R+
80% 0.239 0.000 [0.192, 0.286] 0.200 0.000 [0.157, 0.241] 1.144 R+ 1.138 R+
60% 0.185 0.000 [0.131, 0.239] 0.138 0.000 [0.090, 0.187] 1.084 R+ 1.077 R+
on all (M1 + M0 = 135, 787):100% 1.176 0.000 [1.157, 1.194] 1.164 0.000 [1.146, 1.181] 2.209 R+ 2.205 R+
80% 0.899 0.000 [0.878, 0.919] 0.883 0.000 [0.863, 0.902] 1.858 R+ 1.854 R+
60% 0.636 0.000 [0.613, 0.659] 0.619 0.000 [0.597, 0.640] 1.559 R+ 1.555 R+
One in GATT/WTO treatment effecton the treated (M1 = 98, 810):100% 0.761 0.000 [0.739, 0.782] 0.767 0.000 [0.748, 0.786] 1.751 R+ 1.747 R+
80% 0.564 0.000 [0.540, 0.588] 0.568 0.000 [0.547, 0.589] 1.525 R+ 1.521 R+
60% 0.422 0.000 [0.396, 0.449] 0.428 0.000 [0.405, 0.451] 1.397 R+ 1.393 R+
on the untreated (M0 = 21, 037):100% 0.032 0.054 [-0.007, 0.072] 0.038 0.014 [0.003, 0.071] 1.009 R+ 1.004 R+
80% 0.092 0.000 [0.048, 0.135] 0.089 0.000 [0.052, 0.126] 1.057 R+ 1.051 R+
60% 0.078 0.001 [0.028, 0.129] 0.084 0.000 [0.041, 0.127] 1.046 R+ 1.039 R+
on all (M1 + M0 = 119, 847):100% 0.633 0.000 [0.614, 0.652] 0.628 0.000 [0.611, 0.645] 1.605 R+ 1.601 R+
80% 0.443 0.000 [0.422, 0.464] 0.437 0.000 [0.418, 0.455] 1.401 R+ 1.397 R+
60% 0.324 0.000 [0.301, 0.347] 0.321 0.000 [0.301, 0.340] 1.297 R+ 1.293 R+
GSP treatment effecton the treated (M1 = 54, 285):100% 0.850 0.000 [0.830, 0.870] 0.791 0.000 [0.773, 0.810] 2.275 R+ 2.267 R+
80% 0.757 0.000 [0.736, 0.778] 0.696 0.000 [0.676, 0.716] 2.125 R+ 2.117 R+
60% 0.693 0.000 [0.668, 0.717] 0.627 0.000 [0.604, 0.649] 1.998 R+ 1.990 R+
Note:1. The pool of potential matches for an observation are restricted to observations with the opposite treatment and ofthe same year; observations without a match are discarded. The numbers of matched pairs obtained (M1 for the caseof the effect on the treated, and M0 for the untreated) are specified in the parentheses.2. The caliper choice is set such that only 100%, 80%, or 60% of matched pairs are qualified for the estimation of thetreatment effect. For example, for the case of 60%, matched pairs with a distance in terms of x exceeding the upper 60percentile of all matched pairs are discarded.3. The regressors in the benchmark gravity equation of Rose (2004) are used as the conditioning variables x formatching, with the year dummies and the treatment dummy variable being investigated (Bothin, Onein, or GSP )suitably excluded from the list.4. In the column ‘permutation test’, the ‘effect’ sub-column presents the treatment effect estimate based on theD statistic; the p-value is obtained for the observed D statistic using the permutation test based on the normalapproximation approach; the CI is obtained by inverting the test procedure as discussed in the main text.5. In the column ‘signed-rank test’, the ‘effect’ sub-column presents the treatment effect estimate based on the Hodgesand Lehmann (1963) estimator; the p-value is obtained for the observed R statistic using the signed-rank test based onthe normal approximation approach; the CI is obtained by inverting the test procedure as discussed in the main text.6. In the column ‘sensitivity analysis’, the sensitivity analysis is conducted for the above signed-rank test based on asignificance level of α = 0.05 in a one-sided or two-sided test. R+ or R− (as a function of Γ) indicates the relevantdistribution on which the bound for the p-value of the signed-rank test is based. Γ∗ indicates the critical value of Γ atwhich the conclusion of the signed-rank test reverses.
38
Table 6: treatment effect – matching within periodpermutation test signed-rank test sensitivity analysis
one-sided test two-sided testcaliper effect p-value 95% CI effect p-value 95% CI Γ∗ as in Γ∗ as in
Both in GATT/WTO treatment effecton the treated (M1 = 114, 750):100% 1.331 0.000 [1.310, 1.352] 1.336 0.000 [1.316, 1.356] 2.438 R+ 2.432 R+
80% 1.075 0.000 [1.052, 1.098] 1.075 0.000 [1.053, 1.096] 2.086 R+ 2.081 R+
60% 0.836 0.000 [0.810, 0.862] 0.835 0.000 [0.810, 0.859] 1.780 R+ 1.775 R+
on the untreated (M0 = 21, 037):100% 0.340 0.000 [0.298, 0.381] 0.305 0.000 [0.267, 0.344] 1.251 R+ 1.245 R+
80% 0.239 0.000 [0.192, 0.286] 0.200 0.000 [0.157, 0.241] 1.144 R+ 1.138 R+
60% 0.185 0.000 [0.131, 0.239] 0.138 0.000 [0.090, 0.187] 1.084 R+ 1.077 R+
on all (M1 + M0 = 135, 787):100% 1.177 0.000 [1.158, 1.196] 1.165 0.000 [1.147, 1.183] 2.213 R+ 2.208 R+
80% 0.899 0.000 [0.878, 0.919] 0.883 0.000 [0.863, 0.902] 1.858 R+ 1.854 R+
60% 0.636 0.000 [0.613, 0.659] 0.619 0.000 [0.597, 0.640] 1.559 R+ 1.555 R+
One in GATT/WTO treatment effecton the treated (M1 = 98, 810):100% 0.762 0.000 [0.741, 0.784] 0.768 0.000 [0.749, 0.787] 1.753 R+ 1.749 R+
80% 0.564 0.000 [0.540, 0.588] 0.568 0.000 [0.547, 0.589] 1.525 R+ 1.521 R+
60% 0.422 0.000 [0.396, 0.449] 0.428 0.000 [0.405, 0.451] 1.397 R+ 1.393 R+
on the untreated (M0 = 21, 037):100% 0.032 0.054 [-0.007, 0.072] 0.038 0.015 [0.003, 0.071] 1.009 R+ 1.004 R+
80% 0.092 0.000 [0.048, 0.135] 0.089 0.000 [0.052, 0.126] 1.057 R+ 1.051 R+
60% 0.078 0.001 [0.028, 0.129] 0.084 0.000 [0.041, 0.127] 1.046 R+ 1.039 R+
on all (M1 + M0 = 119, 847):100% 0.634 0.000 [0.615, 0.653] 0.629 0.000 [0.612, 0.646] 1.606 R+ 1.603 R+
80% 0.443 0.000 [0.422, 0.464] 0.437 0.000 [0.418, 0.455] 1.401 R+ 1.397 R+
60% 0.324 0.000 [0.301, 0.347] 0.321 0.000 [0.301, 0.340] 1.297 R+ 1.293 R+
GSP treatment effecton the treated (M1 = 54, 285):100% 0.851 0.000 [0.831, 0.871] 0.792 0.000 [0.773, 0.811] 2.276 R+ 2.269 R+
80% 0.757 0.000 [0.736, 0.778] 0.696 0.000 [0.676, 0.716] 2.125 R+ 2.117 R+
60% 0.693 0.000 [0.668, 0.717] 0.627 0.000 [0.604, 0.649] 1.998 R+ 1.990 R+
Note:1. The pool of potential matches for an observation are restricted to observations with the opposite treatment and ofthe same period, where the periods are: 1948 (Before Annecy round), 1949-1951 (Annecy to Torquay round), 1952-1956(Torquay to Geneva round), 1957-1961 (Geneva to Dillon round), 1962-1967 (Dillon to Kennedy round), 1968-1979(Kennedy to Tokyo round), 1980-1994 (Tokyo to Uruguay round), 1995-(After Uruguay round). Observations withouta match are discarded. The numbers of matched pairs obtained (M1 for the case of the effect on the treated, and M0
for the untreated) are specified in the parentheses.2. The caliper choice is set such that only 100%, 80%, or 60% of matched pairs are qualified for the estimation of thetreatment effect. For example, for the case of 60%, matched pairs with a distance in terms of x exceeding the upper 60percentile of all matched pairs are discarded.3. The regressors in the benchmark gravity equation of Rose (2004) are used as the conditioning variables x for matching,with the treatment dummy variable being investigated (Bothin, Onein, or GSP ) suitably excluded from the list.4. In the column ‘permutation test’, the ‘effect’ sub-column presents the treatment effect estimate based on theD statistic; the p-value is obtained for the observed D statistic using the permutation test based on the normalapproximation approach; the CI is obtained by inverting the test procedure as discussed in the main text.5. In the column ‘signed-rank test’, the ‘effect’ sub-column presents the treatment effect estimate based on the Hodgesand Lehmann (1963) estimator; the p-value is obtained for the observed R statistic using the signed-rank test based onthe normal approximation approach; the CI is obtained by inverting the test procedure as discussed in the main text.6. In the column ‘sensitivity analysis’, the sensitivity analysis is conducted for the above signed-rank test based on asignificance level of α = 0.05 in a one-sided or two-sided test. R+ or R− (as a function of Γ) indicates the relevantdistribution on which the bound for the p-value of the signed-rank test is based. Γ∗ indicates the critical value of Γ atwhich the conclusion of the signed-rank test reverses.
39
Table 7: treatment effect – matching within income-class combinationpermutation test signed-rank test sensitivity analysis
one-sided test two-sided testcaliper effect p-value 95% CI effect p-value 95% CI Γ∗ as in Γ∗ as in
Both in GATT/WTO treatment effecton the treated (M1 = 112, 959):100% 1.124 0.000 [1.103, 1.146] 1.097 0.000 [1.076, 1.118] 2.024 R+ 2.019 R+
80% 0.778 0.000 [0.753, 0.802] 0.734 0.000 [0.711, 0.757] 1.605 R+ 1.601 R+
60% 0.541 0.000 [0.514, 0.569] 0.504 0.000 [0.478, 0.530] 1.389 R+ 1.385 R+
on the untreated (M0 = 21, 013):100% 0.309 0.000 [0.268, 0.351] 0.274 0.000 [0.236, 0.312] 1.222 R+ 1.216 R+
80% 0.175 0.000 [0.128, 0.222] 0.131 0.000 [0.089, 0.173] 1.083 R+ 1.077 R+
60% 0.101 0.000 [0.047, 0.155] 0.059 0.008 [0.010, 0.107] 1.015 R+ 1.009 R+
on all (M1 + M0 = 133, 972):100% 0.997 0.000 [0.977, 1.016] 0.955 0.000 [0.936, 0.973] 1.883 R+ 1.880 R+
80% 0.662 0.000 [0.640, 0.684] 0.612 0.000 [0.592, 0.633] 1.507 R+ 1.504 R+
60% 0.442 0.000 [0.418, 0.467] 0.402 0.000 [0.379, 0.424] 1.313 R+ 1.309 R+
One in GATT/WTO treatment effecton the treated (M1 = 98, 363):100% 0.650 0.000 [0.627, 0.672] 0.628 0.000 [0.608, 0.648] 1.556 R+ 1.552 R+
80% 0.476 0.000 [0.452, 0.500] 0.460 0.000 [0.438, 0.481] 1.394 R+ 1.391 R+
60% 0.342 0.000 [0.315, 0.370] 0.324 0.000 [0.300, 0.347] 1.267 R+ 1.263 R+
on the untreated (M0 = 21, 013):100% 0.049 0.007 [0.010, 0.087] 0.034 0.023 [0.000, 0.067] 1.006 R+ 1.001 R+
80% 0.063 0.002 [0.020, 0.105] 0.051 0.003 [0.014, 0.087] 1.020 R+ 1.014 R+
60% 0.034 0.084 [-0.014, 0.083] 0.028 0.092 [-0.012, 0.070] 1.007 R− 1.013 R−on all (M1 + M0 = 119, 376):100% 0.544 0.000 [0.524, 0.563] 0.512 0.000 [0.495, 0.530] 1.455 R+ 1.452 R+
80% 0.391 0.000 [0.369, 0.412] 0.369 0.000 [0.350, 0.388] 1.320 R+ 1.316 R+
60% 0.215 0.000 [0.192, 0.239] 0.200 0.000 [0.179, 0.219] 1.164 R+ 1.161 R+
GSP treatment effecton the treated (M1 = 53, 811):100% 0.732 0.000 [0.712, 0.752] 0.693 0.000 [0.674, 0.712] 2.018 R+ 2.011 R+
80% 0.588 0.000 [0.567, 0.608] 0.551 0.000 [0.531, 0.570] 1.813 R+ 1.807 R+
60% 0.507 0.000 [0.485, 0.530] 0.474 0.000 [0.452, 0.496] 1.706 R+ 1.699 R+
Note:1. The pool of potential matches for an observation are restricted to observations with the opposite treatment andof the same income-class combination, where the income-class combinations are: ‘low income-low income’ country-couples, ‘low income-middle income’ country couples, ‘low income-high income’ country couples, ‘middle income-middleincome’ country couples, ‘middle income-high income’ country couples, and ‘high income-high income’ country couples.Observations without a match are discarded. The numbers of matched pairs obtained (M1 for the case of the effect onthe treated, and M0 for the untreated) are specified in the parentheses.2. The caliper choice is set such that only 100%, 80%, or 60% of matched pairs are qualified for the estimation of thetreatment effect. For example, for the case of 60%, matched pairs with a distance in terms of x exceeding the upper 60percentile of all matched pairs are discarded.3. The regressors in the benchmark gravity equation of Rose (2004) are used as the conditioning variables x for matching,with the treatment dummy variable being investigated (Bothin, Onein, or GSP ) suitably excluded from the list.4. In the column ‘permutation test’, the ‘effect’ sub-column presents the treatment effect estimate based on theD statistic; the p-value is obtained for the observed D statistic using the permutation test based on the normalapproximation approach; the CI is obtained by inverting the test procedure as discussed in the main text.5. In the column ‘signed-rank test’, the ‘effect’ sub-column presents the treatment effect estimate based on the Hodgesand Lehmann (1963) estimator; the p-value is obtained for the observed R statistic using the signed-rank test based onthe normal approximation approach; the CI is obtained by inverting the test procedure as discussed in the main text.6. In the column ‘sensitivity analysis’, the sensitivity analysis is conducted for the above signed-rank test based on asignificance level of α = 0.05 in a one-sided or two-sided test. R+ or R− (as a function of Γ) indicates the relevantdistribution on which the bound for the p-value of the signed-rank test is based. Γ∗ indicates the critical value of Γ atwhich the conclusion of the signed-rank test reverses.
40
Recommended