Untangling Selection Effects in Studies of Coercion
Eugene Gholz Assistant Professor
Lyndon B. Johnson School of Public Affairs University of Texas
Daryl Press Associate Professor
Department of Political Science University of Pennsylvania
Abstract: For more than a decade, scholars have recognized that studies of coercion are plagued by selection effects. Analyses that fail to account for the strategic decisions that lead countries to initiate (or not) crises will yield biased results. Unfortunately, recent attempts to improve research design to account for selection effects are flawed. We use a formal model to demonstrate that selection effects create complex, non-monotonic relationships between key parameters (e.g., defender interests and power) and observable crisis outcomes. Whether the real connections between these variables and deterrence are positive, negative, or non-monotonic, scholars will observe complex non-monotonic relationships in datasets of crisis dynamics, greatly complicating empirical analyses of coercion. We describe a better set of research approaches – both quantitative and qualitative – that scholars can use to mitigate the problems of selection effects as they study coercion. We provide two short case studies to illustrate how the recommendations for qualitative research can be carried out.
DRAFT – DO NOT CITE 1
For years scholars of international politics have labored to test theories of
coercion. The main body of this research uses data on crisis outcomes to determine
whether any of a host of variables – such as formal alliances, domestic political
institutions, public statements by leaders, military force balances – affect the odds of
successful deterrence or compellence.1 The goal is to identify factors that affect the
likelihood of wars (that is, to understand deterrence), to discover what makes countries
accede to adversaries' demands (that is, to understand compellence), and to understand
when countries issue explicit threats or resort to brinkmanship to pursue their foreign
policy goals (that is, to understand crisis initiation).
Selection effects greatly complicate efforts to test theories of coercion, because
they make the easily observable cases of crisis interaction a non-representative sample of
countries. The problem arises because many potential challengers are deterred before
they initiate a crisis, and many defenders cede the issue at hand rather than take a stand
they are likely to abandon later. Datasets of crises, in which by definition both
challengers and defenders have forgone opportunities to concede, over-represent the most
motivated challengers and defenders.2 The implication for international relations
scholarship is profound: the statistical relationships that are observable in the data on
crisis outcomes may differ significantly from the actual relationships that exist between
key variables and successful coercion.3
1 For a useful review, see Huth 1999.
2 Fearon 1992.
3 Smith 1999.
In this paper we demonstrate that several apparently plausible research strategies
for studying coercion despite the selection effects do not work. James Fearon proposes
reversing the sign of the expected empirical relationships; Paul Huth and Todd Allee
recommend using an off-the-shelf selection estimator like the Heckman probit.4 We use a
formal model of crisis dynamics to show the selection effect more precisely than previous
work. The selection effect is considerably more complex than scholars have assumed:
under some conditions, selection effects strongly influence the types of countries that
appear in observable datasets of crisis interactions, but under other conditions, the
selection effect is relatively weak. Our model reveals how changes in parameter values
affect both the real rate of successful coercion and the rate that scholars can observe in
datasets of crisis outcomes. Overall, the selection effect should lead scholars to observe a
non-monotonic relationship between key independent variables and deterrence success,
whether the underlying real relationship is positive, negative, or non-monotonic.
The results of our model should not be confused with previous work on non-
monotonic relationships in crisis dynamics.5 For example, according to some models of
crisis dynamics, the relationship between the balance of power and the probability of war
should be non-monotonic, because the identity of “challengers” and “defenders” is
endogenous. As weak defenders become more powerful, war becomes less likely until
the “defender” becomes so strong that it might actually choose to start a war, in essence
becoming a challenger. Models that incorporate this dynamic, though, address the real
relationship between power and the probability of war, and their authors assume in their
attempts to empirically test them that the real relationship maps in a straightforward way
to the relationships that we can observe in datasets on crisis outcomes. The implication
of our model is that this assumption is not correct; hence their attempts to test their
models empirically are suspect. Whatever real relationship exists between the balance of power
or signals of interest and successful coercion – whether linear or not, monotonic or not –
the observable relationship is complex and non-monotonic.6
4 Fearon 1994; Huth and Allee 2004.
5 Bueno de Mesquita, Morrow, and Zorick 1997.
The problems that we identify with the current approach to studying coercion are
not quibbles. They raise fundamental questions about a decade of empirical scholarship
on coercion. Variables that actually have a powerful effect on the real rate of deterrence
success, or the real rate of success at compellence, may have been overlooked. Even
worse, mistakes in interpreting selection effects and the predictions of models of coercion
can lead to dangerous inferences that are the exact opposite of the truth.
We offer suggestions to improve research design in studies of coercion using both
quantitative and qualitative methods. One promising route is for scholars to use
statistical estimators that are derived directly from the payoffs and structure of a crisis
game.7 Custom estimators like this would serve as alternatives to the off-the-shelf
estimators like probit and logit that are widely used in the international relations
literature. This approach directly captures the effects of strategic interaction (and hence
internalizes the selection effect) while maintaining the traditional advantages of statistical
research. The downside of this research design is that it places high demands on data
quality and on scholars' confidence in their precise model specification.
6 Bueno de Mesquita, Morrow, and Zorick 1997 use a quadratic term in a logit model to test their model's prediction of a non-monotonic relationship between the balance of power and the probability of war. Signorino 1999 argues that their choice of logit (shared by many other scholars in their studies of crises) is not appropriate for datasets that are generated through strategic interactions. However, Signorino's tests do not help us to understand whether logit fails due to the underlying strategic relationship, due to the complex selection effect, or both. Signorino 1999 shows that there is a problem with off-the-shelf estimators, but he cannot explain the source of the problem. This paper shows the crisis dynamics much more clearly. Our suggestions for improving research design complement Signorino's suggestions.
7 Lewis and Schultz 2003. See also Signorino 1999; Smith 1999.
A second promising approach using large-n datasets is for scholars to replace
crisis outcomes with crisis initiation as their dependent variable.8 This research design
mitigates problems caused by selection effects, and the relationship between key
independent variables and the decision to issue a challenge should at least be monotonic,
but this approach has its own limitations: it allows scholars to answer only some of the
important questions about coercion, and it does not solve all selection effects problems.
Although neither of these two quantitative approaches is perfect, each can be used to
strengthen scholars' understanding of coercion.
The third promising research approach is almost universally overlooked as a
method for mitigating selection effects: case studies. The central problem in studying
coercion results from the indeterminate relationship between actual rates of successful
coercion and observable patterns of crisis outcomes. To avoid that problem, scholars can
study the process of decision making rather than the outcomes of crises. Scholars can use
archival material to directly observe how leaders assess the seriousness of a given threat.
What did the leaders discuss? Did they debate the significance of the adversary’s
domestic political institutions, note the existence or absence of formal alliances, and
believe the adversary’s public statements? Or did other factors weigh more heavily
during their deliberations? Of course, selection effects will still lurk in the background
of these case studies: crises will involve a non-representative sample of countries. But
the selection effect will have a less pernicious impact because inferences are drawn not
from the frequency of coercive success but from the types of information that
leaders focus on, discuss, and debate during crises.9
8 See, for example, Leeds 2003.
The remainder of this paper is divided into four sections. The first section
describes the canonical model of military deterrence crises and explains why it is
susceptible to selection effects. The second section shows that the implications of the
multi-stage model for crisis outcomes are more complex than past work has revealed and
details the problems that the complexity poses for empirical studies. The third section
demonstrates how case studies can mitigate the problems of selection effects. The
conclusion emphasizes the importance of the full specification of the formal model, the
necessity of carefully linking the design of statistical analyses to the formal model, and
the value of careful case studies for better tests of theories of coercion.
Selection Effects in Studies of Deterrence
Scholars usually model deterrence interactions as occurring in two stages: general
deterrence and immediate deterrence. General deterrence takes place during non-crisis
periods when one country (a challenger) considers threatening a second country (the
protégé); typically a third country in the model (the defender) may come to the protégé’s
aid.10 General deterrence succeeds—invisibly—when prospective challengers decide not
to threaten the protégé; if a threat is issued, general deterrence has failed. Once a protégé 9 Case studies have been criticized specifically for their vulnerability to selection effects: researchers may choose non-representative or biased cases (King, Keohane, and Verba 1994). These critics are correct, and case study researchers must choose their cases carefully. But this researcher-induced selection bias is separate from the selection effect introduced by the strategic behavior of countries (Collier, Mahoney, and Seawright 2004). Properly chosen case studies can dramatically mitigate the impact of the selection effects that plague datasets on crises. 10 Deterrence theorists distinguish between direct and extended deterrence. Direct deterrence refers to efforts to prevent attacks on oneself; extended deterrence is an effort to prevent attacks on others. The text describes extended deterrence situations, but this argument applies to direct deterrence, too. In those cases, the issue over which the challenger threatens can be considered the "protégé.”
DRAFT – DO NOT CITE 6
is threatened, the defender must decide whether to respond by issuing a deterrent warning
or by quietly conceding. If the defender promises to come to the protégé's aid, an
immediate deterrence crisis begins. A subsequent attack by the challenger would
constitute a failure of immediate deterrence; if the challenger instead refrains from
attacking (backs down from its earlier threat), then immediate deterrence has succeeded.11
Rational deterrence theory suggests that a powerful defender with a strong interest
in a given protégé should be more effective at deterring attacks. Early quantitative
analyses of deterrence crises applied that theory.12 In their empirical tests, scholars
expected to find a positive correlation between immediate deterrence success (i.e., a
challenger's decision to back down during a crisis) and variables that reflect the
defender's capabilities and interest in the protégé. We call this expected positive
relationship the "traditional prediction" of deterrence theory.
The straightforward approach for testing theories of deterrence was challenged by
scholars, notably including James Fearon, who realized that selection effects distort the
easily observable data on crisis dynamics.13 To make valid inferences, scholars must
consider the choices that challengers and defenders make before crises begin. Other
things being equal, countries are more likely to threaten protégés if potential defenders
are too weak to resist effectively or are likely to give in without a fight. Only highly
motivated challengers (i.e., those who would rather fight than accept the status quo) will
threaten protégés in the sphere of powerful and credible defenders. In datasets of crises,
therefore, challenger motivation will be correlated with defender credibility and power.
The implication of this argument is both counter-intuitive and profound: the defenders
who look most fearsome will usually fail at deterrence during crises. Credible and
capable defenders might succeed at general deterrence, but scholars only observe
immediate deterrence in datasets of crisis outcomes. Consequently, when scholars
estimate the relationship between variables that reflect the defender's capabilities and
credibility and immediate deterrence success, they should expect to find a negative
correlation. We call this the "selection effects prediction" of deterrence theory.14
11 Some critics of the model (e.g., Lebow and Stein 1989 and 1990) question whether the absence of a threat necessarily reflects general deterrence success and whether a challenger’s decision to back down during a crisis means that immediate deterrence necessarily succeeded. Sometimes countries, even those that have issued threats, have no interest in attacking, irrespective of any deterrence calculations. These critics raise important points. In this paper, however, we limit ourselves to critiquing the logic behind the research design of many empirical studies of coercion. For that purpose, we assume that investigators can correctly define the boundaries of each observation in their datasets. Later in this paper, we argue that careful case studies can avoid the problems of selection effects; they can also reduce the danger of misidentifying a challenger’s lack of interest as deterrence.
12 See, for example, Huth and Russett 1984 and 1990; Huth, Gelpi, and Bennett 1993.
13 The seminal works are Fearon 1992, 1994, and 2002.
Insert Figure 1
The selection effects argument is clearer when it is illustrated formally. The
model in figure 1 depicts a deterrence crisis in four stages and describes the payoffs that
the challenger and defender receive for each outcome.15 First a potential challenger
decides whether to threaten a protégé or simply accept the status quo; the value of the
14 Fearon 1994 expects a negative correlation between immediate deterrence success and variables that reflect defender interest but a positive correlation with variables that reflect defender power. This distinction is based on the assumption that although challengers update their assessments of defender interest during crises, they do not learn about defender power. We disagree; challengers can learn about both defender interest and power from crisis behavior. Therefore, the most internally consistent version of the “selection effects prediction” would treat interests and power similarly: expecting both variables to be inversely related to immediate deterrence success in crises. We use the latter version of the selection effects prediction, but our model results in the next section contradict both versions. 15 The model is based on Fearon 1992 with modifications explained in the text and the appendix. This four-stage model allows for uncertainty about both the challenger and the defender: the power and / or interests of both actors can be modeled as private information, so that each must act based on his predictions about what his adversary will do. Other scholars (e.g., Schultz 1999; Lewis and Schultz 2003) have used three-stage models or even a two-stage model (Signorino and Yilmaz 2003) because they are simpler to solve mathematically yet still demonstrate the importance of strategic behavior for empirical studies of coercion. However, these simpler models eliminate the possibility of defender bluffs and therefore diverge from reality. That choice substantively affects the calculations of forward-looking challengers in the model. The four-stage model is the simplest that captures challengers' and defenders' simultaneous incentives to misrepresent their motivations and capabilities (the real-world situation).
C
D
C
D
Thre
aten
Don
’t Th
reat
en
Atta
ckD
on’t
Atta
ck
Mob
ilize
Don
’t M
obili
ze
Figh
tD
on’t
Figh
t
Stat
us Q
uo(0
, 0)
Def
. Acq
uies
ces
(AC, -
AD)
Cha
l. B
acks
Dow
n(-
RC, R
D)
Def
. Bac
ksD
own
(AC+R
C, -A
D-R
D)
War
(1-p
)*(A
C) -
F C,
(1-p
)*(-
AD) -
FD
Figu
re 1
: A F
our
Stag
e D
eter
renc
e En
coun
ter
C =
Chal
leng
erD
= D
efen
der
Not
e: P
ayof
fs a
re d
escr
ibed
in te
xt.
DRAFT – DO NOT CITE 8
status quo is normalized to zero for both countries. If the challenger decides to threaten
and the defender concedes (chooses “not mobilize”), the challenger seizes the protégé
and gains AC; the defender loses AD. AC and AD represent the challenger and defender's
levels of interest in the protégé. If, on the other hand, the defender mobilizes, the
challenger must decide whether to back down or attack. Backing down, however, is not
free. The challenger would suffer an audience cost equal to –RC if he were to retreat from
his threat; the defender would enjoy a foreign policy victory, receiving RD. If the
challenger does attack, the defender has a final choice to make: he can back down, or he
can carry through on his deterrent threat and fight to defend the protégé. Backing down
entails the loss of the protégé and also an audience cost (-AD-RD), while the challenger
gets AC+RC for seizing the protégé and defeating the defender. The final outcome arises
if the defender decides to fight: both the challenger and defender receive their expected
value for war, which is a function of the probability that the defender will win (p), the
value of the protégé, and the cost of fighting (FD for the defender and FC for the
challenger).16 For the defender, the expected value for war reduces to (1-p)(-AD) – FD;17
the challenger receives (1-p)(AC) – FC for fighting.
16 The original formulation in Fearon 1992 uses a composite "value for war" parameter rather than separating the power-related variables (probability of winning the war and the cost of fighting) from the interest variables (AC and AD). This conflation of power and interests makes it difficult to follow the mechanisms by which real-world independent variables affect outcomes. For example, researchers interested in the effect of democracy on international relations have suggested that democracy might make countries more sensitive to the costs of war (reducing the value for war by increasing F in our model) and that democracies might be more likely to win their wars (increasing the value for war by changing p in our model). A composite “value for war” parameter complicates efforts to test these theories. Furthermore, each country’s value for war presumably is directly related to the value that it assigns to the protégé, yet the composite "value for war" specification does not take that into account. Finally, using the value for war as the outcome payoff at the bottom of the tree hides a relationship between the challenger's payoffs and the defender's payoffs: the probability that the challenger will win a war is just one minus the probability that the defender will win the war (not accounting for ties), meaning that the payoffs at the bottom of the tree should be correlated. Fearon 1997 revises the payoff for war along the lines used here, but other recent efforts to model crisis interaction have continued to use the less transparent formulation (Schultz 1999; Lewis and Schultz 2003).
DRAFT – DO NOT CITE 9
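The payoffs just described can be collected in a short Python sketch. The `Params` container, the function names, and the example values are illustrative assumptions rather than the authors' code. The war payoffs are written out as explicit lotteries so that they reduce to the expressions in the text and in footnote 17.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    """Parameters of the Figure 1 game (field names follow the text's notation)."""
    AC: float  # challenger's interest in the protégé
    AD: float  # defender's interest in the protégé
    RC: float  # challenger's audience cost for backing down
    RD: float  # defender's audience cost for backing down
    FC: float  # challenger's cost of fighting
    FD: float  # defender's cost of fighting
    p: float   # probability that the defender wins a war


def war_payoffs(m: Params) -> tuple[float, float]:
    """Expected (challenger, defender) payoffs for war, written out as a lottery."""
    # With probability p the defender wins and keeps the protégé (worth 0 relative
    # to the status quo for both sides); with probability 1 - p the challenger takes it.
    challenger = m.p * 0 + (1 - m.p) * m.AC - m.FC
    defender = m.p * 0 + (1 - m.p) * (-m.AD) - m.FD
    return challenger, defender


def terminal_payoffs(m: Params) -> dict[str, tuple[float, float]]:
    """(challenger, defender) payoffs at each terminal node of the four-stage game."""
    return {
        "status quo": (0.0, 0.0),
        "defender acquiesces": (m.AC, -m.AD),
        "challenger backs down": (-m.RC, m.RD),
        "defender backs down": (m.AC + m.RC, -m.AD - m.RD),
        "war": war_payoffs(m),
    }


# Hypothetical parameter values, chosen only to make the arithmetic easy to follow.
example = Params(AC=10, AD=8, RC=2, RD=2, FC=3, FD=3, p=0.5)
for outcome, (c, d) in terminal_payoffs(example).items():
    print(f"{outcome:22s} challenger={c:6.2f} defender={d:6.2f}")
```

With these made-up values, war is worth 0.5 * 10 - 3 = 2 to the challenger and 0.5 * (-8) - 3 = -7 to the defender.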
With complete information, the game tree in Figure 1 has only three possible
outcomes: the status quo, defender acquiesces, and war. If a defender is unwilling to
fight (and the challenger knows it), the challenger will always threaten, and the defender
will always acquiesce (i.e., "not mobilize"). If, on the other hand, a defender values the
protégé highly enough or is powerful enough to have a high expected value for war, both
the challenger and the defender will know that the defender will fight if the challenger
attacks. A challenger then would have only two options: either 1) accept the status quo
or 2) “threaten,” “attack,” and fight a war over the protégé. Under complete information,
therefore, immediate deterrence never succeeds. Immediate deterrence can only succeed
when a challenger either bluffs or probes – impossible strategies with complete
information.18
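The complete-information claim can be checked by backward induction. The sketch below works under stated assumptions. All payoffs are common knowledge, AC, AD, RC, RD, FC, and FD are non-negative, and ties are broken toward the less escalatory move. That tie-breaking convention was chosen for this sketch and is not taken from the text. The function name and the parameter grid are hypothetical.

```python
from itertools import product


def solve_complete_information(AC, AD, RC, RD, FC, FD, p):
    """Backward induction on the four-stage game with all payoffs common knowledge.

    Returns the outcome reached on the equilibrium path. Ties are broken toward
    the less escalatory move (an assumption of this sketch).
    """
    war_c = (1 - p) * AC - FC        # challenger's expected value for war
    war_d = (1 - p) * (-AD) - FD     # defender's expected value for war

    # Stage 4: after an attack, the defender fights only if war beats backing down.
    defender_fights = war_d > -AD - RD
    # Stage 3: after mobilization, the challenger compares attacking with backing down.
    attack_c = war_c if defender_fights else AC + RC
    attack_d = war_d if defender_fights else -AD - RD
    challenger_attacks = attack_c > -RC
    if challenger_attacks:
        mobilize_c, mobilize_d = attack_c, attack_d
        mobilize_outcome = "war" if defender_fights else "defender backs down"
    else:
        mobilize_c, mobilize_d = -RC, RD
        mobilize_outcome = "challenger backs down"
    # Stage 2: after a threat, the defender compares mobilizing with acquiescing (-AD).
    defender_mobilizes = mobilize_d > -AD
    # Stage 1: the challenger compares threatening with the status quo (worth 0).
    threat_c = mobilize_c if defender_mobilizes else AC
    if threat_c <= 0:
        return "status quo"
    return mobilize_outcome if defender_mobilizes else "defender acquiesces"


# Sweep a small, hypothetical grid of non-negative parameter values.
outcomes = {
    solve_complete_information(AC, AD, RC, RD, FC, FD, p)
    for AC, AD, RC, RD, FC, FD, p in product(
        [1, 10], [1, 10], [0, 1, 3], [0, 1, 3], [1, 5], [1, 5], [0.2, 0.5, 0.8]
    )
}
print(sorted(outcomes))  # expected: ['defender acquiesces', 'status quo', 'war']
```

Neither "challenger backs down" nor "defender backs down" appears on the equilibrium path. A challenger that expects to back down (payoff -RC, no better than 0) prefers not to threaten. A defender that expects to back down (payoff -AD-RD, no better than -AD) prefers not to mobilize.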
Immediate deterrence is possible, however, if there is incomplete information.
Challengers and defenders may not know their relative military capabilities (p) or the cost
of a war (FC and FD). More often in studying crisis behavior, scholars have focused on
incomplete information about interests: a country can never be sure about the value its
adversary places on a given protégé.19 For example, not knowing the true level of the
defender's interest creates an incentive for unmotivated challengers to initiate crises as a
way to gain information—all the while knowing that they will back down if the defender
mobilizes. The challenger's goal in issuing the threat is to find out whether the defender cares
enough about the protégé to be willing to pay the cost of mobilizing; if the defender is
willing to pay that cost, then the challenger can update and increase its assessment of the
probability that the defender also cares enough about the protégé to be willing to fight for
it.20 In the model with uncertainty, immediate deterrence succeeds when a challenger
who is simply probing or bluffing encounters a defender who mobilizes and is relatively
likely to be willing to fight.
17 The defender’s payoff for war is [p*(0) + (1-p)*(-AD)] - FD, which reduces to the expression in the text.
18 A bluff is a threat that a country knows it will not carry out if an adversary issues a counter-threat. A probe is a threat whose execution depends on the intensity of the adversary's response.
19 Huth 1999. Many empirical articles try to estimate the importance of various signals of interest. For example, does signing a formal alliance treaty increase the defender's credibility, hence increasing deterrence? Do public statements by leaders (like President Kennedy's famous "Ich bin ein Berliner") increase credibility and deterrence? Do tripwire deployments of troops that are too small to affect the probable outcome of a war send a strong signal of defender interest? These and other independent variables (e.g., measures of a defender's intrinsic interest in a protégé) are all presumed to correlate with a challenger's estimate of the defender's level of interest in a protégé.
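The updating step can be illustrated with a deliberately simplified Bayesian calculation. The three defender types and their prior probabilities below are hypothetical. Each type's behavior is also fixed in advance rather than derived from equilibrium play, as it would be in the full model. The only point is that observing mobilization screens out defenders who would have acquiesced, which raises the challenger's estimate that the defender will fight.

```python
from fractions import Fraction


def prob_fight_given_mobilize(prior):
    """Posterior probability that the defender will fight, given that it mobilized.

    `prior` maps hypothetical defender types to prior probabilities:
      "acquiescer": does not mobilize,
      "bluffer":    mobilizes but will not fight if attacked,
      "resolute":   mobilizes and fights if attacked.
    """
    # Bayes' rule: only bluffers and resolute defenders mobilize, so
    # P(resolute | mobilize) = P(resolute) / (P(bluffer) + P(resolute)).
    mobilizers = prior["bluffer"] + prior["resolute"]
    return prior["resolute"] / mobilizers


# Hypothetical prior beliefs held by the challenger before the crisis.
prior = {"acquiescer": Fraction(1, 2), "bluffer": Fraction(1, 4), "resolute": Fraction(1, 4)}
print("before the crisis:", prior["resolute"])                   # 1/4
print("after mobilization:", prob_fight_given_mobilize(prior))   # 1/2
```

Exact fractions are used so that the update, from 1/4 to 1/2, is easy to verify by hand.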
The formal model described above can incorporate the assumption of incomplete
information, thereby demonstrating the selection effect. If a challenger does not know
how much the defender values the protégé (i.e., the actual value of AD), it must use its
best estimate of the defender's level of interest, K, to calculate the expected value of a
bluff or probe.21 When K is big, the expected value of a bluff is small because the
defender appears relatively likely to mobilize and fight.22 Therefore in datasets of crises,
a credible defender (e.g., high K) is unlikely to be paired with a bluffer; a less credible
defender could face either a bluffer or a highly motivated challenger. Because the rate of
immediate deterrence is determined by the ratio of bluffers to committed challengers,
20 It is likely that challengers and defenders also gain information about the power variables during a crisis. For example, a challenger might learn how much of the defender's military it was willing to deploy to the protégé country, how smoothly the defender's troops were mobilized, and how many of the defender's allies were willing to mobilize, too. All of that information would allow the challenger to update its assessment of the probability that it would win a war over the protégé. The model could be readily recast to make p rather than AD and AC the incomplete information parameter. 21 Many of the tools of statecraft available to defenders in extended deterrence situations are ways of signaling their level of interest, thereby affecting K. For example, signing a mutual defense pact presumably would increase the value of K. 22 Challengers only benefit by bluffing when the defender chooses “not mobilize.” But as K increases, the challenger believes that “fight” becomes more attractive to the defender relative to surrendering the protégé. The derivative of the payoff for “fight” with respect to AD is greater than the derivative of the payoffs of both “don’t mobilize” and “don’t fight" (that is, p-1 > -1).
DRAFT – DO NOT CITE 11
actions that increase K should correlate negatively with the observed rate of immediate
deterrence success – the selection effects prediction.23
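Footnote 22's derivative comparison can be turned into an explicit threshold. Compare the defender's payoff for fighting, (1-p)(-AD) - FD, with its payoff for backing down after an attack, -AD - RD. Fighting is preferred when p*AD > FD - RD, that is, when AD exceeds (FD - RD)/p. The margin in favor of fighting therefore rises at rate p, the difference between the slopes p-1 and -1, as AD grows. The helper names and numbers below are hypothetical. The link to the text rests on an assumption: reading a higher K as a higher believed value of AD.

```python
def fight_margin(AD, p, FD, RD):
    """Defender's gain from fighting rather than backing down after an attack."""
    fight = (1 - p) * (-AD) - FD      # expected value for war (slope p - 1 in AD)
    back_down = -AD - RD              # loses the protégé plus an audience cost (slope -1)
    return fight - back_down          # algebraically equal to p * AD - FD + RD


def fight_threshold(p, FD, RD):
    """Interest level above which fighting beats backing down: AD > (FD - RD) / p."""
    return (FD - RD) / p


# Hypothetical values: p = 0.5, FD = 4, RD = 1 give a threshold of (4 - 1) / 0.5 = 6.
p, FD, RD = 0.5, 4.0, 1.0
print("threshold:", fight_threshold(p, FD, RD))
for AD in (4.0, 6.0, 8.0):
    # The margin rises by p (here 0.5) for each unit increase in AD.
    print(AD, fight_margin(AD, p, FD, RD))
```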
In sum, recent scholarship on deterrence, increasingly attuned to selection effects,
argues that previous analyses systematically misinterpreted their data on deterrence. But
the studies that model selection effects offer good news: if we consider the strategic
behavior that led countries into crises, we can correct our interpretation of statistical tests
of deterrence theory. Specifically, those attributes of a crisis that correlate with
immediate deterrence failure should be emulated by potential defenders, because they are
successfully screening out all but the highly motivated (undeterrable) challengers before
crises even begin. In other words, the results that the early studies of deterrence
produced should be reversed; the signs predicted for coefficients in deterrence theory
regressions should be "flipped." This simple correction allegedly helps us to understand
the true relationship between various independent variables and deterrence success.
A More Complete View of the Multi-Stage Model
By recognizing the danger of selection effects in data on deterrence, scholars have
identified a critical flaw in early studies. Unfortunately, scholars have drawn the wrong
empirical predictions from the multi-stage model of deterrence, leading to incorrect
interpretations of data on crisis outcomes. The signs of the coefficients of estimated
relationships between independent variables and immediate deterrence success may be
misunderstood, and for many samples, the estimates will be biased toward zero.
23 The reason that unmotivated challengers usually do not threaten a highly credible defender – that is, the reason for the selection effect – is that there are costs associated with backing down during a crisis (RC and RD). If prospective attackers faced no costs from backing down, then there would be no selection effect. On audience costs, see Schultz 2001; Fearon 1997.
The central problem is that the traditional predictions — i.e., that greater defender
power, interest in the protégé, and credibility make immediate deterrence success more
likely — and the selection effects predictions — i.e., the exact reverse — are both
sometimes correct.24 In other words, actions that strengthen general deterrence (reduce
the number of challenges) will sometimes cause the observed probability of immediate
deterrence successes to increase, and other times the same actions will cause the rate of
immediate deterrence success to decline. Scholars will find it very difficult to determine
a priori which situation applies for a given sample — or whether each situation applies
for a subset of the data. The result is that deterrence theory makes no determinate
predictions about patterns of immediate deterrence success in scholars' datasets, and
scholars cannot test specific hypotheses about coercion (e.g., whether local military force
advantages bolster deterrence) by drawing straightforward inferences from patterns of
crisis outcomes.
A detailed look at the effects of an increase in defender credibility shows its two
countervailing effects on the likelihood of immediate deterrence. According to the logic
of the selection effects prediction, it reduces the frequency of immediate deterrence
success by reducing the expected value of bluffing and therefore the pool of bluffers who
decide to issue threats. At the same time, though, an increase in defender credibility also
reduces the expected value of attacking after the defender mobilizes, because it seems
more likely to the challenger that the defender will fight. As a result, some challengers
that might have been willing to attack against a less-credible defender instead will only
probe, and if they face defenders who actually mobilize, those challengers will back
down. Against a less credible defender, they would have been undeterrable, but the
increase in defender credibility turned them into examples of immediate deterrence
success. In sum, increasing defender credibility both reduces and increases the number
of bluffers (challengers who back down once the defender mobilizes) in the observable dataset.
24 In his excellent article on the democratic peace, Schultz 1999 notes in passing a non-monotonic relationship between key variables in his model and war likelihood. Schultz’s model is not intended to capture the complete dynamics of deterrence crises (e.g., for simplicity he omits the stage in which defenders can bluff); however, his results are generally consistent with our finding. See also Lewis and Schultz 2003.
The net effect of changes in credibility on immediate deterrence success depends
on the relative magnitude of the two effects. If, for example, challenger audience costs
(RC) are very large, then the pool of bluffers should shrink rapidly when defender
credibility rises; in this case the selection effects prediction is correct, and increasing
defender credibility will correlate with less immediate deterrence success. But if audience costs
are small, the pool of challengers who actually plan to attack may shrink more quickly
than the pool of bluffers, in which case increases in defender credibility will lead to more
immediate deterrence success. The other parameters in the game tree – i.e., the costs of
fighting, the probability of defender victory, and the baseline level of challenger and
defender interest in the protégé – similarly affect the responses of bluffers and committed
attackers to an increase in defender credibility, changing the relative composition of the
observed pool of challengers in a dataset. The net effect on the predicted correlation
between defender credibility and immediate deterrence success is ambiguous.
[Figure 2: Indifference Points between Challenger Strategies. Horizontal axis: AC (challenger interest), from -∞ to ∞, with J-α, AC1, AC2, and J+α marked. Below AC1 the challenger does not threaten (status quo seeker); between AC1 and AC2 it threatens but does not attack (bluffer); above AC2 it threatens and attacks (motivated attacker).]
Figure 2 demonstrates the countervailing effects graphically. The line depicts the
range of potential values for AC, the challenger's interest in the protégé, from the lowest
possible interest at the left to the highest at the right. We assume that the defender's pre-crisis
estimate of the challenger's interest (J) lies in the center of the range of possible
values and that the defender's uncertainty about his estimate (α) correctly delimits the
width of the interval of possible levels of challenger interest.25 Two particular points are
indicated in the figure: AC1 is defined as the value for AC at which a challenger is
indifferent between adopting the strategies “not threaten” and “threaten/not attack”.26
The actual value of AC1 can be calculated in terms of the other payoffs on the figure 1
game tree (see the appendix). Similarly, AC2 is the value of AC at which a challenger is
indifferent between the strategies of “threaten/not attack” and “threaten/attack.” In the
figure, the probability of immediate deterrence success is the ratio of the distance
between AC1 and AC2 to the distance between AC1 and J+α.
An action taken by a defender prior to a crisis that increases its credibility (e.g.,
something that increases K) has two effects on potential challengers' calculations. First,
because the action makes the defender appear more likely to mobilize, the incentive for a
challenger to bluff declines. AC1, therefore, moves to the right.27 This shift of AC1 is why
the selection effects literature argues that credible defenders deter most “bluffers” from
issuing a threat. But an increase in defender credibility also means that the defender is
more likely to fight for the protégé rather than choose “not fight” after a challenger
attacks. Therefore, only a highly motivated challenger will actually attack when facing a
credible defender. In figure 2, AC2 moves to the right as defender credibility increases.
25 In this model α is public knowledge. We use α to reflect both the defender’s uncertainty about the challenger’s actual interest in the protégé and the challenger’s uncertainty about the defender’s true level of interest. 26 A challenger would choose the strategy “threaten/not attack” in the hope that the defender would not mobilize. 27 One way to think of this is that as defender credibility increases, the marginal bluffer decides that bluffing is not worth it — i.e., as defender credibility increases, it takes a greater value of AC to make a challenger indifferent between “not threatening” and bluffing (“threaten/not attack”).
Unless one knows how far AC1 and AC2 shift relative to each other as defender
credibility increases, one cannot determine the net effect on the probability of immediate
deterrence success. If AC1 shifts sufficiently more quickly than AC2, the proportion of bluffers in
crises will drop, and successful immediate deterrence will become less common. This is
the selection effects prediction. But if AC2 shifts more quickly, rising credibility will
increase the likelihood of immediate deterrence success, as suggested by the traditional
prediction.
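A small worked example makes the comparison concrete. Writing N = AC2 - AC1 and D = (J+α) - AC1 for the distances before a change in credibility, and d1, d2 for the rightward shifts of AC1 and AC2, the ratio becomes (N - d1 + d2) / (D - d1). Simple algebra shows it rises exactly when d2/d1 > 1 - IDS (for d1 > 0 and shifts that stay inside the interval), so AC1 must outpace AC2 by enough, not merely by any amount, for IDS to fall. The sketch below assumes the uniform-distribution reading of the ratio; the function names and the shift sizes are hypothetical, standing in for values that would come from the appendix formulas.

```python
def ids_after_shift(n: float, d: float, d1: float, d2: float) -> float:
    """IDS ratio after AC1 moves right by d1 and AC2 moves right by d2.

    n = AC2 - AC1 and d = (J + alpha) - AC1 before the shift. Assumes the
    shifted points stay inside the interval (d1 < d and n - d1 + d2 <= d - d1).
    """
    return (n - d1 + d2) / (d - d1)


def ids_rises(ids: float, d1: float, d2: float) -> bool:
    """Condition for the shift to raise IDS: d2 / d1 > 1 - ids (needs d1 > 0)."""
    return d2 / d1 > 1.0 - ids


# Hypothetical baseline: AC on [0, 10], AC1 = 2, AC2 = 5, so IDS = 3/8.
n, d = 3.0, 8.0
base = n / d
for d1, d2 in [(2.0, 1.0), (2.0, 1.5), (1.0, 2.0)]:
    new = ids_after_shift(n, d, d1, d2)
    print(f"d1={d1}, d2={d2}: IDS {base:.3f} -> {new:.3f}, "
          f"rises={ids_rises(base, d1, d2)}")
```

The middle case is the instructive one: AC1 moves farther than AC2, yet IDS still rises, because the pool of threatening challengers shrinks faster than the pool of bluffers.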
The problem for scholars who study deterrence is that the net effect of increases
in credibility on deterrence outcomes depends on the precise values of the other
parameters in the game tree. The appendix demonstrates the complexity of these
relationships. For a wide range of values for the magnitude of audience costs, costs of
fighting, probability of defender victory in a war, and uncertainty in the adversaries'
predictions of each other's level of interest in the protégé, we can choose values of the
other variables such that an increase in defender credibility will either increase or
decrease the probability of immediate deterrence success. Without very precise
measurements of all of these variables, scholars cannot know whether deterrence theory
predicts a positive or negative correlation between actions that signal greater defender
interest in its protégé and immediate deterrence success.
[Figure 3: Complex Relationships between Power (p), Interests (K), and Immediate Deterrence Success (IDS)]
Figure 3 shows the rate of immediate deterrence success as a function of the
defender’s power (p) and its apparent interest in the protégé (K) under a range of
circumstances. Panel 1 shows the relationship between K and IDS for a range of
parameter values that might describe “typical” cases.28 Panel 2 shows the relationship
between p and IDS for the same sets of parameters. Panel 3 illustrates the relationship
between K and IDS in a defense dominant world, meaning that the expected costs of
fighting are greater for the challenger than the defender. Panel 4 presents the relationship
between p and IDS in a rapacious world: with these values, bluffing is rampant because
the cost of backing down is low, and war is common because the cost of fighting is much
smaller than the potential spoils of victory. In all four panels, the relationship between
the variable of interest and IDS is non-monotonic and quite complex.29
There are three key points to take from these graphs. First, an increase in K or p
can result in either a substantial increase or decrease in the rate of immediate deterrence
success. Therefore, steps that a defender takes that successfully signal its interest in a
protégé or its ability to successfully defend a protégé could generate either higher or
lower rates of observable deterrence success (IDS). Second, for some parameter values
(e.g., FC=9 in Panel 1 and RC=5 in Panel 3), the relationship between the defender’s
apparent interest, K, and immediate deterrence success is essentially flat over wide ranges
of K, meaning that even when deterrence is working (i.e., challengers are
threatening and attacking less often than they would have at lower levels of K), no
evidence of this successful deterrence will appear in data on crisis outcomes. Finally, the
relationship between K and IDS is a function of the size of K; as K varies, the
28 We consider these values to be “typical” because audience costs are smaller than the costs of fighting (except for the FC=3 line), and because the costs of fighting are smaller than the value of the protégé (except for the lowest values of K in Panel 1). Scholars may disagree about what constitutes typical values, but the general shapes of these lines appear with a range of parameter values. 29 Most of the curves end before reaching the right limit of the graph (in Panel 3 the curves end at values of K that range from 18.0 to 18.8, though this is difficult to see). This occurs because for some parameter values there are no crises. For example, if for a given set of parameters everyone knows that even the most interested defender is unwilling to fight (AD2>K+α), then all challengers will threaten but no defenders will mobilize, so there will be no crises.
relationship between K and IDS changes.30 Similar results can be seen in the panels
relating IDS to p.
These graphs illustrate a serious problem for analyses that attempt to draw
inferences about theories of coercion by observing immediate deterrence outcomes. For
example, studies that regress immediate deterrence success on either indicators of a
defender’s interest in a protégé or indicators of a defender’s power will not produce
meaningful results.31 If a study assumes that the selection effects prediction is correct but
inadvertently samples cases in which the actual relationship between K (or p) and
immediate deterrence success is positive, the analysis will tend to fail theories that are
correct and possibly confirm those that are wrong. If the sample comprises observations
in which the relationship between K or p and IDS is relatively flat, then variables that
have great significance as causes of K or p—and hence great significance for coercion—
will appear to be irrelevant. And if the study happens to examine a sample that includes
both “positive correlation” and “negative correlation” cases (for example, cases that cross
over a local maximum of the probability of immediate deterrence success), the estimated
coefficient relating defender interest or power to immediate deterrence success will be
biased towards zero. These latter two cases will be statistically indistinguishable from
30 If the relationship between K and IDS were always concave down, scholars could execute a weak test of various theories of deterrence using a quadratic specification in a regression equation: the coefficient on the squared K term should never be positive. Unfortunately, for some parameter values (e.g., the left part of the FC=3 curve in Panel 1), the relationship is concave up. We thank Bear Braumoeller for discussion of this point. 31 Signorino 1999 also shows that strategic interaction between countries can cause problems for attempts to estimate crisis models, specifically for studies using logit (and probit). Signorino's paper draws on a version of the Bueno de Mesquita et al. model: the true relationship between power and the probability of war is non-monotonic because of the endogeneity of the identity of “challengers.” His results are based on a complete information game (with uncertainty about crisis outcomes generated because each country is assumed to make errors in its strategic choices at a publicly known rate) – so while his article does an excellent job of showing examples of the problems with using off-the-shelf estimators in the presence of strategic interaction, the assumption of complete information is not realistic (see also Lewis and Schultz 2003). The incomplete information model developed in this article better captures actual crisis dynamics.
cases in which the independent variables genuinely have no relationship to immediate
deterrence success. These results support neither the “traditional prediction” of
deterrence theory nor the “selection effects prediction.”32
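The attenuation problem can be illustrated numerically. The sketch below uses a purely hypothetical inverted-U curve for IDS as a function of K (the functional form and numbers are illustrative and are not derived from the game tree) and fits an ordinary least-squares slope. A sample confined to one side of the peak recovers a clearly positive or negative slope, while a sample that straddles the peak yields a slope of essentially zero, the same estimate one would obtain if K had no effect at all.

```python
def ols_slope(xs, ys):
    """Least-squares slope from a bivariate regression of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def hypothetical_ids(k):
    """Illustrative inverted-U curve peaking at K = 10 (not the paper's model)."""
    return 0.6 - 0.01 * (k - 10.0) ** 2


left = [6.0, 7.0, 8.0, 9.0, 10.0]        # rising side of the curve only
right = [10.0, 11.0, 12.0, 13.0, 14.0]   # falling side of the curve only
both = [6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0]  # crosses the peak

for name, ks in [("left", left), ("right", right), ("both", both)]:
    slope = ols_slope(ks, [hypothetical_ids(k) for k in ks])
    print(name, round(slope, 4))
```

Nothing in the pooled data signals that K matters strongly within each half of the sample; that information is lost in the averaging.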
The addition of control variables to off-the-shelf quantitative estimators cannot
solve the selection effects problems. Controlling for the values of the other parameters in
the model (e.g., FC, FD, etc.) merely accounts for the effects of those parameters on IDS,
not for their effects on the shape of the relationship between K and IDS or p and IDS.
Adding control variables would assume that there is a single, "true" relationship between
the study variables and IDS and that each observation, once the effects of the control
variables are factored out, would contribute additional information about those true
relationships. Unfortunately that assumption is not warranted: there is no single function
that relates K or p to IDS. In effect, large datasets on crises almost certainly encompass
multiple causal relationships between K, p, and IDS, and consequently the observations
cannot simply be pooled.33 Even in a dataset with observations drawn randomly from all
values of K, p, and the other parameters, estimators that compute an "average"
relationship between the independent variables of interest and IDS, hoping to "wash out"
the effects of the various relationships between intervening variables like K and IDS,
would not yield meaningful results.
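A minimal sketch of why a control variable does not rescue the pooled estimate, under stated assumptions: two hypothetical "regimes" (say, low versus high costs of fighting) in which other parameters flip the sign of the K-IDS slope. Adding a dummy for each regime absorbs the difference in levels, but by the Frisch-Waugh-Lovell theorem the resulting coefficient on K equals the slope computed after demeaning K and IDS within each regime, which is a weighted average of the regime slopes. The regimes, slopes, and function name are illustrative, not estimates from any dataset.

```python
def within_slope(groups):
    """Coefficient on K from regressing IDS on K plus one dummy per group.

    By the Frisch-Waugh-Lovell theorem this equals the slope obtained after
    demeaning K and IDS within each group and pooling the deviations.
    """
    sxy = sxx = 0.0
    for ks, ys in groups:
        mk = sum(ks) / len(ks)
        my = sum(ys) / len(ys)
        sxy += sum((k - mk) * (y - my) for k, y in zip(ks, ys))
        sxx += sum((k - mk) ** 2 for k in ks)
    return sxy / sxx


ks = [4.0, 6.0, 8.0, 10.0, 12.0]
# Hypothetical regime A: IDS rises with K. Regime B: IDS falls with K.
regime_a = (ks, [0.2 + 0.04 * k for k in ks])
regime_b = (ks, [0.8 - 0.04 * k for k in ks])

print(within_slope([regime_a]))            # slope within regime A alone
print(within_slope([regime_b]))            # slope within regime B alone
print(within_slope([regime_a, regime_b]))  # pooled, controlling for regime
```

The control changes each regime's intercept, not the shape of its K-IDS relationship, so the pooled coefficient is close to zero even though K matters in both regimes.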
Scholars have also been tempted to try to mitigate the selection effects problem
by using sophisticated two-stage estimators proposed by Heckman and others.34 A recent
32 Note that the results in Figure 3 also contradict Fearon’s version of the selection effects prediction, which suggests a monotonic negative relationship between IDS and interest variables (K) and a monotonic positive relationship between IDS and power variables (p). 33 Collier and Mahoney 1996. 34 Huth and Allee 2002; Smith 1996; Nooruddin 2002.
article on research design explicitly endorses this trend.35 These analyses separately
estimate relationships among variables during two stages of a crisis: they first study the
decision to initiate a crisis and then study behavior during a crisis conditional on the prior
decision to initiate a crisis in the first stage. The models are based on the hypothesis that
the error term in the first estimate is correlated with the error term of the second estimate.
Accounting for correlation in error terms is important, but it does not address the key
problem in research design described in this article: even if there were no correlation in
the error terms, we would still not know the right functional form to estimate at either
stage.36 The relationships at each stage are non-linear, and their shapes depend on all of
the model parameters. Simply using a selection model estimator does not help us to
determine whether to expect that an increase in defender credibility will deter more potential
challengers at the first stage (crisis initiation) or at the second stage (crisis behavior). In
other words, putting aside the problems with the error terms, scholars do not know a
priori what coefficients to expect and what functional form to look for in their statistical
analyses of crisis behavior.
In sum, Fearon’s insight about selection effects made a substantial contribution to
scholars’ understanding of pitfalls in studies of deterrence. Unfortunately the hurdles that
stand before scholars are even higher than Fearon and others realized: both the traditional
and the selection effects predictions about the relationship between defender credibility
and immediate deterrence success should obtain in datasets of crisis outcomes.
35 Huth and Allee 2004. 36 In a similar vein, Smith 1999 also argues that at least two separate problems (censoring and interdependence of observations) plague datasets on crises. Accounting for correlation of error terms at best would solve one of the problems.
Furthermore, simple solutions – such as using off-the-shelf selection estimators – do not
solve this problem.
Mitigating the Problems of Selection Effects with Case Studies
Given that selection effects make it difficult to make straightforward predictions
about crisis outcomes, what should scholars do to study coercion empirically? One
approach is to apply better statistical methods to datasets of crisis outcomes or to new
datasets tailored to account for the selection effects. We will address some of these
possibilities in the next section. A promising alternative is to study the process of
decision-making rather than the outcomes of crises. This section builds on the general
observation of Collier, Brady, and Seawright that studies of the causal process gain their
inferential leverage in a different way than studies of "dataset observations."37
Examining the decision-making process allows researchers to (1) directly observe
the variables that scholars typically measure indirectly (such as decision-makers’
estimates of their adversaries' power, interests, and credibility), and (2) directly observe
which tools of statecraft influenced those estimates. The key point is that most
quantitative studies of coercion use patterns of crisis outcomes to draw inferences about
how credible, powerful, or committed each country appeared to its adversary (and hence
about the relative effectiveness of the steps each country took to signal those attributes);
these inferences are dubious because of the non-monotonic relationships explained in the
previous section. By studying the decision-making process, however, scholars can avoid
reliance on inferences about what a given rate of IDS implies about a country’s
37 Collier, Brady, and Seawright 2004.
credibility, power, or interests. Studying the decision-making process therefore avoids
the most serious problems posed by selection effects for studies of coercion.38
Many theories of coercion can be tested using evidence about the decision-making
process. For example, scholars believe that leaders' assessments of their adversaries'
credibility have an important effect on coercion, so scholars would like to know what
influences decision-makers' assessments. Specifically, a credible defender is one that the
challenger believes is likely to fight rather than back down if confronted with a choice at
the fourth node of the game tree in Figure 1. Credibility then depends on model
parameters like K, p, and FD.39 Ideally scholars could study the effects of various tools of
statecraft on K, p, FD, and the other model parameters, and they could then learn about
both the causes of credibility and crisis dynamics.
In the empirical record, decision-makers do not speak in the language of the
model, but they frequently estimate their adversary's overall credibility and other
variables. Scholars can translate these estimates into the variables that are important for
the theories that they want to test, and they can also examine what evidence decision-
makers used as they made their assessments. Did they consider alliances, their
adversaries' past behavior, the balance of military capabilities, the personality traits of
specific leaders, or other factors? Scholars can read the internal memos and the
transcripts from closed-door meetings to "listen in" on the secret deliberations. Armed
with direct measures of credibility gleaned from examining the decision-making process,
38 Although decision-making processes can be studied using either quantitative or qualitative techniques, we highlight case study research designs because they have been overlooked as an approach to avoid selection effects. To be clear, scholars have frequently used case studies to study crisis decision-making and deterrence, but most of the past qualitative studies, like their quantitative counterparts, draw key conclusions from crisis outcomes and are therefore vulnerable to the problems introduced by selection effects. See, for example, George and Smoke 1974. 39 Credibility is the challenger's estimate of the probability that the defender will fight. We give a mathematical expression for this probability, labeled as y, in the appendix.
scholars can learn a good deal of what they would like to know about coercion without
disentangling the complex relationships to crisis outcomes shown in the formal model.
In the following paragraphs we illustrate this research method. We introduce two
theories of credibility and provide a short background on two cases in which those
theories can be tested. We describe the ideal evidence that we might find in a case, and
we compare it to the actual data we gathered by studying the decision-making process.
Our goal here is not to conduct a conclusive test of theories of credibility; rather, the
point is about research design. A study using the type of data described below would
avoid the most serious problems associated with selection effects, and it is possible to
carry out such a study.40
Testing Theories of Credibility using the Appeasement Crises
A powerful conventional wisdom, which we call Past Actions theory, posits that
leaders assess their adversaries’ credibility by evaluating the adversaries' histories of
keeping or breaking commitments. A competing theory, Current Calculus theory, holds
that leaders pay little attention to their adversaries’ past behavior; instead, they consider
credible those threats that an adversary has the power to carry out at reasonable cost
compared to the value of the issue at stake.41
The series of crises in which Germany faced Britain and France before World
War II can be used to evaluate these theories. During the confrontations over
Czechoslovakia (1938) and Poland (1939), German leaders debated whether Britain and
France would follow through on their promises to defend Germany’s intended victims.
40 Case studies suffer from other problems, such as the difficulty of generalizing the results from a small number of cases. We discuss these limitations below. 41 For a more detailed description of these theories and a more complete effort to test them, see AUTHOR 2005.
Because these crises were preceded by years of British and French vacillation, the Past
Actions theory predicts that German leaders would doubt the Allies’ threats.42 Minutes
from meetings, memos, and transcripts of secret deliberations should reveal German
leaders predicting another Allied withdrawal (for example, statements like: “The British
and French won’t defend Czechoslovakia.”). Furthermore, the documents should reveal
German leaders explaining their estimates of Allied credibility by referring to prior acts
of appeasement by the British and French (e.g., “The Allies won’t oppose us; they backed
down last time we faced them.”).
Current Calculus theory makes quite different predictions.43 Hitler thought that
German military power outmatched the Allies during both crises, but his military
commanders believed that Germany was outgunned until the crisis over Poland. Current
Calculus theory, therefore, predicts that Hitler would dismiss Allied threats during both
crises; the German military should view the Allies as credible in 1938 but less credible
the following year. And, according to the theory, the debates among German leaders
should have focused on the balance of capabilities rather than the Allies’ past actions
(e.g., “The French won’t fight us over Poland; the French Army is too weak.”).
The documents from Nazi Germany show that leaders’ private statements can be
used to track their assessments of enemy credibility. In meeting after meeting, Hitler
insisted that the Allies would back down if Germany attacked Czechoslovakia or Poland.
For example, in a key 1937 meeting he argued that despite Allied promises, their
intervention in a war over Czechoslovakia “was hardly probable;” he asserted that the
42 The British and French took no significant action when Germany repeatedly violated the Versailles Treaty by exceeding its permitted military size, reinstating conscription, militarizing the Rhineland region of Germany, and seizing Austria. 43 The interests at stake for the Allies and Germany were roughly equivalent in these two crises, so we focus the predictions of the Current Calculus theory on assessments of relative power. AUTHOR 2005.
Allies had “written off the Czechs.”44 Hitler later assured his Foreign Minister that the
Allies “would definitely not move” to defend the Czechs.45 Hitler remained skeptical
about Allied credibility in the months leading to the Poland crisis; he repeatedly insisted
that the Allies would never fight for Poland.46
Germany’s military commanders disagreed with Hitler’s assessments during the
Czech crisis. Despite years of appeasement, they were confident that the Allies would
fight for Czechoslovakia. The German War Minister, the Army Commander in Chief,
and the Army Chief of Staff all argued vehemently against Hitler’s disparaging view of
Allied credibility.47 And when a large group of Germany’s senior military commanders
gathered in August, 1938, to discuss the military situation, they were nearly unanimous
that the Allies would defend Czechoslovakia if Germany attacked.48 The debates in
Germany show that it was not until 1939—when the balance of power finally swung in
Germany’s favor—that the military leadership finally shared Hitler’s low assessment of
British and French credibility.
The statements by German leaders permit a straightforward congruence test of
theories of credibility. Specifically, the Past Actions theory predicts that German leaders
should doubt the Allies’ promises to fight for Czechoslovakia and Poland; the theory
passes the congruence test that draws on Hitler's assessments of Allied credibility, but it
fails the congruence test based on the military's evaluation of the situation. The Current
Calculus theory performs better. It predicts that Hitler, who was confident in German
44 For the minutes of this meeting see Documents on German Foreign Policy [henceforth DGFP] 1949, 35. 45 See the collection of German documents edited by Michaelis, Schraepler and Scheel 1959, esp. 266. 46 DGFP 1983, 552-55 and 200-206. 47 The arguments by the War Minister and Army Commander in Chief are recorded in DGFP 1949, 38. The memos that record the German Army Chief of Staff’s views of Allied credibility are reproduced in Müller 1980, esp. 502, 521-28. 48 Michaelis, Schraepler and Scheel 1979, 253-56.
military power throughout the period, should have dismissed Allied threats in both crises
– and he did. German military commanders believed that Germany was outgunned until
the Poland crisis; as expected by the theory, they believed that the Allies would fight for
Czechoslovakia but not for Poland. The most important point from the standpoint of
research design is that these congruence tests are not undermined by the selection effect,
because the evaluation of the theories does not depend on the relationship between
estimates of credibility and the outcomes of the crises.
These congruence tests can be bolstered with the addition of causal-process
observations:49 evidence about the reasoning that Hitler and his military commanders
used as they debated Allied credibility. The best evidence would be statements that
directly explain both the leaders' point estimates of the likelihood that their adversaries
would carry through on their threats and the reasoning that led them to their estimates.
Although decision-makers rarely speak with such clarity, their discussions frequently
reveal what they feel is salient as they consider and debate their options. In the German
case, Hitler and his generals repeatedly argued about Allied credibility, and their
discussions focused almost entirely on the balance of power. For example, in the months
leading to the Czechoslovakian crisis, Hitler explained why he believed the British and
French would not fight: the Empire was a drain on British resources, and their army was
too small to fight Germany; the French Army had obsolete weapons; the Italians would
be powerful German allies; and German fortifications could repulse an attack along the
Franco-German border.50 German commanders disagreed with Hitler’s assessment, but
they, too, focused on the balance of power – specifically, the traditional German problem
49 Collier, Brady, and Seawright 2004. 50 Michaelis, Schraepler and Scheel 1979, XX-XX.
of fighting a two-front war. They argued that the Allies would fight because Germany
would be vulnerable while its army was busy fighting the Czechs.51 Hitler and his
generals reached different conclusions about Allied credibility, but they reasoned
according to the balance of power and not the Allies’ past actions.
One well-known piece of evidence appears to support the Past Actions theory, but
the historical record reveals that it is an outlier. On August 22, 1939, Hitler derided
Allied threats to defend Poland: “Our enemies are worms. I saw them in Munich,”
referring to the Allies’ appeasement the previous year at the Munich Conference.52 But
Hitler actually presented seven arguments to explain why he doubted Allied credibility
during the August 22 meeting; all seven were about the military balance, and the
"worms" comment was simply a brief aside from his main train of thought.53
German deliberations before World War II show how congruence tests and
causal-process tests can be mutually reinforcing. Congruence tests are vulnerable to
spurious correlation, but evidence about causal processes can check that the congruence
is the result of the transmission mechanism suggested by the theory.54 Similarly, causal-
process observations suffer from leaders’ frequent failure to carefully articulate their
reasoning and the possibility that decision-makers may omit their real reasons from
discussions. But congruence tests mitigate these dangers by comparing the stated or
implied reasoning with the point estimates of the values of key variables. As we have
51 Documents 44 and 46 in Müller 1980; DGFP 1949, 38; Michaelis, Schraepler and Scheel 1979, 253-56. 52 DGFP 1983, 204. 53 Hitler argued that the British army was small; its Empire was crumbling; the French army was weak; German fortifications were powerful; supplies from Eastern Europe would circumvent a blockade; the German economy was strong; and the Soviet Union would ally with Germany (DGFP 1983, 554-555). 54 Process tracing can also help verify that the variables have been scored properly in the congruence test. See George and Bennett 2005.
shown in the pre-World War II cases, these tests can be used together to draw robust
conclusions about decision-making processes and theories of coercion.
The key point for research design is that there is ample evidence in German
documents to assess theories of credibility. As it turns out, the evidence overwhelmingly
supports the predictions of Current Calculus theory.55 As with most empirical research,
the evidence from the case is not perfect, and not all of the evidence points in exactly the
same direction (e.g., the “worms” quote). But a case study of the 1930s crises that
directly measures credibility – rather than trying to draw inferences about credibility by
observing crisis outcomes – avoids the pernicious impact of the selection effect, and a
clear preponderance of the evidence allows us to draw a conclusion about which theory
of credibility is more accurate.
Using the Cuban Missile Crisis
In 1962 the United States discovered Soviet nuclear-armed missiles in Cuba. As
U.S. leaders considered a range of military options to remove the missiles, a key question
they considered was the likely Soviet response: would Khrushchev back down as he had
repeatedly in recent crises over Berlin? Or would he stand firm and retaliate by striking
U.S. allies in Europe?
The missile crisis erupted soon after a series of Soviet bluffs over Berlin, so Past
Actions theory predicts that U.S. leaders would expect Khrushchev to back down again.
They should explain their views by referring to Khrushchev’s previous retreats. Current
Calculus theory, on the other hand, predicts that U.S. leaders would find the Soviet Union
55 It is striking that Hitler did not frequently argue that Allied appeasement revealed their unwillingness to fight. This would have been a smart rhetorical argument to support his preference for war even if he did not believe it. Yet almost all of his arguments assessed Allied credibility on the basis of the military balance.
more credible than in the crises over Berlin. In previous crises, the U.S. could respond to
Soviet threats with the threat of escalation, because of the American capability to launch
a disarming nuclear first strike against the U.S.S.R. But U.S. nuclear superiority had
melted away by 1962, so Khrushchev could be more resolute in this crisis, refuse to
concede to American pressure, and retaliate in Europe if the United States used force. If
Current Calculus theory is correct, U.S. leaders should have expected Khrushchev to
stand firm regardless of American mobilization or threats to the island. The American
decision-makers should have explained their assessments by referring to the unfavorable
shift in the balance of power.
The statements of senior Kennedy Administration officials provide ample
evidence about Soviet credibility. Four years of Khrushchev’s bluffs over Berlin had no
effect on his—or the Soviet Union’s—credibility. Kennedy’s advisors were divided
about the best course of action during the crisis, but they were virtually unanimous that
Khrushchev would not back down. For example, the president predicted, “If we attack
Cuban missiles…it gives them a clear line to take Berlin.”56 Similarly, U.S. officials
were convinced that neither an ultimatum to Khrushchev to remove the missiles nor a
blockade of Cuba would cause the Soviets to back down. Even advocates of the
ultimatum strategy called the prospects of Khrushchev backing down “illusory” (142).57
And advocates of a blockade agreed that their preferred strategy was unlikely to work.
Secretary of Defense McNamara warned, “I never have thought we’d get [the missiles]
out of Cuba” through a blockade (417). The president favored a blockade but agreed
56 May and Zelikow 1997, 175. All subsequent quotes in this paragraph are from May and Zelikow, with page references in the text. 57 Advocates of the ultimatum believed that issuing one would strengthen the U.S. diplomatic position in the crisis, which would be helpful if the crisis later escalated to war.
with McNamara: “We’re not going to get [the missiles] out with the quarantine.” The
only way to do that was to “trade them out” or “go in and get them out ourselves” (464).
None of Kennedy’s senior advisors believed that Khrushchev would buckle to U.S.
pressure; Khrushchev’s credibility was high. The evidence from this congruence test is
consistent with the Current Calculus theory but not with the Past Actions theory.
The evidence on how U.S. officials reasoned in this crisis is less conclusive than the congruence test, because they explained the logic behind their assessments less frequently than German leaders in the 1930s did.58 What is striking, however, is that although senior members of the U.S. government worried incessantly about how backing down might affect U.S. credibility (and how U.S. weakness over Cuba would harm future U.S. credibility over Berlin), we found no instances during the deliberations in which U.S. leaders asked themselves what previous Soviet withdrawals over Berlin revealed about Soviet credibility. By contrast, U.S. leaders did discuss the strategic nuclear stalemate and how it narrowed their options toward Cuba.59 In sum, the evidence in this case study provides moderate support for the Current Calculus theory, principally from the congruence test, and flatly contradicts the Past Actions theory. The more important point is that the Cuban missile crisis offers abundant evidence for testing theories of credibility in ways that avoid the problem of selection effects.
Case Studies, Coercion, and Theory Testing
Three clarifications about case studies are necessary. First, no matter what case
study researchers do, selection effects will still lurk in the background. Not all countries
58 Perhaps the U.S. officials made few efforts to explain the reasoning behind their views on Soviet credibility because there was broad agreement that Khrushchev was credible. In the 1930s crises, there were sharp disagreements, so participants had to explain and advocate their views. 59 CITE.
are equally likely to get into crises, and therefore the outcomes of the crises will not show
the effect of independent variables on deterrence or compellence success.60 But selection
effects should not significantly affect the process by which leaders assess their
adversaries' credibility, as long as highly motivated challengers and defenders do not
reason differently than less motivated ones.61
Second, case studies are not without their own substantial methodological
problems. Most notably, generalizing the results of a few cases to a larger population is
hazardous. This problem is real, but it need not be fatal. Scholars have offered
approaches to mitigate this weakness in case studies.62 No research design is flawless.
From the perspective of avoiding the selection effects explained in this article, though,
properly executed case studies have much to offer.
Finally, careful archival work requires substantial access to a country’s most
sensitive documents. Some countries make these documents available after a few
decades, but the sample of countries that do so is not representative. A few democracies
make their government documents available, and the conquest of Nazi Germany opened
the decisions of one dictatorship to scrutiny. But those cases may be idiosyncratic.
Nonetheless, the case study method mitigates the selection effects problem that
confounds quantitative analyses of coercion. Case studies can, data permitting, provide
high-quality information about how leaders actually make decisions about deterrence.
60 Leaders may also be more likely to back down when they predict that audience costs will be low, further complicating efforts to interpret patterns in crisis outcomes (Schultz 2001; Sartori 2002). 61 If the leaders of aggressive countries reason differently than leaders of average countries, this method may reveal more about the aggressive leaders. Even so, learning how aggressive leaders evaluate credibility may be particularly valuable for understanding war initiation, evaluating theories of deterrence, and offering foreign policy prescriptions. 62 For two differing approaches, see Van Evera 1997 and [author] 2005.
Where to Go From Here?
Formal theory, statistical analyses, and case studies complement each other in
developing and testing theories of coercion. Studies using formal methods – including
Fearon's seminal work and continuing through the model presented in this article – have
helped disentangle the complex relationships among the key variables that determine
crisis dynamics. Neither statistical analyses nor case studies can draw reasonable
conclusions about crisis behavior without considering the effects of strategic interaction
on observable data – a task best accomplished formally.
Statistical analyses can also contribute greatly to the understanding of coercion,
but only if scholars account for selection effects. Two new lines of work in international
relations are promising in this regard. In one approach, scholars are beginning to use
custom statistical estimators that are derived directly from the payoffs and structure of the
crisis game they are modeling rather than using off-the-shelf estimators like probit and
logit. Because of the complexity of the relationships that scholars should expect to find
in datasets of crisis outcomes, they cannot adapt the off-the-shelf estimators (e.g., by
adding control variables or polynomial terms) to yield valid inferences. By using custom
estimators – which mathematically reflect the actual interactions of the key parameters in
a given game tree – scholars directly capture the strategic interaction in the model, so their analyses internalize the selection effect.63
Scholars who are working to implement this research design have recognized its
limitations.64 Most important, this method requires great faith in the precise structure of
the game tree. If the game does not conform to reality (that is, if it leaves out
63 Lewis and Schultz 2003. 64 We thank Jeff Lewis for helpful discussions about these points.
substantively important crisis dynamics), then the estimated coefficients will not
necessarily be closer to reality than the flawed estimates from off-the-shelf statistical
techniques. And because the estimator itself assumes that the game tree really generated
the observed data, the estimating procedure cannot be used to test how well the game tree
matches reality. In effect, scholars can either assume that they have the right structure of
the model and then estimate the effects of explanatory variables of interest (e.g.,
alliances, speeches, and troop deployments), or scholars can assume that they know the
true effects of the explanatory variables and use the estimates to test the structure of the
game. They cannot do both at the same time. Furthermore, this approach to the
statistical analysis of coercion requires the creation of new datasets, and some of the new
outcome data in particular are conceptually difficult to code. Because the estimator gets
its inferential leverage from the percentage of interactions that end in different outcomes
– including the percentage of potential crises that end in the "status quo" outcome –
scholars need to be able to decide when a particular "status quo" period begins and ends,
even if the preceding and following periods are also "status quo" periods. Coding "too
many" or "too few" episodes of status quo is likely to substantially bias the results of the
estimator.
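A toy calculation makes the coding problem concrete. The sketch below uses hypothetical episode counts (not data from any study) and simply computes the share of coded episodes that end in each outcome. Splitting every status quo period in two, which is purely a coding decision, roughly halves the apparent challenge rate even though nothing about state behavior has changed. The function name `outcome_shares` is illustrative and is not part of any estimator discussed above.

```python
from collections import Counter


def outcome_shares(episodes):
    """Return the share of coded episodes that ended in each outcome."""
    counts = Counter(episodes)
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}


# Hypothetical coding: 10 challenges and 90 status quo periods.
coarse = ["challenge"] * 10 + ["status quo"] * 90
# Same history, but each status quo period is split into two episodes.
fine = ["challenge"] * 10 + ["status quo"] * 180

print(outcome_shares(coarse)["challenge"])           # 0.1
print(round(outcome_shares(fine)["challenge"], 4))   # 0.0526
```

Any estimator whose inferential leverage comes from these shares inherits the same sensitivity to where status quo periods begin and end.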
A second promising approach using large-n datasets is for scholars to study crisis
initiation rather than crisis outcomes.65 By studying the decision to initiate (or not) a
crisis, scholars avoid the most serious problem associated with selection effects: the non-
monotonic relationship between key parameters (such as defender interests and power)
and immediate deterrence success. Specifically, there is no reason to believe that
65 See, for example, Leeds 2003.
increasing defender credibility, signals of defender interest, or defender power would
ever increase the probability that a challenger would threaten a protégé.66 Because the
relationships between the model parameters and the probability of a challenge are
monotonic, scholars can at least look for statistically significant correlates with the rate of
challenges.
But even this promising approach has its drawbacks. Selection effects still
complicate analyses that focus on crisis initiation. For example, studies that search for a
relationship between the presence (or type) of alliances and the likelihood of a military
challenge must account for the possibility that a country’s willingness to extend an
alliance is partly a function of the likelihood that the potential ally will be attacked.67
Furthermore, while this research design is a promising approach for answering some
important questions in international relations (e.g., when are threats issued?), it cannot
answer other critical questions: When do challenges lead to war? When do coercive
threats cause targets to concede? How much new information about their adversaries do
decision-makers learn during crises? To answer these questions, scholars will need to untangle the interactions between challengers and defenders at each stage of a crisis – not just at the first decision in the game tree.68
66 All of our runs of the Maple code for the model described in this article have yielded monotonic relationships between the model parameters and the probability of challenges, supporting the underlying assumption of the approach adopted by Leeds and others. 67 If alliances are more likely when threats are low (because the expected cost of extending the alliance is small), then studies will find a negative correlation between the presence of an alliance and the likelihood of a challenge, even if alliances do nothing to deter challengers. On the other hand, it is also possible that likely targets of aggression will try harder to find allies (and will grant more concessions in exchange for an alliance) than countries that face smaller risks. Untangling these conflicting selection effects is necessary before the effect of alliances on the probability of challenges can be accurately estimated. 68 There is another problem with this approach: statistical estimators generally assume that a single relationship links each independent variable with the dependent variable. However, the shape of the relationship between key independent variables (e.g., defender interest and power) and the probability of a challenge varies widely: sometimes the relationship between the probability of a challenge and, say, K (the challenger's estimate of the defender's interest in the protégé) is nearly linear, and sometimes it curves sharply; sometimes it is relatively flat, and sometimes it is steep.
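The first scenario in footnote 67 can be illustrated with a small Monte Carlo sketch. Everything in it is an assumption chosen for illustration: each potential protégé has a latent threat level drawn uniformly from [0, 1], an alliance forms with probability 1 minus that threat, and a challenge occurs with probability equal to the threat. Alliances therefore have no deterrent effect by construction, yet a naive comparison of challenge rates still shows a large gap. The function name is hypothetical.

```python
import random


def simulate_alliance_selection(n=100_000, seed=0):
    """Challenge rates for allied vs. unallied protégés when alliances have
    no causal effect but form more often where the threat is low."""
    rng = random.Random(seed)
    # tallies[has_alliance] = [number challenged, number of cases]
    tallies = {True: [0, 0], False: [0, 0]}
    for _ in range(n):
        threat = rng.random()                     # latent risk of attack (assumed uniform)
        has_alliance = rng.random() < 1 - threat  # allies are easier to find when risk is low
        challenged = rng.random() < threat        # challenge depends on threat only
        tallies[has_alliance][0] += challenged
        tallies[has_alliance][1] += 1
    return {allied: hits / cases for allied, (hits, cases) in tallies.items()}


rates = simulate_alliance_selection()
print(f"allied: {rates[True]:.3f}, unallied: {rates[False]:.3f}")
```

Under these assumptions the expected challenge rates are 1/3 for allied and 2/3 for unallied protégés, so a straightforward comparison would credit alliances with substantial deterrence they do not provide. Reversing the formation rule, as in the footnote's second scenario, would push the bias in the opposite direction.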
Overall, neither of these two quantitative approaches is perfect, but each can be used to strengthen scholars' understanding of coercion. "Straightforward" tests that seek to draw inferences from crisis outcomes, however, are likely to yield biased results.
Case studies can also contribute substantially to our understanding of decision-
making during foreign policy crises. Governments produce enormous paper trails when
they make decisions. Scholars should fully avail themselves of this evidence to directly
test how leaders reason as they consider whether to initiate a crisis, how they reason as
they debate the wisdom of responding to an adversary’s threats, and how they struggle
with the tough decisions they face during crises. Case studies – like formal methods and
statistical analyses – have their limitations, but they also have unique advantages: for
example, they allow scholars to directly measure variables (e.g., assessments of
credibility, assessments of power) that quantitative studies must measure indirectly by
observing behaviors that imperfectly reflect these variables. The field under-appreciates
the value of case studies in developing and testing theories of coercion, so scholars do not
aggressively exploit the archival data that document actual episodes of crisis decision-making. The presumption that scholars should draw inferences from crisis outcomes also infects the existing case study literature on crisis decision-making, and as long as it does, case studies will suffer from selection effects as much as statistical studies. The key is to use case studies (and statistical tests) in ways that take into account the selection effects revealed by the formal model of crisis interactions.
Graphs of these results are available from the authors on request. These varied relationships cause especially serious inferential problems if countries are more likely to use certain policy tools to coerce adversaries in given circumstances (e.g., perhaps they are more likely to move military forces when their interests are high, or they might be more likely to make speeches when their power is low). Inferences drawn from straightforward regression analysis in these circumstances will be biased.
The broader point about research methods is that all the methods that scholars of
international politics have at their disposal are imperfect. Careful, systematic work—by
scholars using any of these research methods—can improve our understanding of
international relations. Each approach has its flaws, and each can offer something to
cover the blind spots of the others. Too often, calls for methodological diversity by
scholars of international relations are based solely on the virtues of broadmindedness and
collegiality. But there is an even stronger case to be made: progress in the field is more
likely if formal theorists, quantitative analysts, and case study researchers all use their
imperfect tools to poke at different sides of the same difficult problems. The real reason
to encourage methodological diversity in international relations is not that all beliefs,
styles, and research methods are necessarily created equal; it is that, given the complexity of
our subject matter and the limits on experimental research, we need careful formal
models, statistical analyses, and process tracing case studies to draw valid inferences.
Appendix
The game tree in Figure 1 illustrates a deterrence crisis between a challenger and a defender. In this model the challenger and the defender each have three potential strategies. The challenger can choose among (1) “not threaten”; (2) “threaten, not attack”; and (3) “threaten, attack.” Similarly, the defender has three analogous choices with respect to mobilization (or not) and fight (or not): “not mobilize”; “mobilize, not fight”; and “mobilize, fight.”
All the variables are assumed to be common knowledge except each country’s value for the protégé, $A_C$ and $A_D$, which are private information. $A_C$ and $A_D$ can take any value, positive or negative.69
Challengers and defenders each estimate their adversary's interest level in the protégé in terms of two variables. $K$ is the challenger’s expected value for $A_D$, and $\alpha$ is a measure of the challenger’s uncertainty about $A_D$ such that the actual value of $A_D$ lies along the continuum $[K-\alpha, K+\alpha]$. Similarly, $J$ is the defender's expected value for $A_C$, and the actual value of $A_C$ lies on the continuum $[J-\alpha, J+\alpha]$.70 $J$, $K$, and $\alpha$ are all common knowledge.71
69 Positive values express the extent to which a challenger would like to conquer the protégé and the extent to which the defender values the protégé's independence. A negative value for $A_C$ suggests that a challenger would prefer not to conquer a particular protégé. Many countries are unappealing targets of conquest – for example, because the cost of administration exceeds the advantages of ownership. Similarly, a negative value for $A_D$ suggests that a prospective defender prefers that the challenger conquer a protégé. Perhaps the protégé has significant value and the prospective challenger is an ally, or perhaps the protégé has negative value and the defender hopes the challenger will become stuck in a quagmire. 70 In this formulation, the challenger and defender have the same amount of uncertainty about the other’s value for the protégé. This simplification is probably not accurate; different countries may be easier or harder to “read.” For example, the interests of democracies may be easier to estimate (smaller $\alpha$) because of the greater transparency of their political process; alternatively, a country's $\alpha$ might shrink the longer a given leader remains in office, if much of the uncertainty about “interests” concerns the particular views of specific leaders. Future research could allow $\alpha$ to vary between the challenger and defender to test how changes in $\alpha$ affect bargaining outcomes. 71 This model is based on Fearon 1992, but it differs in two important ways. First, we model the level of interest as the private information variable, while Fearon modeled the "value for war" as the private information variable. As discussed in section 2, the "value for war" is a problematic parameter because it
To determine the relationship between the challenger's estimate of the defender's interest in the protégé ($K$) and immediate deterrence success (IDS), we identify the range of challenger types (values of $A_C$) that choose “threaten, not attack.”72 This range is bounded on the left by the level of $A_C$ at which the challenger would be indifferent between threatening and accepting the status quo (indicated by $A_{C1}$ in Figure 2). It is bounded on the right by the level of $A_C$ at which the challenger would be indifferent between backing down after issuing a threat and attacking the protégé (indicated by $A_{C2}$ in Figure 2).73 For a given set of parameters, the probability that a challenger who has issued a threat will back down – that is, the probability of IDS – can be expressed as
$$P(\text{IDS}) = \frac{A_{C2} - A_{C1}}{(J+\alpha) - A_{C1}}.$$
conflates power and interest; the “value for war” is a function of both the likelihood of winning (power) and the importance of the issue at stake (interest). Models that use a "value for war" parameter are therefore difficult to interpret. Furthermore, models designed to study immediate deterrence that use “value for war” as the private information parameter (including Fearon 1992) face another problem. These models evaluate the probability of immediate deterrence by estimating the proportion of challengers who choose each of three possible strategies: “don’t threaten”, “threaten, not attack”, and “threaten, attack.” However, the indifference point between the first two strategies is not a function of the challenger’s value for war. Fearon 1997 revises the formula for the payoffs for fighting a war along the lines we use here, but that article does not explore the implications of this change for crisis dynamics or for empirical studies of coercion. Second, instead of modeling the range of uncertainty in the private information variables' values as running from zero up to a maximum value for war, we add a separate parameter of the model to bound the range of uncertainty ($\alpha$). This change is important, because in Fearon's original formulation there is a mechanical correlation between the unknown variable’s expected value and the level of uncertainty about its value. This raises a problem because (a) cases with high uncertainty are the ones in which there is room to bluff profitably; (b) in Fearon’s model these cases only arise when there is a high "value for war"; and (c) bluffing is unlikely to succeed when countries have a high value for war because adversaries are likely to choose the "attack" and "fight" branches of the tree. Fearon’s model, therefore, understates both the opportunity for bluffing and, consequently, the frequency of successful immediate deterrence.
We can learn much more about bluffing behavior and immediate deterrence success by separating the level of interest in the protégé from the level of uncertainty in the model. Schultz 1999 uses a formulation similar to ours to separate the level of uncertainty from the expected value for war, but that study still suffers from the problems of using a composite “value for war” parameter rather than a level of interest parameter. 72 “Threaten, not attack” appears to be a losing strategy, but challengers sometimes choose it because they hope that the defender will not mobilize, giving the challenger the protégé for free. 73 For some values of $J$, $K$, $\alpha$, $p$, $R_C$, $R_D$, $F_C$, and $F_D$, $A_{C2}$ will be less than $A_{C1}$. In those situations, immediate deterrence is never possible because any challenger who is willing to threaten is also willing to fight.
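The IDS expression can be evaluated directly once the cut-points $A_{C1}$ and $A_{C2}$ are known. The sketch below is a minimal Python version, not the authors' Maple code. It assumes, as the linear form of the formula implies, that $A_C$ is uniformly distributed on $[J-\alpha, J+\alpha]$; clipping the cut-points to that interval and the function names `ids_probability` and `challenge_probability` are illustrative additions. When $A_{C2} \le A_{C1}$ it returns zero, matching footnote 73.

```python
def _clip(value, lo, hi):
    """Restrict a cut-point to the support [lo, hi]."""
    return min(max(value, lo), hi)


def challenge_probability(a_c1, J, alpha):
    """Share of challenger types (A_C uniform on [J-alpha, J+alpha]) that threaten."""
    lo = _clip(a_c1, J - alpha, J + alpha)
    return ((J + alpha) - lo) / (2 * alpha)


def ids_probability(a_c1, a_c2, J, alpha):
    """P(IDS) = (A_C2 - A_C1) / ((J + alpha) - A_C1), conditional on a threat.

    Assumes A_C is uniform on [J - alpha, J + alpha]; cut-points outside that
    interval are clipped (an assumption of this sketch). If A_C2 <= A_C1,
    every challenger who threatens also attacks, so IDS is impossible.
    """
    top = J + alpha
    lo = _clip(a_c1, J - alpha, top)   # lowest challenger type that threatens
    if lo >= top:
        raise ValueError("no challenger type threatens; P(IDS) is undefined")
    hi = _clip(a_c2, lo, top)          # highest type that threatens but backs down
    return (hi - lo) / (top - lo)


print(round(ids_probability(0.2, 0.6, J=0.5, alpha=1.0), 4))  # 0.3077
print(ids_probability(0.6, 0.2, J=0.5, alpha=1.0))            # 0.0
```

`challenge_probability` gives the share of challenger types that threaten at all, the quantity that crisis-initiation designs study; `ids_probability` is conditional on that first-stage selection.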
$A_{C1}$, the indifference point for the “threaten/not threaten” decision, is influenced by the challenger's predictions of the likelihood that the defender will choose “mobilize” if the challenger threatens and “fight” if the challenger attacks the protégé. $A_{C2}$, the indifference point for the challenger’s decision whether or not to attack, is influenced by the additional information that the challenger has gleaned from the defender’s decision to mobilize, which updates the challenger's estimate of the defender's level of interest in the protégé.74 Mathematically, at the moment when the challenger decides whether or not to threaten, it estimates the probability that the defender will mobilize as
$$q = \frac{(K+\alpha) - A_{D1}}{2\alpha}.$$
At the challenger's second decision node, whether to attack or not given that the defender
has already mobilized, the challenger estimates the probability that the defender will fight
as
$$y = \frac{(K+\alpha) - A_{D2}}{(K+\alpha) - A_{D1}}.$$
$A_{D2}$ can be readily expressed in terms of parameters whose values are public knowledge. The defender's expected value for fighting is
$$[p(0) + (1-p)(-A_D)] - F_D,$$
and its payoff for backing down after mobilizing is just
$$-A_D - R_D.$$
Setting the two equal gives $pA_D = F_D - R_D$, so the defender is indifferent when
$$A_{D2} = \frac{F_D - R_D}{p}.$$
The other three variables ($A_{C1}$, $A_{C2}$, and $A_{D1}$) are more complicated, because each depends on the other two. For example, $A_{C1}$ is a function of whether or not the defender will mobilize (the likelihood that $A_D$ is greater than $A_{D1}$), but the defender's mobilization decision partly depends on the defender’s estimate of the likelihood that the challenger
74 Defenders with low values for $A_D$ will cede the protégé without mobilizing. Those that mobilize either value the protégé highly and intend to fight, or they do not value the protégé sufficiently to fight but mobilize in the hope of exposing a challenger bluff (and thereby preserving the protégé and gaining an additional audience benefit by embarrassing the challenger).
will attack,
$$x = \frac{(J+\alpha) - A_{C2}}{(J+\alpha) - A_{C1}}.$$
We can write three simultaneous equations for $A_{C1}$, $A_{C2}$, and $A_{D1}$ as follows:
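$$-R_C = y\left[(1-p)A_{C2} - F_C\right] + (1-y)\left(A_{C2} + R_C\right) \tag{1}$$
$$-A_{D1} = \begin{cases} x\left(-A_{D1} - R_D\right) + (1-x)R_D & \text{if } A_{D1} < (F_D - R_D)/p \\ x\left[(1-p)(-A_{D1}) - F_D\right] + (1-x)R_D & \text{if } A_{D1} > (F_D - R_D)/p \end{cases} \tag{2}$$
$$0 = q(-R_C) + (1-q)A_{C1} \tag{3}$$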
These nonlinear equations do not have a simple, analytical solution, but they can
be solved numerically. We used Maple 9 mathematical software to study the relationship
between K and IDS.75
75 The worksheets containing the Maple code are available on request.
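The worksheets mentioned in footnote 75 are not reproduced here. As a rough Python analogue, the sketch below solves each of equations (1) to (3) for its own unknown while holding the coupling probabilities fixed. The rearrangements are our own algebra on those equations: $A_{C1} = qR_C/(1-q)$, $A_{C2} = (yF_C - (2-y)R_C)/(1-yp)$, and, for $A_{D1}$, either $R_D(2x-1)/(1-x)$ on the backing-down branch or $(xF_D - (1-x)R_D)/(1 - x(1-p))$ on the fighting branch. A full solver would iterate these maps together with $q$, $y$, and $x$ until they agree, and would need to handle corner cases (such as $A_{C2} < A_{C1}$ in footnote 73) that this sketch ignores. All function names and parameter values are hypothetical.

```python
def a_c1_from_q(q, r_c):
    """Eq. (3): 0 = q(-R_C) + (1-q)A_C1  =>  A_C1 = q R_C / (1-q)."""
    return q * r_c / (1 - q)


def a_c2_from_y(y, p, f_c, r_c):
    """Eq. (1): -R_C = y[(1-p)A_C2 - F_C] + (1-y)(A_C2 + R_C)
    =>  A_C2 = (y F_C - (2-y) R_C) / (1 - y p)."""
    return (y * f_c - (2 - y) * r_c) / (1 - y * p)


def a_d1_candidates(x, p, f_d, r_d):
    """Eq. (2): return every A_D1 that satisfies its own branch condition."""
    a_d2 = (f_d - r_d) / p
    found = []
    if x < 1:
        backs_down = r_d * (2 * x - 1) / (1 - x)              # branch A_D1 < A_D2
        if backs_down < a_d2:
            found.append(backs_down)
    fights = (x * f_d - (1 - x) * r_d) / (1 - x * (1 - p))   # branch A_D1 > A_D2
    if fights > a_d2:
        found.append(fights)
    return found


def eq1_residual(a_c2, y, p, f_c, r_c):
    """Right side of eq. (1) minus its left side (-R_C)."""
    return y * ((1 - p) * a_c2 - f_c) + (1 - y) * (a_c2 + r_c) + r_c


def eq2_residual(a_d1, x, p, f_d, r_d):
    """Right side of eq. (2) minus its left side (-A_D1)."""
    a_d2 = (f_d - r_d) / p
    if a_d1 < a_d2:
        rhs = x * (-a_d1 - r_d) + (1 - x) * r_d
    else:
        rhs = x * ((1 - p) * (-a_d1) - f_d) + (1 - x) * r_d
    return rhs + a_d1


def eq3_residual(a_c1, q, r_c):
    """Right side of eq. (3); its left side is zero."""
    return q * (-r_c) + (1 - q) * a_c1


# Illustrative parameter values (not taken from the article).
p, f_c, f_d, r_c, r_d = 0.5, 0.5, 0.5, 0.2, 0.2
print(round(a_c1_from_q(0.75, r_c), 4))   # 0.6
print(a_d1_candidates(0.9, p, f_d, r_d))
```

Returning a list for $A_{D1}$ is a deliberate choice: for some parameter values both branches, or neither, may be self-consistent, and a solver has to decide how to treat such cases rather than silently picking one.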
References
Brady, Henry E., and David Collier, eds. 2004. Rethinking Social Inquiry: Diverse Tools, Shared Standards. Lanham, Md.: Rowman & Littlefield Publishers.
Bueno de Mesquita, Bruce, James D. Morrow, and Ethan R. Zorick. 1997. Capabilities, Perceptions, and Escalation. American Political Science Review 91 (1): 15-27.
Collier, David, and James Mahoney. 1996. Insights and Pitfalls: Selection Bias in Qualitative Research. World Politics 49 (1): 56-91.
Collier, David, Henry E. Brady, and Jason Seawright. 2004. Sources of Leverage in Causal Inference: Toward an Alternative View of Methodology. In Rethinking Social Inquiry: Diverse Tools, Shared Standards, edited by Henry Brady and David Collier, 229-66. Lanham, Md.: Rowman & Littlefield Publishers.
Collier, David, James Mahoney, and Jason Seawright. 2004. Claiming Too Much: Warnings about Selection Bias. In Rethinking Social Inquiry: Diverse Tools, Shared Standards, edited by Henry Brady and David Collier, 85-102. Lanham, Md.: Rowman & Littlefield Publishers.
Danilovic, Vesna. 2001a. Conceptual and Selection Bias Issues in Deterrence. Journal of Conflict Resolution 45 (1): 97-125.
Danilovic, Vesna. 2001b. The Sources of Threat Credibility in Extended Deterrence. Journal of Conflict Resolution 45 (3): 341-69.
Documents on German Foreign Policy, 1918-1945. 1949. Series D. Vol. 2. Washington, D.C.: U.S. Government Printing Office.
Documents on German Foreign Policy, 1918-1945. 1983. Series D. Vol. 7. Washington, D.C.: U.S. Government Printing Office.
Fearon, James D. 1992. Threats to Use Force: Costly Signals and Bargaining in International Crises. Ph.D. diss., University of California, Berkeley.
Fearon, James D. 1994. Signaling Versus the Balance of Power and Interests: An Empirical Test of a Crisis Bargaining Model. Journal of Conflict Resolution 38 (X): 236-69.
Fearon, James D. 1997. Signaling Foreign Policy Interests: Tying Hands versus Sinking Costs. Journal of Conflict Resolution 41 (1): 68-90.
Fearon, James D. 2002. Selection Effects and Deterrence. International Interactions 28 (1): 5-29.
George, Alexander, and Andrew Bennett. 2005. Case Studies and Theory Development in the Social Sciences. Cambridge, Mass.: MIT Press.
George, Alexander, and Richard Smoke. 1974. Deterrence in American Foreign Policy: Theory and Practice. New York: Columbia University Press.
Huth, Paul K. 1999. Deterrence and International Conflict: Empirical Findings and Theoretical Debates. Annual Review of Political Science 2 (X): 25-48.
Huth, Paul K., and Todd L. Allee. 2002. Domestic Political Accountability and the Escalation and Settlement of Interstate Disputes. Journal of Conflict Resolution 46 (6): 754-90.
Huth, Paul, and Todd Allee. 2004. Research Design in Testing Theories of International Conflict. In Models, Numbers, and Cases: Methods for Studying International Relations, edited by Detlef F. Sprinz and Yael Wolinsky-Nahmias, 193-223. Ann Arbor, Mich.: University of Michigan Press.
Huth, Paul, Christopher Gelpi, and D. Scott Bennett. 1993. The Escalation of Great Power Militarized Disputes: Testing Rational Deterrence Theory and Structural Realism. American Political Science Review 87 (3): XX-XX.
Huth, Paul, and Bruce Russett. 1984. What Makes Deterrence Work? Cases from 1900 to 1980. World Politics 36 (4): 496-526.
Huth, Paul, and Bruce Russett. 1990. Testing Deterrence Theory: Rigor Makes a Difference. World Politics 42 (4): 466-501.
King, Gary, Robert O. Keohane, and Sidney Verba. 1994. Designing Social Inquiry. Princeton, N.J.: Princeton University Press.
Lebow, Richard N., and Janice G. Stein. 1989. I Think, Therefore I Deter. World Politics 41 (2): 208-24.
Lebow, Richard N., and Janice G. Stein. 1990. Deterrence: The Elusive Dependent Variable. World Politics 42 (3): 336-69.
Leeds, Brett Ashley. 2003. Do Alliances Deter Aggression? The Influence of Military Alliances on the Initiation of Militarized Interstate Disputes. American Journal of Political Science 47 (3): 427-39.
Lewis, Jeffrey B., and Kenneth A. Schultz. 2003. Revealing Preferences: Empirical Estimation of a Crisis Bargaining Game with Incomplete Information. Political Analysis 11 (4): 345-67.
May, Ernest, and Philip Zelikow, eds. 1997. The Kennedy Tapes: Inside the White House During the Cuban Missile Crisis. Cambridge, Mass.: Harvard University Press.
Michaelis, Herbert, and Ernst Schraepler, eds. 1979. Ursachen und Folgen: Vom Deutschen Zusammenbruch 1918 und 1945 bis zur staatlichen Neuordnung Deutschlands in der Gegenwart [Causes and Consequences: From the German collapse in 1918 and 1945 to the reorganization of the present German state]. Vol. 12. Berlin: Dokumenten-Verlag Dr. Herbert Wendler.
Müller, Klaus-Jürgen. 1980. General Ludwig Beck: Studien und Dokumente zur politisch-militärischen Vorstellungswelt und Tätigkeit des Generalstabschefs des deutschen Heeres, 1933-1938 [General Ludwig Beck: Studies and documents on the political-military worldview and actions by the general staff of the German army, 1933-1938]. Boppard am Rhein: Harald Boldt Verlag.
Nooruddin, Irfan. 2002. Modeling Selection Bias in Studies of Sanctions Efficacy. International Interactions 28 (1): 59-75.
Partell, Peter J., and Glenn Palmer. 1999. Audience Costs and Interstate Crises: An Empirical Assessment of Fearon's Model of Dispute Outcomes. International Studies Quarterly 43 (X): 389-405.
Sartori, Anne E. 2002. The Might of the Pen: A Reputational Theory of Communication in International Disputes. International Organization 56 (1): 123-51.
Schultz, Kenneth A. 1998. Domestic Opposition and Signaling in International Crises. American Political Science Review 92 (4): 829-44.
Schultz, Kenneth A. 1999. Do Democratic Institutions Constrain or Inform? Contrasting Two Institutional Perspectives on Democracy and War. International Organization 53 (2): 233-66.
Schultz, Kenneth A. 2001. Looking for Audience Costs. Journal of Conflict Resolution 45 (1): 32-60.
Signorino, Curtis S. 1999. Strategic Interaction and the Statistical Analysis of International Conflict. American Political Science Review 93 (2): 279-97.
Signorino, Curtis S., and Kuzey Yilmaz. 2003. Strategic Misspecification in Regression Models. American Journal of Political Science 47 (3): 551-66.
Smith, Alistair. 1996. To Intervene or Not to Intervene: A Biased Decision. Journal of Conflict Resolution 40 (1): 16-40.
Smith, Alistair. 1999. Testing Theories of Strategic Choice: The Example of Crisis Escalation. American Journal of Political Science 43 (4): 1254-83.
Sprinz, Detlef F., and Yael Wolinsky-Nahmias, eds. 2004. Models, Numbers, and Cases: Methods for Studying International Relations. Ann Arbor, Mich.: University of Michigan Press.
Van Evera, Stephen. 1997. Guide to Methods for Students of Political Science. Ithaca, N.Y.: Cornell University Press.