CAN UNCLASSIFIED

Defence Research and Development Canada
External Literature (P)
DRDC-RDDC-2017-P113
November 2017

Restructuring structured analytic techniques in intelligence

Welton Chang, Elissabeth Berdini & Philip E. Tetlock
University of Pennsylvania

David R. Mandel
DRDC - Toronto Research Centre

Intelligence and National Security, 2017
https://doi.org/10.1080/02684527.2017.1400230
Date of Publication from External Publisher: November 2017





© Her Majesty the Queen in Right of Canada (Department of National Defence), 2017 © Sa Majesté la Reine en droit du Canada (Ministère de la Défense nationale), 2017


IMPORTANT INFORMATIVE STATEMENTS

Disclaimer: This document is not published by the Editorial Office of Defence Research and Development Canada, an agency of the Department of National Defence of Canada, but is to be catalogued in the Canadian Defence Information System (CANDIS), the national repository for Defence S&T documents. Her Majesty the Queen in Right of Canada (Department of National Defence) makes no representations or warranties, expressed or implied, of any kind whatsoever, and assumes no liability for the accuracy, reliability, completeness, currency or usefulness of any information, product, process or material included in this document. Nothing in this document should be interpreted as an endorsement for the specific use of any tool, technique or process examined in it. Any reliance on, or use of, any information, product, process or material included in this document is at the sole risk of the person so using it or relying on it. Canada does not assume any liability in respect of any damages or losses arising out of or in connection with the use of, or reliance on, any information, product, process or material included in this document.

This document was reviewed for Controlled Goods by Defence Research and Development Canada (DRDC) using the Schedule to the Defence Production Act.

Endorsement statement: This publication has been peer-reviewed and published by the Editorial Office of Defence Research and Development Canada, an agency of the Department of National Defence of Canada. Inquiries can be sent to: [email protected].


Restructuring structured analytic techniques in intelligence

Welton Chang , Elissabeth Berdini, David R. Mandel and Philip E. Tetlock

ABSTRACT

Structured analytic techniques (SATs) are intended to improve intelligence analysis by checking the two canonical sources of error: systematic biases and random noise. Although both goals are achievable, no one knows how close the current generation of SATs comes to achieving either of them. We identify two root problems: (1) SATs treat bipolar biases as unipolar. As a result, we lack metrics for gauging possible over-shooting—and have no way of knowing when SATs that focus on suppressing one bias (e.g., over-confidence) are triggering the opposing bias (e.g., under-confidence); (2) SATs tacitly assume that problem decomposition (e.g., breaking reasoning into rows and columns of matrices corresponding to hypotheses and evidence) is a sound means of reducing noise in assessments. But no one has ever actually tested whether decomposition is adding or subtracting noise from the analytic process—and there are good reasons for suspecting that decomposition will, on balance, degrade the reliability of analytic judgment. The central shortcoming is that SATs have not been subject to sustained scientific scrutiny of the sort that could reveal when they are helping or harming the cause of delivering accurate assessments of the world to the policy community.

Introduction

Structured analytic techniques (SATs) are ‘mechanism[s] by which internal thought processes are externalized in a systematic and transparent manner so that they can be shared, built on, and easily critiqued by others.’1 The aim of SATs is to improve the quality of intelligence analysis by mitigating commonly observed cognitive biases in analysts.2

SATs are a central feature of U.S. intelligence analysis training programs. The earliest SATs date to the 1970s, when Richards Heuer introduced the idea to CIA intelligence officer Jack Davis. Heuer and Davis began developing the first of these methods, which they called ‘alternative analysis.’3 This term was tweaked to ‘structured analytic techniques’ and referred to as such in CIA’s analyst training program in 2005.4 Post-9/11 Congressional reforms known as the Intelligence Reform and Terrorism Prevention Act of 2004 (IRTPA) affirmed their doctrinal position within the IC.5 IRTPA mandated the use of SATs as part of a broader IC revamp in response to 9/11 and the misjudgment of Iraq’s weapons of mass destruction (WMD) programs.6 The techniques are also taught in DIA’s Professional Analyst Career Education (PACE) program and other IC training courses (focusing on techniques used by individuals or small teams as opposed to prediction markets, crowdsourcing, and war-gaming).

Warren Fishbein and Greg Treverton, the former chair of the National Intelligence Council, described the purpose of SATs as follows:


… to help analysts and policy-makers stretch their thinking through structured techniques that challenge underlying assumptions and broaden the range of possible outcomes considered. Properly applied, it serves as a hedge against the natural tendencies of analysts—like all human beings—to perceive information selectively through the lens of preconceptions, to search too narrowly for facts that would confirm rather than discredit existing hypotheses, and to be unduly influenced by premature consensus within analytic groups close at hand.7

At their core, SATs are a set of processes for externalizing, organizing, and evaluating analytic thinking. SATs range from the simple (e.g., structured brainstorming) to the complex (e.g., scenario generation); from pure text-based methods (e.g., key assumptions check) to visual ones (e.g., argument mapping). The total set of SATs presents a cornucopia of cognitive enhancers from which analysts can choose the one best suited for the problem at hand. The CIA Tradecraft Primer identifies 12 SATs, grouped into three categories, each defined by its chief analytic purpose: to foster diagnostic, contrarian, or imaginative thinking.8 Diagnostic techniques are ‘aimed at making analytic arguments, assumptions, or intelligence gaps more transparent.’9 They are typically employed to evaluate hypotheses, to assess how existing evidence supports or refutes hypotheses, and to help analysts size up new evidence in light of existing information.10 The contrarian or ‘challenge-analysis’ techniques are used to probe existing analyses and conclusions, particularly assessments of the stability of the status quo, as well as to test the robustness of the consensus view. These SATs share ‘the goal of challenging an established mental model or analytic consensus in order to broaden the range of possible explanations or estimates that are seriously considered.’11 Finally, imaginative thinking techniques are aimed at ‘developing new insights, different perspectives and/or developing alternative outcomes’.12

In the following sections, we delve into the claimed benefits of SATs over unaided analysis. We then describe, in detail, the flaws in the conception and design of SATs which render the claimed benefits implausible. Finally, we offer recommendations for improving the development, testing, and evaluation of SATs by better integrating scientific findings. As some intelligence scholars have recognized, the burden of proof should fall on those who urge the intelligence community (IC) to devote resources to SATs—or indeed any other cognitive-enhancement intervention.13

Why SATs are used in intelligence

SATs are used in intelligence analysis to help analysts: (a) produce more objective and credible judgments by mitigating cognitive biases14; (b) cope with information overload; and (c) make their thought processes more rigorous, consistent, and transparent, both to themselves and to policy makers, serving as an accountability mechanism.15 SATs are supposed to chart a feasible middle ground between methods that are rigorous but technically daunting (e.g., Bayesian networks) and unbridled intuition. The objective of SATs is beyond reproach. Below we sketch three claimed categories of benefits.

Claim 1: SATs improve judgment quality through debiasing

Proponents claim that SATs ‘reduce the frequency and severity of error’ of intelligence assessments and estimates.16 Debiased reasoning can, under the right circumstances, produce more accurate judgments.17 Some base this claim on SATs’ face validity. For instance, Heuer argues,

if a structured analytic technique is specifically designed to mitigate or avoid one of the proven problems in human thought processes, and if the technique appears to be successful in doing so, that technique can be said to have face validity [emphasis added].18

Table 1 lists the 12 SATs in the CIA Tradecraft Primer and their putative targeted bias.19 Status quo bias and confirmation bias are the two most common biases that SATs are intended to mitigate.

Status quo bias, targeted by 10 of the 12 core SATs, occurs when analysts overweight the ‘no-change’ hypothesis in their assessments.20 In theory, SATs can check status quo bias by encouraging analysts to consider the plausibility of abrupt-surprise scenarios and, by implication, the fragility of the current equilibrium of forces.


Other SATs are geared to check confirmation bias, which can compromise analysts’ work by restricting attention to a favored hypothesis, interpreting evidence in ways that bolster existing beliefs, and unfavorably viewing evidence that undercuts beliefs.21 In an ethnographic study of analysts, Rob Johnston noted that IC culture pushes analysts to conform to the community consensus.22 This finding was not limited to junior analysts; quite the opposite. Experience and expertise were positively correlated with favoring the consensus.23 According to James Bruce, the IC’s inaccurate assessment of Iraq’s WMD efforts might well have been different if analysts had used the right SATs: ‘what little observable evidence there was of [Iraq’s nuclear reconstitution]… was not only over interpreted but also was not assessed relative to any available evidence to the contrary.’24 Bruce conjectures that had analysts used ACH, they would have been required to consider alternative hypotheses and more closely examine disconfirming evidence, which could at least have lowered confidence in the favored hypothesis.25 Karl Spielmann similarly suggests that testing many hypotheses can help analysts overcome prevailing mindsets by ensuring relevant information is not overlooked, such as differences between our culture and those of our adversaries.26 For example, Devil’s Advocacy attempts to mitigate confirmation bias by encouraging divergent perspectives.

Claim 2: SATs organize complex evidence

SATs are also supposed to help analysts: (a) organize information; (b) identify relevant and diagnostic reporting, pulling useful signals from background noise. Heuer and Pherson state that SATs ‘break down a specific analytic problem into its component parts and [specify] a step-by-step process for handling these parts.’27 Treverton further argues that SATs are especially suited for solving intelligence issues that deal with transnational actors (e.g., terrorist groups), which offer limited historical-context clues, are not easily detected and monitored, and are often said to act unpredictably.28

Table 1. SATs target cognitive biases.

Technique | Category | Description of technique | Targeted cognitive bias or limitation
Key assumptions check | Diagnostic | List and review assumptions on which fundamental judgments rest | Status quo bias; attribution error; wishful thinking
Quality of information check | Diagnostic | Evaluate reliability, completeness and soundness of available information sources | Status quo bias; confirmation bias; selective exposure; inadequate search
Indicators or signposts of change | Diagnostic | Periodically review observable trends to track events, monitor targets, and warn of change | Anchoring and under-adjustment
Analysis of competing hypotheses (ACH) | Diagnostic | Identify alternative explanations and evaluate evidence bearing on hypotheses | Status quo bias; confirmation bias; attribution error; selective exposure; congruence bias; anchoring and under-adjustment
Devil’s advocacy | Contrarian | Challenge consensus by building strong cases for alternatives | Confirmation bias; status quo bias
Team A/Team B | Contrarian | Use of separate analytic teams that contrast two (or more) views | Confirmation bias; status quo bias
High-impact/Low-probability analysis | Contrarian | Highlight unlikely events with potential policy impact | Status quo bias
“What if?” Analysis | Contrarian | Assume a high-impact event has occurred and explain why | Confirmation bias; status quo bias
Brainstorming | Imaginative | Use an uninhibited group process to generate new ideas | Status quo bias
Outside-in thinking | Imaginative | Identify the full range of basic forces and trends that could shape an issue | Status quo bias; errors in syllogism; illogical arguments
Red team analysis | Imaginative | Try to replicate how an adversary would think about an issue | Confirmation bias; attribution error; mirror-imaging
Alternative futures analysis | Imaginative | Explore multiple ways a highly uncertain situation can develop | Status quo bias

Claim 3: SATs instill rigor and make analysts’ thought processes transparent, thus more logically sound

SAT proponents believe that by systematically structuring analysis, the IC is simultaneously improving the accuracy and objectivity of analysis while acknowledging that analysts must be accountable to high standards of reasoning.29 SATs are intended to push analysts to think thoroughly and rigorously by decomposing problems and externalizing their thought processes so that reasoning mistakes can be corrected.30 Heuer also states that exposing one’s arguments to scrutiny encourages transparency, which ‘helps ensure that differences of opinion among analysts are heard and seriously considered early in the analytic process.’31

SATs thus make it easier to see how others arrived at different judgments. If analysts have conflicting views, they can retrace the steps in their thought processes to see where and why they diverged in their assessments. This can stimulate discussions of different interpretations of evidence or of varying views of the veracity of sources. And it can bring to light information that some had overlooked, properly pooling previously private information. Importantly, transparency enables other analysts to identify errant thinking, whether it stems from insufficient search or faulty interpretations of evidence found. Transparency might also boost policymakers’ valuations of IC assessments. Policymakers bear responsibility for outcomes, so their decisions ultimately must be justifiable to the public.32

Two core reasons why SATs are unlikely to deliver on promised benefits: bias bipolarity and noise neglect

Unfortunately, little is known about whether structured analytic techniques improve, have no effect on, or even degrade analysis because there is scant scientific research on their effectiveness. We propose that the root problems with SATs derive from the failure to deploy best practices in coping with the two fundamental sources of error that bedevil all efforts to improve human judgment: systematic bias and random noise in the processes by which people generate, test and update their hunches about the world. Specifically, SATs fail to address (a) the inherently bipolar nature of cognitive biases and the omnipresent risk that well-intentioned attempts to reduce one bias, say, over-confidence, will amplify its opposing bias, under-confidence; (b) the cumulative nature of error in multi-stage assessments, in which the noise in the conclusions from one stage gets fed into the next—and the risk that well-intentioned efforts to reduce noise by decomposing complex judgments into sequences of simpler ones will make those judgments less, not more, consistent—an oversight we call noise neglect. Both problems stem from the lack of sustained efforts to subject SATs to scientific tests of efficacy and to scrutinize the processes for logical validity.33 The net result is that it is difficult to know when SATs are sparing us from serious mistakes, when they are creating more problems than they are solving, or when they are just ineffective.

Figure 1 illustrates the core concepts of bias bipolarity and noise neglect that guide our critique. The first row of Figure 1 depicts variation in the noisiness of judgments, which is a function of how far our independent efforts to hit the same target fall from each other. Suppose analysts are all trying to estimate when the latest North Korean missile will be launched—and that the analysts are working from identical data. The predictions displayed in Target B are manifestly noisier than—which is to say, inferior to—those displayed in Target A. All else equal, a rational IC would prefer analysts making the forecasts depicted in Target A. The noisier the judgments, the greater the risk of being wrong, dramatically wrong—and that risk is not offset by the occasional lucky random hit. It follows that the IC should want SATs that, all else equal, bring down noise (i.e., random variation in analytic judgments of exactly the same data).

The second row depicts variation in bias, which is a function of how systematically our independent efforts to hit the same target deviate from the bull’s eye. The Target C judgments display a directional bias and the Target D forecasts display an even more pronounced one. The amount of noise in the Targets C and D judgments is identical—and also identical to the amount of noise in Target A. If the IC worked in a world in which biases were overwhelmingly unidirectional, falling in one quadrant, it would not have to worry about the risks raised by SATs concentrating on reducing over-confidence, status quo bias or excessive deference to the consensus. But there are good reasons for doubting we live in such a world—and for worrying that warning only about over-confidence will eventually cause under-confidence, that pushing single-mindedly against the status quo bias will eventually trigger over-predictions of change and that straying far from consensus thinking will eventually cause us to waste time on frivolous contrarian arguments. All too often, SATs focus on only one pole of the bipolar bias continuum.34
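To make the bias/noise distinction concrete, the short Python sketch below (our illustration; the launch-window estimates are invented) splits a set of analysts' estimates into bias, the systematic deviation of their average from the truth, and noise, the dispersion of the estimates around their own average, mirroring the contrast among Targets A, B, and C/D.

```python
import statistics

def bias_and_noise(estimates, truth):
    """Split judgmental error into systematic bias and random noise.

    bias  = distance between the analysts' average estimate and the truth
    noise = standard deviation of the estimates around their own average
    """
    mean_estimate = statistics.mean(estimates)
    bias = mean_estimate - truth
    noise = statistics.stdev(estimates)
    return bias, noise

# Hypothetical estimates of days until a missile launch (true answer: 30 days).
truth = 30
panel_a = [29, 30, 31, 30, 29]   # low bias, low noise  (like Target A)
panel_b = [18, 42, 25, 39, 27]   # low bias, high noise (like Target B)
panel_c = [35, 36, 37, 36, 35]   # systematic bias, low noise (like Targets C/D)

for name, panel in [("A", panel_a), ("B", panel_b), ("C", panel_c)]:
    b, n = bias_and_noise(panel, truth)
    print(f"Panel {name}: bias = {b:+.1f} days, noise (SD) = {n:.1f} days")
```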

The third row depicts what can go wrong when SATs fail to push hard enough against the prevailing bias (Target E) or push too hard against it and trigger a mirror-image bias (Target F). SATs based on a unipolar model of bias are at continual risk of producing a bipolar backlash effect. Getting closer to the bull’s eye requires SATs that avoid the mistakes of both pushing too little and pushing too hard. Helping analysts pull off this balancing act requires alerting them to the bipolar nature of the bias, giving them feedback on their current susceptibility to both types of error, and encouraging them to experiment with evidence-weighting strategies that bring them closer to the bull’s eye, without over-shooting the mark.

The fourth row focuses on what can go wrong when SATs fail to recognize that problem decomposition is a two-edged sword that can amplify or reduce unreliability, depending on how procedures are interpreted and executed. Target G shows the result of a successful effort to bring down measurement error (relative to Target A) and Target H shows the result of a badly botched effort to reduce unreliability (again, relative to Target A).

Figure 1. Effects of noise neglect and bias bipolarity on judgmental accuracy. Source: Adapted from Figure “How Noise and Bias Affect Accuracy” in Kahneman, Rosenfield, Gandhi, and Blaser, “Noise: How to Overcome the High, Hidden Cost of Inconsistent Decision-Making.”

In our view, SATs rest on a deep ontological misconception about the nature of bias and a deep methodological misconception about the nature of measurement error and how best to reduce it.

Recently, Chang and Tetlock identified examples of lopsided analytic training that played up one side of the error equation—the need to be wary of overconfidence and of underweighting the potential for change—and downplayed the other side—the risks of under-confidence and of exaggerating the prospects for change.35 Cognitive biases can cut in either direction, and SATs that focus exclusively on one pole of a bipolar bias render analysts vulnerable to the mirror-image bias, a vulnerability that becomes most obvious when the environment shifts. For example, people tend to be under-confident on easier tasks and over-confident when faced with harder ones.36 Similarly, depending on the environment, people’s tendency to neglect base rates can cause them either to over-weight or under-weight the status quo.37

Improving judgment via SATs requires balancing opposing biases. Debiasing is akin to growing a houseplant: under-water and watch it die; over-water and get the same result. ‘Add-water-and-watch-it-grow’ is not a viable horticultural strategy, just as ‘add-SATs-and-watch-analysis-flourish’ is not a viable cognitive psychological strategy. This is because the biases that SATs are designed to prevent are associated with opposing biases that can come into play when SATs overcorrect for the original errors. The problem is that SATs, as treatments of cognitive bias, tend to be applied with the assumption that both the degree and direction of bias afflicting analysts are known, when in fact they rarely are. The worst-case scenario arises when the SAT is so effective that it causes the analyst to exhibit the opposite bias. Unidirectional debiasing techniques are brittle because the degree and direction of bias among analysts is often moderated by situational factors, such as time pressure, and individual differences, such as the extent to which analysts naturally seek disconfirming information and their level of expertise on a subject.38

Robust debiasing strategies should aim to reduce deviations from accuracy by mitigating both the bias and the opposing one. This concept is distinct from the idea of debiasing by reducing the magnitude of one pole of a bias. As an example, take calibration feedback, which targets both over- and under-confidence, and contrast that with techniques that warn against the perils of overconfidence—trying to move people in a particular direction just the right amount requires knowing which direction and by how much.39

Thus, SATs apply a one-size-fits-all-biases approach to judgmental correction. But analysts vary in the amount of bias they exhibit based on the specifics of the situation, including variables such as evidentiary quality, target behavior and changeability, and even the past accuracy of judgments. Indeed, analysts could bias their judgments so that they wind up not ‘making the last mistake’ – potentially what occurred in over-connecting the dots in the Iraq WMD case as a response to under-connecting the dots in the case of 9/11.40 Similarly, target behavior that does not change often can result in analysts overly favoring the status quo, which appeared to be the case in the IC’s missed calls on Russia’s invasion of Crimea and the Arab Spring uprisings. Analysts also vary in their ability and motivation to self-identify instances of biased cognition. In the worst case, analysts may mistakenly believe they are perfectly unbiased or that they are exhibiting one bias when they are actually exhibiting the opposing one. SATs depend on analysts and their managers correctly identifying the direction and magnitude of their own biases, probably a rarely satisfied precondition. Many arguments in favor of SATs come after major intelligence failures, and the choice of which SATs to use in future analysis is thus informed by wanting to prevent another ‘PLA crossing the Yalu’ or ‘Soviet missile gap’ misjudgment. However, such hindsight-tainted inquiries may actually lead to the adoption of techniques that are a net negative for improving judgments on a future issue. For example, analysts concerned with how much weight they are giving to the status quo might adopt a Devil’s Advocacy approach which, in turn, puts them at risk of over-weighting engaging-but-low-probability scenarios. Brainstorming and scenario generation might have made 9/11-style attacks more imaginable, but by attending to too many far-fetched scenarios, the net effect might well have been to obscure more plausible attack avenues, such as homegrown extremists conducting mass killings with assault weapons.

Without proper measurement, it is impossible to gauge the direction and magnitude of biases that limit accuracy. Thus, there is great uncertainty over which biases to prioritize for correction—and how far correction should go before it results in overcorrection. Table 2 shows how opposing biases can arise from one-sided use of SATs.

Many SATs are designed to challenge the consensus view. But focusing exclusively on ratcheting down excessive conformity can have the unintended effect of inducing excessive second-guessing. Indeed, a recent study of Canadian strategic intelligence analysts found just that—geopolitical forecasts were systematically under-confident rather than over-confident.41 Nor was such evidence relegated to a particular experience level of analysts: junior and senior analysts alike displayed significant under-confidence in forecasting. Experimental research corroborates these findings. In one study, military intelligence analysts who judged the probabilities of two mutually exclusive and exhaustive hypotheses assigned too little certainty to the hypotheses.42 Since SATs are designed to challenge prevailing lines of reasoning, they could inadvertently undercut the confidence of already under-confident analysts. By couching judgments in unnecessary uncertainty, intelligence agencies water down the indicative value of intelligence. Under-confidence can foster, not reduce, uncertainty in the minds of decision makers.

Table 2. SATs can potentially trigger opposing biases.

Technique | Targeted bias | Activated opposing bias
Key assumptions check | Status quo bias; fundamental attribution error; wishful thinking | Fundamental situational error
Quality of information check | Status quo bias; confirmation bias; selective exposure; inadequate search | Under-confidence; excessive search
Indicators or signposts of change | Anchoring and under-adjustment | Over-confidence; over-adjustment; excessive volatility
Analysis of competing hypotheses (ACH) | Status quo bias; confirmation bias; attribution error; selective exposure; congruence bias; anchoring and under-adjustment | Incoherent probabilities
Devil’s advocacy | Status quo bias | Base-rate neglect
Team A/Team B | Status quo bias | Groupthink; group polarization
High-impact/Low-probability analysis | Status quo bias | Base-rate neglect
“What if?” Analysis | Status quo bias | Base-rate neglect
Brainstorming | Status quo bias | Incoherent probabilities
Outside-in thinking | Status quo bias; illogical arguments | Ignorance fallacy; overconfidence
Red team analysis | Confirmation bias; attribution error; mirror-imaging | Groupthink; overconfidence; “otherness”: believing others to be fundamentally irrational
Alternative futures analysis | Status quo bias | Incoherent probabilities

Table 2 also reveals that SATs heavily focus on preventing status quo bias, which can activate the opposing bias of base-rate neglect. Daniel Kahneman and Amos Tversky coined the term to describe ‘situations in which a base rate that is known to a subject, at least approximately, is ignored or significantly underweighted.’43 When analysts are pushed to focus on the possibility of change, they unsurprisingly over-predict change.44 For instance, ‘What If?’ Analysis and Brainstorming encourage deviant ideas that can steer analysts away from diagnostic information. Underweighting of cross-case base-rates is typically accompanied by the overweighting of case-specific indicators, causing miscalibration of confidence estimates.45 Specifically, people unreasonably privilege observable, case-specific evidence, and are less sensitive to the predictive validity of each piece of evidence given the prior odds of each outcome occurring (as prescribed by a Bayesian approach).46 The root problem is that most people have difficulty assessing probabilities without proper training.47 Studies have shown that when evidence is extreme relative to a base-rate, individuals become ‘over-confident’ by Bayesian standards, whereas ‘weak’ evidence yields ‘under-confident’ judgments.48 Further, individuals tend to be over-confident with a small sample size (i.e., minimal collection of indicative evidence) and under-confident with a large sample size (i.e., great breadth of indicative evidence), which demonstrates an inability to draw inferences proportional to the strength of available evidence.49 People likely do not properly weigh new evidence in conjunction with their prior beliefs, causing irrational confidence in judgments. Overall, SATs that target status quo bias could amplify the inflation or deflation of associated probabilities, an unintended consequence that makes the improbable seem more probable than warranted.
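The strength-versus-weight point can be illustrated with a small Bayesian sketch (ours, with invented numbers): the normatively correct posterior that a source is slanted depends on both the proportion of confirming reports (strength) and how many reports there are (weight), so confidence that tracks only strength will overshoot the Bayesian answer for small samples and undershoot it for large ones.

```python
from math import comb

def posterior_biased(k, n, p_biased=0.75, p_fair=0.5, prior_biased=0.5):
    """Bayesian posterior that a source is slanted (p_biased) rather than
    even-handed (p_fair), after observing k confirming reports out of n."""
    like_biased = comb(n, k) * p_biased**k * (1 - p_biased)**(n - k)
    like_fair = comb(n, k) * p_fair**k * (1 - p_fair)**(n - k)
    return (prior_biased * like_biased) / (
        prior_biased * like_biased + (1 - prior_biased) * like_fair)

# Same *strength* (75% of reports confirm) but very different *weight* (sample size):
print(posterior_biased(3, 4))    # small sample  -> posterior ~0.63
print(posterior_biased(30, 40))  # large sample -> posterior ~0.99
```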

Neglecting noise: Do SATs reduce noise in judgments?

In addition to slighting bias bipolarity, we believe that certain SATs unnecessarily introduce noise into an already noisy process, causing inconsistent judgments to become even more inconsistent (in scientific parlance, unreliable). Reliability of assessments is a necessary but not sufficient condition of their validity. Many intelligence analysts are highly skilled, attaining expertise through years of experience and specialized training. If nothing has changed, we should expect an expert analyst examining the same evidence using the same SATs to reach the same judgments at different times—that is, to be both internally consistent and temporally stable.50 However, several types of SATs rely on decomposing and separately addressing the components of a problem (e.g., evidence, hypotheses, conclusions). Unfortunately there is a lot of subjectivity in the decomposition process, in part due to the ambiguity of the SAT-prescribed processes. This sets the stage for an assumed strength of SATs—proceduralizing thinking processes as an alternative to ‘mere intuition’—to impair intelligence analysis.

Ultimately, SATs rely on subjective analytic inputs (which extends even to interpreting relatively objective scientific-signature and remote-sensing data). Without well-defined rules that reliably improve interpretation of inputs and sharpen analytic outputs, SATs may serve only as a vehicle transporting subjectivity from one end of the process to the other. The SAT process becomes an end in itself, dressing up subjective judgments in a cloak of objectivity. The inconsistent handling of the decomposed parts—hypotheses, evidence, and hypothesis-evidence linkages—is especially worrisome.

SATs allow inconsistent handling of raw intelligence, particularly judgments of how evidence bears on alternative hypotheses. For example, in exercises involving over 3,000 students from multiple agencies, Heuer and Pherson found that ‘students disagree on how to rate the evidence in about 20–30 percent of the cells in [an ACH] matrix.’51 This is unsurprising given that the ACH technique does not provide detailed guidance on how ‘consistency’ should be defined, despite the centrality of the concept of ‘consistency’ in all ACH rating exercises.

Analysts may also actively expand the original conceptual framework of a favored hypothesis to accommodate new evidence (whereas confirmation bias shoehorns evidence into the hypothesis). Thus, it is not the most accurate hypothesis that emerges as the most likely from an SAT framework, but the hypothesis that is most ideologically consistent with an analyst’s preconceived notions—a problem known as ‘conceptual stretching’ or ‘elastic redefinition.’52 Analysts may not recognize their favoritism toward a hypothesis, which can affect how leniently or harshly they judge the consistency of evidence for favored or disfavored hypotheses.53

SATs also have difficulty handling interdependencies among strands of evidence. Techniques such as ACH and Indicators and Signposts of Change untangle such messes by instructing analysts to break problems into their components. This reductionist treatment ignores the first-order interactions between variables and the feedback loops and subsequent acceleration or deceleration of interactions between variables. The outbreak of World War I emerged from a complex intertwining of relationships, which is difficult to decompose without distortion.54 If interdependencies are not accurately captured in simulations, a small variation can cause drastic perturbations.55 More importantly, the SATs rely on the analyst to properly decompose the problem, offering little guidance on how far to take the reductionist exercise and leading again to the institutionally-sanctioned-subjectivity problem. Beebe and Beebe note that although SATs can help analysts evaluate binary relationships, they do not address the feedback loops and chained interactions involving many variables, including elements outside those captured in the existing analytic space.56 The difficulty in representing the interactions between variables is what makes complex situations so difficult to understand and predict.57


How evidence is weighed (determining the relative importance of sources) is another key aspect of analysis largely neglected by structured methods. Without more explicit rules, analysts are left to their own subjective devices. ACH is one of the few methods that do capture the value of different evidence via its ‘weighted inconsistency score.’ However, ACH needs to provide more explicit guidelines for weighting evidence consistently. Different analysts assigning different weights (i.e., inter-rater disagreement) can lead to incoherent and inaccurate weight assignments.58 ACH’s description for ranking evidence by diagnosticity is subject to interpretation and can thus lead to inconsistent rankings across individuals (and even within individuals, depending on when they looked at the evidence: the phenomenon of latent self-contradiction).
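For readers unfamiliar with the mechanics, the sketch below implements one plausible reading of a weighted inconsistency score; the evidence items, ratings, and weights are hypothetical and the scoring convention is our simplification, not Heuer's exact specification. Each piece of evidence is rated consistent ('C'), inconsistent ('I'), or neutral ('N') against each hypothesis, inconsistencies are weighted by analyst-assigned credibility and diagnosticity, and the hypothesis with the lowest weighted inconsistency is treated as least contradicted.

```python
# Hypothetical ACH matrix: ratings of each piece of evidence against two hypotheses.
# "C" = consistent, "I" = inconsistent, "N" = neutral / not applicable.
matrix = {
    "E1: attacks on journalists": {"kooky cult": "I", "terrorist org": "N"},
    "E2: stockpiling precursors": {"kooky cult": "I", "terrorist org": "C"},
    "E3: apocalyptic rhetoric":   {"kooky cult": "C", "terrorist org": "C"},
}
# Analyst-assigned weights (credibility x diagnosticity), also hypothetical.
weights = {
    "E1: attacks on journalists": 1.0,
    "E2: stockpiling precursors": 2.0,
    "E3: apocalyptic rhetoric":   0.5,
}

def weighted_inconsistency(matrix, weights):
    """Sum the weights of evidence rated inconsistent with each hypothesis;
    lower totals indicate hypotheses less contradicted by the evidence."""
    hypotheses = next(iter(matrix.values())).keys()
    return {
        h: sum(weights[e] for e, ratings in matrix.items() if ratings[h] == "I")
        for h in hypotheses
    }

print(weighted_inconsistency(matrix, weights))
# kooky cult: 3.0, terrorist org: 0 -> 'terrorist org' is least contradicted
# under these assumed ratings; change one rating or weight and the answer can flip.
```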

Noise neglect can also lead to biased reasoning. Methods such as ACH do not check whether analysts have a shared interpretation of ‘consistency’ and ‘inconsistency’ of evidence with hypotheses—leaving the meaning of consistency subjective. The instructions for ACH (e.g., the doctrinaire version in the CIA’s primer) are underspecified and, as currently designed, will often lead to inter-subjective disagreement over what evidence ‘fits’ or ‘doesn’t fit’ with a given hypothesis. The CIA’s primer on ACH lists ‘attacks on journalists’ as being inconsistent with the Japanese cult Aum Shinrikyo being a ‘kooky cult’ and neutral with relation to it being a ‘terrorist organization,’ but reasonable analysts could view attacking the media as a common terrorist tactic.59 If analysts are thinking this way about evidence within ACH, then ACH is merely a structured approach to facilitating the analyst’s use of the representativeness heuristic—in effect, ACH requires the analyst to answer the question, ‘To what extent does this evidence seem to fit or match this hypothesis?’60 The representativeness heuristic has been implicated in several judgment biases, including the conjunction fallacy, insensitivity to sample size, and overestimating the probative value of individuating information.61

SAT trainers and critics alike point to the need to dispel the illusion of analytic soundness afforded by SATs. If an analyst goes through the technique’s motions without significant challenge to their externalized thinking, SATs may provide only a veneer of analytic legitimacy without true improvements to analytic quality. As currently designed, SATs enable imprecision (i.e., unsystematic deviations from accuracy) which can degrade the reliability of judgments – even though the declared goal is to force analysts to generate more precise ones. In the absence of new information, the same analyst using the same SAT should generate the same judgment over time (be reliable). Reliability is a necessary but not sufficient factor for establishing validity—analysts should at least not contradict their own prior judgments of exactly the same data.62 Despite these flaws, the intelligence literature tends to assume that SATs work largely as intended. SAT advocates (and even official documents) often take an uncritical stance on their net utility.63 These flaws remain unquestioned because SATs have only rarely been subjected to scientific scrutiny. We therefore see a need for testing the net effectiveness of SATs as well as eliminating known flaws.

SATs remain mostly untested

The neglect of bias bipolarity and noise has persisted so long because SATs have never been subjected to sustained scientific validation,64 a fact that even the most ardent SAT advocates acknowledge.65 The lack of testing has other downstream impacts, including mistrust in the techniques by rank-and-file analysts. Analysts under time pressure are understandably reluctant to put their faith in time-consuming techniques of unknown value.66

To our knowledge, no official report has closely examined whether structuring analysis improves reasoning as measured against the standards in ICD 203, such as proper sourcing of claims, exhibiting clear and logical argumentation, and being more accurate in assessments.67 In 2007, the IC began formally evaluating intelligence products based on the metrics of objectivity, political independence, breadth of information, timeliness, and proper tradecraft.68 The IC has only recently begun to systematically evaluate the accuracy of its estimates and to connect their accuracy to the tradecraft used to produce them.69 Heuer and Pherson themselves concede, ‘the conventional criterion for validating an analytic technique is the accuracy of the answers it provides.’70 They acknowledge that there is currently ‘no systematic program for evaluating or validating the effectiveness of these techniques.’71

In an independent review, Coulthart examined the efficacy of SATs based on their impact on rigor and accuracy. Brainstorming proved effective in improving analysis in 40% of cases; however, face-to-face collaborative Brainstorming (the form of Brainstorming endorsed by the CIA Tradecraft Primer) had a consistently detrimental effect on the quality of judgments.72 Devil’s Advocacy outperformed analyses derived from consensus methods of decision-making in 70% of cases.73 In a separate study, ACH reduced confirmation bias but only for people without an intelligence background. Those with an intelligence background showed no reduction in confirmation bias.74 While Devil’s Advocacy fared well, these preliminary results suggest that SATs as a whole warrant greater skeptical examination.

Consider also Rob Johnston’s reflection on his experience with another form of competitive analysis, Red Team Analysis: ‘[the demographics of the group were] not representative of the adversary we were intended to simulate,’ underscoring how methods emanating from the best of intentions can nevertheless go awry.75 According to Johnston, there was only one person in the group who had a cultural background related to the target; that person’s contributions were severely undervalued by other group members, who instead favored theories consistent with their own backgrounds.76 This example demonstrates how mirror-imaging bias still pervades Red Team analysis, contrary to its design intention. In such cases, SATs produced a false sense of opinion diversity.

Although some have objected to evaluating the effectiveness of SATs empirically, Heuer and Pherson state that the concerns ‘could be largely resolved if experiments were conducted with intelligence analysts using techniques as they are used within the Intelligence Community to analyze typical intelligence issues.’77 There have been promising efforts to satisfy each requirement, but they have yet to be integrated into a comprehensive evaluative framework such as the ICD 203 standards.78 Without a clear performance standard, analysts might only seek to employ SATs because they are formalized within ICD 203, not because they are efficacious.

Addressing bias bipolarity and noise neglect to improve SATs

SATs should promote analytic assessments that are reliable (i.e., consistent within an analyst across time), valid (i.e., based on sound reasoning), and accurate (i.e., corresponding to truth). Achieving this goal starts by fleshing out sub-processes within techniques (how each SAT categorizes evidence, whether the process is collaborative, the situations in which that SAT should be used, etc.). This includes ensuring that SATs start from as strong a theoretical framework as possible. It is also necessary to better integrate logical and probabilistic reasoning to make SATs internally coherent. Most importantly, SATs should be evaluated on whether they lead to accurate judgments and higher quality explanations, so their potential to aid judgment can be better understood.

Establish more explicit rules for handling evidence

We propose establishing explicit rules to weight and categorize evidence to promote consistency in the application of SATs and, more importantly, in the assessments they support. We suggest keeping records to identify the role various types of intelligence information played in assessing previous situations.79 The relationships between evidence (e.g., raw reporting, open source media, background information) and outcomes would help develop base-rates of event occurrence that, over time, could provide ‘outside view’ checks on portfolios of ‘inside view’ case-based assessments.80 Such information can be used to more properly weight evidence within techniques such as ACH; meanwhile, analysts’ source track records can be used to inform assessments of source credibility when conducting Quality of Information Checks.

Clarifying SAT rules for evaluating evidence can help analysts think critically about how they should interpret new or unfamiliar information, rather than relying on their intuition.81 Implementing such rules would decrease ambiguity in how new observations are interpreted, which would, in turn, lead to analysts and analytic groups dealing with evidence more consistently. For example, explicit categorization rules would clarify objectively what constitutes a ‘moderate concern’ signpost versus a ‘substantial concern’ signpost within the Indicators and Signposts of Change process. How much should analysts update their beliefs about a country possessing WMD if they observe attempts to acquire aluminum tubes that may or may not be suitable for enrichment? These clarifications should minimize the potential to interpret evidence to ‘fit’ pet theories.
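The aluminum-tube question can be made concrete with a likelihood-ratio update. The sketch below is a toy illustration with invented probabilities, not an assessment of the actual case; its point is that explicit priors and likelihoods force the analyst to state how diagnostic the indicator really is.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior probability of a hypothesis after observing one indicator,
    via Bayes' rule in odds form (prior odds x likelihood ratio)."""
    prior_odds = prior / (1 - prior)
    likelihood_ratio = p_evidence_if_true / p_evidence_if_false
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Invented numbers for illustration only.
prior = 0.20                 # prior belief that an active WMD program exists
p_tubes_if_program = 0.50    # chance of observing tube purchases if it does
p_tubes_if_benign = 0.25     # chance of observing them anyway (e.g., for rockets)

print(round(bayes_update(prior, p_tubes_if_program, p_tubes_if_benign), 2))
# 0.33: a modest update, because this indicator is only weakly diagnostic
```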

Incorporate probability theory into SATs

SATs should include mechanisms to ensure analysts reason probabilistically and can express their probabilistic assessment precisely. Numeric probabilities are preferred but it may sometimes be possible to use verbal probabilities consistently and carefully to achieve adequate precision. The aim of probabilistic assessments is not to misrepresent analysts’ judgments as scientific facts, but rather to promote conditions that support clear verification of internally consistent thinking and accuracy over the long run. Such verification, in turn, would dramatically improve intelligence accountability by supplementing a virtually exclusive focus on analytic process with important indicators of accuracy.82 By evaluating a conclusion such as North Korea is ‘likely’ to collapse by the end of the year, analysts can calibrate their future judgments accordingly.

With precise feedback, analysts would be less likely to over-estimate the likelihood of rare outcomes and more likely to make calibrated assessments that properly identify the relative strengths of the arrayed hypotheses.83 For improving inter-subjective consistency and agreement, assigning numeric probabilities to hypotheses would enable analysts to communicate more granular and meaningful estimates to one another and to policymakers.84 Alan Barnes, when he was director of the Middle East and Africa Division of the Intelligence Assessment Secretariat in the Government of Canada, introduced a nine-point probability scale and found it created a ‘common understanding of the degree of certainty attached to a judgment,’ thereby reducing problems of misinterpretation that arise when analysts communicate their judgments to others.85 Barnes also found that increased experience with using numeric probabilities made analysts more comfortable using them.86

Verify the efficacy of SATs for debiasing and noise reduction

We recommend that SATs be tested using the scientific gold standard: randomized controlled experiments, designed in such a way that they are sensitive to the specific and unique circumstances that analysts face: time pressure, evidential uncertainty, and content bearing on national security.87

Furthermore, Heuer has noted that two different analysts using the same SAT, evaluating the same analytic problem, can reach different judgments due to different ‘mental models.’88 However, proponents of SATs should agree that if an SAT is to produce consistent results, the same analyst using a given SAT should interpret a given situation the same way each time he encounters it, if in fact nothing has changed. In other words, beliefs should not randomly oscillate in the absence of new information. Thus, to measure SATs for reliability, we propose various tests such as examining test-retest reliability, susceptibility to framing effects, refocusing effects, and whether using the methods results in violations of logical constraints on reasoning.89 For example, assessments of the probability of a terrorist attack in a specified location and timeframe can differ substantially from assessments of the probability of no terrorist attack in the same location and timeframe when subtracted from 1. If such assessments were coherent, they would be the same. However, it appears that querying people about the occurrence or non-occurrence of an event, such as a terrorist attack, often triggers different information search, memory retrieval, and assessment processes.90 Likewise, when the probability of success on a given type of instance is high (say 90% likely), people judge event descriptions like ‘exactly 1 success in 4 tries’ as more probable than ‘exactly 3 failures in 4 tries’, despite the fact that they refer to the same conjunction (i.e., ‘1 success and three failures out of 4 tries’) and are therefore equiprobable.91 SATs should help analysts avoid these predictable forms of logical inconsistency, but there is currently no credible evidence that they do.
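Both of these logical constraints are simple enough to check mechanically. The sketch below (our illustration) tests binary complementarity and shows that the two event descriptions in the text are, by the binomial formula, exactly equiprobable.

```python
from math import comb, isclose

def complementary(p_event, p_no_event, tol=1e-9):
    """Coherence check: P(attack) and P(no attack) must sum to 1."""
    return isclose(p_event + p_no_event, 1.0, abs_tol=tol)

print(complementary(0.30, 0.70))   # True  (coherent)
print(complementary(0.30, 0.55))   # False (incoherent: sums to 0.85)

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent tries."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_success = 0.9
print(binom_pmf(1, 4, p_success))       # 'exactly 1 success in 4 tries'  -> 0.0036
print(binom_pmf(3, 4, 1 - p_success))   # 'exactly 3 failures in 4 tries' -> 0.0036
# Both equal 0.0036: the two descriptions pick out the same conjunction.
```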

Specifically, we suggest the following test-retest evaluation of temporal stability: the same analyst is given identical intelligence problems to analyze (the same evidence and objective) a month or two apart. If the technique is reliable, analysts will categorize the components the same way each time, and more importantly, arrive at the same judgment. This is an especially important test for diagnostic techniques. Requiring analysts to break information into components (e.g., evidence, indicators, sources) in the absence of well-specified criteria might lead to an unnecessarily messy process. For example, if an analyst is using Indicators or Signposts of Change and identifies a specific indicator (e.g., military discontent with a civilian government) as a ‘Moderate Concern’ in Trial 1, then—in the absence of new information—the analyst also should identify it as a ‘Moderate Concern’ in Trial 2. If we find that analysts’ classifications of indicators in Trial 1 and Trial 2 are unreliable, the technique should be improved by creating more explicit rules for categorizing evidence, prior to validity testing. Test-retest reliability experiments have been conducted in professions such as radiology, tax accounting, and auditing.92 Our proposal would provide insight into the question of whether SATs reduce, increase, or leave unchanged the degree of judgmental imprecision.
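A minimal sketch of the proposed test-retest check (our illustration; the indicator ratings are invented): compare an analyst's Trial 1 and Trial 2 classifications of the same indicators and report simple percent agreement alongside Cohen's kappa, which corrects for agreement expected by chance.

```python
from collections import Counter

def percent_agreement(trial1, trial2):
    """Share of items given the same category in both trials."""
    return sum(a == b for a, b in zip(trial1, trial2)) / len(trial1)

def cohens_kappa(trial1, trial2):
    """Chance-corrected agreement between two sets of categorical ratings."""
    n = len(trial1)
    observed = percent_agreement(trial1, trial2)
    c1, c2 = Counter(trial1), Counter(trial2)
    expected = sum(c1[cat] * c2[cat] for cat in set(trial1) | set(trial2)) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of the same five indicators, a month apart, no new information.
trial1 = ["Moderate", "Substantial", "Moderate", "Low", "Substantial"]
trial2 = ["Moderate", "Moderate", "Moderate", "Low", "Substantial"]

print(percent_agreement(trial1, trial2))        # 0.8
print(round(cohens_kappa(trial1, trial2), 2))   # 0.69
```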

Note that the proposed method would not test the value of analytic collaboration, but it could easily be extended to do so by adding another design element. We could devise similar experiments for groups to test collaboration techniques, like Brainstorming and Red Teaming. The group aspect of these SATs is intended to promote a diverse range of opinions, enabling all hypotheses to be considered and the optimal judgment chosen. Thus, it may be advisable to give multiple groups the same analytic project and access to the same evidence, then measure consistency across group judgments rendered with or without the SAT. Does SAT use improve inter-group reliability? We won’t know until we test.

Establish a feedback loop to continuously improve SATs

Intelligence organizations should periodically update and improve SATs in response to data from accuracy assessments. Part of this feedback loop requires scoring assessments for accuracy, which can sometimes be done even when key judgments are expressed as verbal probabilities.93 The historical accuracy of intelligence forecasts is mostly unknown.94 By quantifying the likelihoods that analysts assign to hypotheses using a logically coherent process, we can record the frequency and degree to which each analytic structure produces accurate judgments (as measured by skill metrics such as Brier scores), to create reliable performance records. Then, we could continuously catalog all estimative judgments and conduct follow-up post-mortems to cross-check whether the hypothesis that corresponded to the correct outcome was ‘on the radar’ throughout, and examine why it was undervalued or dismissed.95 These records could be used to assess current threats by researching which similar past threats analysts assessed accurately, and which factors differentiated failures from success.96 Johnston refers to this type of system as an ‘institutional memory’—a library of ‘lessons learned’ that analysts can utilize and add to.97 A database eventually could develop into a system that allows for automatic extraction of information relevant to a specific person, date, location, facility, organization, etc.98 In the long run, this would save analysts time, because they could retrieve information more quickly from a centralized, coded system, and efficiently discover associations throughout the history of the topic.99

In addition, making feedback available to individual analysts can help them improve their judgments over time, including measures of calibration and discrimination.100 Most people don’t have an accurate sense of their strengths and weaknesses when assessing uncertainty; performance feedback has been shown to help individuals improve their performance, by keeping records to dispel self-serving illusions of skill.101 Johnston proposed a ‘Performance Improvement Infrastructure’ to measure individual analytic performance and the impact of different ‘interventions’ (e.g., SATs) on their performance.102 The performance of intelligence analysts could become a little more like that of weather forecasters, whose success Griffin and Tversky attributed in part to their getting ‘immediate frequentist feedback.’103 While intelligence analysis and weather forecasting are different activities (clouds don’t have agency, for one thing) and analysts will likely never approach the calibration of meteorologists (who have data rich computer models), feedback may lead to some judgmental improvements. How much is an empirical question that can only be answered after such a system is instituted.
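As one concrete form of such feedback, the sketch below (our illustration, with an invented forecast record) scores a set of probabilistic judgments with the Brier score and bins them into a crude calibration table, the kind of frequentist feedback described above.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error of probability forecasts against 0/1 outcomes;
    0 is perfect, 0.25 is what a constant 50% forecast earns."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

def calibration_table(forecasts, outcomes, edges=(0.0, 0.2, 0.4, 0.6, 0.8, 1.01)):
    """Average stated probability vs. observed frequency within coarse bins."""
    table = {}
    for lo, hi in zip(edges, edges[1:]):
        in_bin = [(f, o) for f, o in zip(forecasts, outcomes) if lo <= f < hi]
        if in_bin:
            avg_forecast = sum(f for f, _ in in_bin) / len(in_bin)
            hit_rate = sum(o for _, o in in_bin) / len(in_bin)
            table[f"{lo:.1f}-{hi:.1f}"] = (round(avg_forecast, 2), round(hit_rate, 2))
    return table

# Hypothetical record: stated probabilities and whether each event occurred.
forecasts = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6, 0.1, 0.5]
outcomes = [1, 1, 0, 0, 0, 1, 1, 0]

print(round(brier_score(forecasts, outcomes), 3))  # ~0.236
print(calibration_table(forecasts, outcomes))
```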

Finally, we propose a new, more scientifically grounded taxonomy for organizing, testing, and generating new SATs. One way of organizing SATs is to determine when best to use them by aligning them with analytic production phases (e.g., during initial drafting, during the community coordination process) and cognitive sub-processes (e.g., during broad information search, during the generation of hypotheses).104 A more parsimonious way is to align SATs with the deeper cognitive trade-offs that analysts routinely encounter and that feature prominently in the psychological literature. For example, the current official taxonomy of diagnostic, contrarian, and imaginative techniques could be simplified to two categories: helping analysts engage in critical thinking and helping analysts engage in creative thinking. This distinction, observed by psychologist Tom Gilovich, analogizes SATs to tests that ask ‘must-I-believe-it?’ and tests that ask ‘can-I-believe-it?’105 As analysts encounter new evidence, reconsider old evidence in light of new evidence, and generate and test hypotheses, they must balance between minimizing errors of believing things they are not logically obliged to believe (i.e., checks on excessive gullibility) and minimizing errors of failing to give credence to possibilities they should have given weight (i.e., checks on excessive rigidity). Existing techniques such as foresight analysis help analysts on ‘can I believe it’ type questions, and devil’s advocacy stress-tests analytic conclusions of the ‘must I believe it’ variety.

Put another way, some SATs will enhance foveal vision (i.e., our capacity to rapidly and accurately diagnose threats and opportunities in front of us), which is especially helpful in current and crisis intelligence functions. Others will enhance peripheral vision (i.e., our capacity to anticipate threats that fall outside the plausibility range of conventional wisdom, thus blending contrarian and imaginative techniques), which is helpful for longer-term assessments and horizon scanning. Aligning SATs with scientifically grounded functions will make it easier to manage the tough cognitive trade-offs analysts regularly encounter.

Conclusion

The great twentieth-century sociologist Robert Merton distinguished between the manifest and latent functions of collective practices and rituals. Rain dances are supposed to serve the manifest function of causing clouds to form and produce rain, but they also serve the latent functions of bringing the tribe together and fostering social cohesion. Merton argued that the latent functions of a surprisingly wide range of collective activities were actually much more important to people than the manifest or officially declared functions.

Pointing out the possible latent functions of a practice can be risky. Insiders often take umbrage when outsiders suggest that insiders’ public reasons for doing X are not their real reasons. But a serious scientific inquiry requires taking the risk and posing the impertinent question: does the IC value SATs more for their manifest or latent functions?

Thus far, we have taken the IC at its word and assumed that the manifest function of SATs, improving analysis, is the IC's real reason for embracing them. But the central claims of this article raise questions. If SATs have never been rigorously tested for efficacy, rest on an untenable unipolar conception of cognitive biases, and foster inconsistent judgments, why has this suboptimal state of affairs been allowed to persist for decades?

We see three broad sets of possibilities. First, we have underestimated the IC and SAT proponents: the IC knows more about the effectiveness of SATs than we claim, and SATs are more effective than we have supposed. Second, the IC knows as little about efficacy as we claim, and it is not particularly interested in learning more. SATs serve valuable bureaucratic-political signaling functions. They send the message that the IC is committed to playing a pure epistemic game and trains its analysts in accordance with quasi-scientific ground rules. Success is measured not along an accuracy metric but along an impression management metric that follows from an intuitive politician mindset: are we creating the desired impression on key constituencies?106 Third, the IC knows about as much as we worry it does, and would like to learn more, but is skeptical that it is possible to move much beyond face validity.

We are unlikely to convince readers who fall in the first and second camps, but we do seek to engage readers in the third, because SATs hold much promise. If constructed properly and continuously tested and refined, they probably can help analysts produce more accurate and better-reasoned reports. That SATs remain largely untested is a major problem for their adoption by IC analysts, because demonstrated efficacy must be part of the 'convincing case' for SATs. Further training on which SATs are best suited for which issues (regional, functional) can only be developed with data from rigorous applied scientific research showing that a specific SAT is well-aligned with a problem sub-type. Sometimes the best SAT may be no SAT at all.

In a nutshell, this paper is an appeal for greater scientific rigor. Science, like intelligence, can be messy. Progress is slow and only comes from efforts to push boundaries. SAT proponents and opponents alike should welcome the fresh attention. The problems with SATs are serious but there are potential fixes. Although there is no guarantee that SATs will prevent the next WMD misjudgment or predict the next major terrorist attack, they could still improve analysis in meaningful and measurable ways. Figuring out how best to design SATs so that they help, not hinder, intelligence analysts should be a top priority.

Disclaimer

The views presented are those of the authors and do not represent the views of the Intelligence Advanced Research Projects Activity, the Department of Defense or any of its components, or the United States Government; nor do the views presented represent the views of the Department of National Defence or any of its components or the Government of Canada.

Notes

1. Heuer and Pherson, Structured Analytic Techniques for Intelligence Analysis, 4.
2. Heuer, Psychology of Intelligence Analysis, 95; Heuer and Pherson, Structured Analytic Techniques for Intelligence Analysis, 4.
3. Others, including Pherson, Heuer, and Mandeep Dhami and colleagues, have published more complex taxonomies that match SATs to the cognitive processes which analysts initiate while on task; Central Intelligence Agency, A Tradecraft Primer: Structured Analytic Techniques for Improving Intelligence Analysis, 5; Heuer and Pherson, Structured Analytic Techniques; Dhami, Belton, and Careless, "Critical Review of Analytic Techniques"; Heuer, "Taxonomy of Structured Analytic Techniques."
4. Coulthart, "Why Do Analysts Use Structured Analytic Techniques? An In-Depth Study of an American Intelligence Agency"; Heuer and Pherson, Structured Analytic Techniques.
5. Intelligence Reform and Terrorism Prevention Act of 2004; Marrin, "Training and Educating U.S. Intelligence Analysts."
6. Chang and Tetlock, "Rethinking the Training of Intelligence Analysts"; Heuer and Pherson, Structured Analytic Techniques, 9; Coulthart, "Improving the Analysis of Foreign Affairs: Evaluating Structured Analytic Techniques," 41; Coulthart, "Improving the Analysis of Foreign Affairs: Evaluating Structured Analytic Techniques," 4.
7. Fishbein and Treverton, "Rethinking 'Alternative Analysis' to Address Transnational Threats," 1.
8. Central Intelligence Agency, A Tradecraft Primer: Structured Analytic Techniques for Improving Intelligence Analysis, 5.
9. Ibid.
10. Heuer and Pherson, Structured Analytic Techniques, 158.
11. Ibid., 215.
12. Central Intelligence Agency, A Tradecraft Primer: Structured Analytic Techniques for Improving Intelligence Analysis, 5.
13. Cooper, Curing Analytic Pathologies: Pathways to Improved Intelligence Analysis; Mandel, Applied Behavioural Science in Support of Intelligence: Experiences in Building a Canadian Capability; Marrin, Improving Intelligence Analysis: Bridging the Gap Between Scholarship and Practice; Heuer, "The Evolution of Structured Analytic Techniques."
14. Heuer and Pherson, Structured Analytic Techniques, 5.
15. Ibid., 4.
16. Ibid., 5.
17. Chang, Chen, Mellers and Tetlock, "Developing Expert Political Judgment: The Impact of Training and Practice on Judgmental Accuracy in Geopolitical Forecasting Tournaments."
18. Heuer and Pherson, Structured Analytic Techniques, 309.
19. Central Intelligence Agency, A Tradecraft Primer: Structured Analytic Techniques for Improving Intelligence Analysis.
20. Chang and Tetlock, "Rethinking the Training of Intelligence Analysts"; Heuer, Psychology of Intelligence Analysis.
21. Nickerson, "Confirmation Bias: A Ubiquitous Phenomenon in Many Guises," 175.
22. Johnston, Analytic Culture in the US Intelligence Community: An Ethnographic Study, 21.
23. Harris and Spiker, Critical Thinking Skills for Intelligence Analysis, 217.
24. George and Bruce, "Making Analysis More Reliable: Why Epistemology Matters to Intelligence."
25. Ibid.
26. Spielmann, "I Got Algorithm: Can There Be a Nate Silver in Intelligence?"
27. Heuer and Pherson, Structured Analytic Techniques, 4.
28. Central Intelligence Agency, Rethinking "Alternative Analysis" to Address Transnational Threats.
29. Greenwood, "Two Dogmas of Neo-Empiricism: The 'Theory-Informity' of Observation and the Quine-Duhem Thesis."
30. Heuer and Pherson, Structured Analytic Techniques, xv, 9.
31. Ibid., 4.
32. Spielmann, "I Got Algorithm: Can There Be a Nate Silver in Intelligence?," 526.
33. We should address the false dichotomy between structured and intuitive reasoning. Advocates of SATs portrayed them as an 'alternative' to 'traditional' (intuition-based) analysis. Advocates imply that without SATs, analysts are left with only their intuition to make sense of the world. It should be noted that unaided analysis is not the same as intuition. Unaided analysis can still be effortful, deliberate thinking. Analysts have a wide array of potential strategies, such as deductive, inductive, or abductive forms of reasoning. SAT advocates are leaving a lot of reasoning-improvement gains on the table that could be achieved by honing deliberate thinking.
34. There remains significant debate regarding the efficacy of a variety of debiasing methods; see the discussion in Croskerry, Singhal, and Mamede, "Cognitive Debiasing 1: Origins of Bias and Theory of Debiasing," BMJ Quality and Safety, 23 July 2013.
35. Chang and Tetlock, "Rethinking the Training of Intelligence Analysts."
36. Lichtenstein and Fischhoff, "Training for Calibration."
37. Phillips and Edwards, "Conservatism in a Simple Probability Inference Task"; Kahneman and Tversky, "On the Psychology of Prediction."
38. Ordóñez and Benson, "Decisions under Time Pressure: How Time Constraint Affects Risky Decision Making"; Haran, Ritov and Mellers, "The Role of Actively Open-Minded Thinking in Information Acquisition, Accuracy, and Calibration"; West, Toplak and Stanovich, "Heuristics and Biases as Measures of Critical Thinking: Associations with Cognitive Ability and Thinking Dispositions"; Wegener and Petty, "The Flexible Correction Model: The Role of Naïve Theories of Bias in Bias Correction"; Wegener, Dunn, and Tokusato, "The Flexible Correction Model: Phenomenology and the Use of Naïve Theories in Avoiding or Removing Bias."
39. Moore and Healy, "The Trouble with Overconfidence," 502.
40. Mellers and Tetlock, "Intelligent Management of Intelligence Agencies: Beyond Accountability Ping-Pong."
41. Mandel and Barnes, "Accuracy of Forecasts in Strategic Intelligence."
42. Mandel, "Instruction in Information Structuring Improves Bayesian Judgment in Intelligence Analysts."
43. Kahneman and Tversky, "On the Reality of Cognitive Illusions."
44. Chang and Tetlock, "Rethinking the Training of Intelligence Analysts."
45. Bar-Hillel, "The Base-Rate Fallacy in Probability Judgments."
46. Griffin and Tversky, "The Weighing of Evidence and the Determinants of Confidence."
47. Lehner, Adelman, Cheikes, and Brown, "Confirmation Bias in Complex Analyses."
48. Griffin and Tversky, "The Weighing of Evidence and the Determinants of Confidence."
49. Ibid.
50. Kahneman, Rosenfield, Gandhi, and Blaser, "Noise: How to Overcome the High, Hidden Cost of Inconsistent Decision-Making."
51. Heuer and Pherson, Structured Analytic Techniques, 311.
52. Chang and Tetlock, "Rethinking the Training of Intelligence Analysts"; Collier and Mahon, "Conceptual 'Stretching' Revisited: Adapting Categories in Comparative Analysis"; Piercey, "Motivated Reasoning and Verbal vs. Numerical Probability Assessment: Evidence from an Accounting Context."
53. Chang and Tetlock, "Rethinking the Training of Intelligence Analysts."
54. Jervis, System Effects: Complexity in Political and Social Life.
55. Ibid.
56. Beebe and Beebe, "Understanding the Non-Linear Event: A Framework for Complex Systems Analysis," 511.
57. Ibid.
58. Spielmann, "I Got Algorithm: Can There Be a Nate Silver in Intelligence?," 535.
59. CIA, A Tradecraft Primer: Structured Analytic Techniques for Improving Intelligence Analysis, 15.
60. Kahneman and Tversky, "On the Study of Statistical Intuitions."
61. Tversky and Kahneman, "Extensional versus Intuitive Reasoning: The Conjunction Fallacy in Probability Judgment."
62. Kahneman, Rosenfield, Gandhi, and Blaser, "Noise: How to Overcome the High, Hidden Cost of Inconsistent Decision Making."
63. According to the CIA Tradecraft Primer, "Although application of these techniques alone is no guarantee of analytic precision or accuracy of judgments, it does improve the sophistication and credibility of intelligence assessments," 5; Khalsa, "The Intelligence Community Debate over Intuition versus Structured Technique: Implications for Improving Intelligence Analysis and Warning"; Walton, Challenges in Intelligence Analysis: Lessons from 1300 BCE to the Present.
64. Artner, Girven and Bruce, Assessing the Value of Structured Analytic Techniques, 4.
65. Heuer, "The Evolution of Structured Analytic Techniques."
66. Coulthart, "Why Do Analysts Use Structured Analytic Techniques? An In-Depth Study of an American Intelligence Agency"; Artner, Girven and Bruce, Assessing the Value of Structured Analytic Techniques, 3.
67. Intelligence Community Directive 203, Analytic Standards.
68. Friedman and Zeckhauser, "Why Assessing Estimative Accuracy is Feasible and Desirable," 7.
69. Ibid., 2.
70. Heuer and Pherson, Structured Analytic Techniques, 312.
71. Ibid., 309.
72. Coulthart, "Why Do Analysts Use Structured Analytic Techniques? An In-Depth Study of an American Intelligence Agency."
73. Ibid.
74. Ibid.
75. Johnston, Analytic Culture in the US Intelligence Community: An Ethnographic Study, 81.
76. Ibid., 81–82.
77. Heuer and Pherson, Structured Analytic Techniques, 312.
78. Hammond, "How Convergence of Research Paradigms Can Improve Research on Diagnostic Judgment"; Marrin, "Evaluating the Quality of Intelligence Analysis: By What (Mis) Measure?," 896; Friedman and Zeckhauser, "Why Assessing Estimative Accuracy is Feasible and Desirable," 6.
79. Spielmann, "I Got Algorithm: Can There Be a Nate Silver in Intelligence?," 535.
80. Tikuisis and Mandel, "Is the World Deteriorating?"
81. Harris and Spiker, Critical Thinking Skills for Intelligence Analysis, 217.
82. Tetlock and Mellers, "Intelligent Management of Intelligence Agencies: Beyond Accountability Ping-Pong"; Barnes, "Making Intelligence Analysis More Intelligent: Using Numeric Probabilities," 330.
83. Chang and Tetlock, "Rethinking the Training of Intelligence Analysts."
84. Ibid.
85. Barnes, "Making Intelligence Analysis More Intelligent: Using Numeric Probabilities," 333.
86. Ibid.
87. We note that the IC-sponsored research project, Crowdsourcing Reasoning Evidence Analysis Thinking and Evaluation (CREATE), in which three of the authors have participated or are currently participating, is in the process of doing exactly this.
88. Heuer and Pherson, Structured Analytic Techniques.
89. Inspired by the noise audit idea contained in Kahneman, Rosenfield, Gandhi, and Blaser, "Noise: How to Overcome the High, Hidden Cost of Inconsistent Decision-Making."
90. Mandel, "Are Risk Assessments of a Terrorist Attack Coherent?"
91. Mandel, "Violations of Coherence in Subjective Probability: A Representational and Assessment Processes Account."
92. Miller and Ireland, "Intuition in Strategic Decision Making: Friend or Foe in the Fast-Paced 21st Century?"; Hoffman, Slovic, and Rorer, "An Analysis-of-Variance Model for the Assessment of Configural Cue Utilization in Clinical Judgment"; Chang and McCarty, "Evidence on Judgment Involving the Determination of Substantial Authority: Tax Practitioners versus Students"; Meixner and Welker, "Judgment Consensus and Auditor Experience: An Examination of Organizational Relations."
93. Ho et al., "Improving the Communication of Uncertainty in Climate Science and Intelligence Analysis"; Mandel, "Accuracy of Intelligence Forecasts From the Intelligence Consumer's Perspective"; Mandel and Barnes, "Accuracy of Forecasts in Strategic Intelligence."
94. Spielmann, "I Got Algorithm: Can There Be a Nate Silver in Intelligence?"; Mandel and Barnes, "Accuracy of Forecasts in Strategic Intelligence"; National Research Council, Intelligence Analysis: Behavioral and Social Scientific Foundations. These evaluations would complement those recommended by ODNI's James Marchio. Marchio, "How Good is Your Batting Average? Early IC Efforts to Assess the Accuracy of Estimates."
95. Friedman and Zeckhauser, "Why Assessing Estimative Accuracy is Feasible and Desirable," 31; Marchio, "How Good is Your Batting Average? Early IC Efforts to Evaluate the Accuracy of Estimates."
96. Spielmann, "I Got Algorithm: Can There Be a Nate Silver in Intelligence?," 527; Mandel, Barnes, and Richards, A Quantitative Assessment of the Quality of Strategic Intelligence Forecasts.
97. Johnston, Analytic Culture in the US Intelligence Community: An Ethnographic Study, 112.
98. Harris and Spiker, Critical Thinking Skills for Intelligence Analysis, 218.
99. Ibid., 219.
100. Tetlock and Gardner, Superforecasting: The Art and Science of Prediction; Rieber, "Intelligence Analysis and Judgmental Calibration."
101. Friedman and Zeckhauser, "Why Assessing Estimative Accuracy is Feasible and Desirable," 27.
102. Johnston, Analytic Culture in the US Intelligence Community: An Ethnographic Study, 108.
103. Griffin and Tversky, "The Weighing of Evidence and the Determinants of Confidence."
104. Dhami, Belton, and Careless, "Critical Review of Analytic Techniques."
105. Gilovich, How We Know What Isn't So: The Fallibility of Human Reason in Everyday Life.
106. Mandel and Tetlock, "Debunking the Myth of Value-Neutral Virginity: Toward Truth in Scientific Advertising."

Acknowledgements

The authors thank several anonymous Canadian and U.S. Intelligence Community reviewers, Steve Rieber, and Jeff Friedman for helpful comments. We also thank Jesus Chavez for research assistance. This work contributes to the NATO System Analysis and Studies Panel Research Task Group on Assessment and Communication of Uncertainty in Intelligence to Support Decision Making (SAS-114). An earlier draft of this paper received the 2017 Bobby R. Inman Award for student scholarship in intelligence studies.

Disclosure statement

No potential conflict of interest was reported by the authors.

Funding

This work was supported by the Intelligence Advanced Research Projects Activity, Joint Intelligence Collection and Analytic Capability Project #05ad and Canadian Safety and Security Program project #2016-TI-2224.

Notes on contributors

Welton Chang, a PhD candidate at the University of Pennsylvania, is a psychologist who studies reasoning, judgment and decision-making. From 2005 to 2014, he served as a U.S. Army intelligence officer and as an intelligence collection analyst at the Defense Intelligence Agency. His overseas experience includes 21 months in Iraq and a year in South Korea. He sits on the boards of the John Sloan Dickey Center for International Understanding and the American Resilience Project and is a Truman National Security Fellow. Welton earned a B.A. in Government, cum laude with departmental high honors, from Dartmouth College, an M.A. in Security Studies from Georgetown University, and an M.A. in Psychology from the University of Pennsylvania.

Elissabeth Berdini is a JD candidate at the Northwestern Pritzker School of Law. Berdini earned a B.A., cum laude with distinction in Philosophy, Politics, and Economics (PPE) from the University of Pennsylvania. She received the PPE Award for Distinguished Research for her senior honors thesis, Cognitive Triage: A Strategy for Better Decision Making with Multiple Tasks at Hand.

David R. Mandel is a senior Defence Scientist with Defence Research and Development Canada and Adjunct Professor of Psychology at York University. He previously held academic positions in psychology at the University of Victoria and the University of Hertfordshire. He has published widely in peer-reviewed journals on the topics of thinking, reasoning, judgment, and decision-making and has co-edited The Psychology of Counterfactual Reasoning, Neuroscience of Decision Making, and Improving Bayesian Reasoning: What Works and Why? Mandel is the Chairman of the NATO System Analysis and Studies Panel Research Task Group on Assessment and Communication of Uncertainty in Intelligence to Support Decision Making (SAS-114).

Philip E. Tetlock is the Annenberg University Professor at the University of Pennsylvania, with cross-appointments in psychology, the Wharton School, and political science. He previously held the Mitchell Endowed Professorship at the University of California, Berkeley. Tetlock is the author of Superforecasting: The Art and Science of Prediction and Expert Political Judgment: How Good Is It? How Can We Know?, among other books, and has published widely in peer-reviewed journals on the topics of expert judgment, judgmental biases, and methods of debiasing. He has received scientific awards and honors from a wide range of scholarly organizations, including the National Academy of Sciences, the American Academy of Arts and Sciences, the American Political Science Association, and the American Psychological Association.


References

Artner, Stephen, Richard S. Girven, and James B. Bruce. Assessing the Value of Structured Analytic Techniques in the U.S. Intelligence Community. Santa Monica, CA: RAND Corporation, 2016.
Bar-Hillel, Maya. "The Base-Rate Fallacy in Probability Judgments." Acta Psychologica 44, no. 3 (1980): 211–233.
Barnes, Alan. "Making Intelligence Analysis More Intelligent: Using Numeric Probabilities." Intelligence and National Security 31, no. 3 (2016): 327–344.
Beebe, Sarah M., and George S. Beebe. "Understanding the Non-Linear Event: A Framework for Complex Systems Analysis." International Journal of Intelligence and CounterIntelligence 25, no. 3 (2012): 508–528.
Bruce, James B. "Making Analysis More Reliable: Why Epistemology Matters to Intelligence." Analyzing Intelligence: Origins, Obstacles, and Innovations, 171–190, 2008.
Chang, Otto H., and Thomas M. McCarty. "Evidence on Judgment Involving the Determination of Substantial Authority: Tax Practitioners versus Students." The Journal of the American Taxation Association 10, no. 1 (1988): 26–39.
Chang, Welton, and Philip E. Tetlock. "Rethinking the Training of Intelligence Analysts." Intelligence and National Security 31, no. 6 (2016): 903–920.
Chang, Welton, Eva Chen, Barbara Mellers, and Philip E. Tetlock. "Developing Expert Political Judgment: The Impact of Training and Practice on Judgmental Accuracy in Geopolitical Forecasting Tournaments." Judgment and Decision Making 11, no. 5 (2016): 509–526.
Collier, David, and James E. Mahon. "Conceptual 'Stretching' Revisited: Adapting Categories in Comparative Analysis." American Political Science Review 87, no. 4 (1993): 845–855.
Cooper, Jeffrey R. Curing Analytic Pathologies: Pathways to Improved Intelligence Analysis. Washington, DC: Central Intelligence Agency, Center for the Study of Intelligence, 2005.
Coulthart, Stephen. "Why Do Analysts Use Structured Analytic Techniques? An In-Depth Study of an American Intelligence Agency." Intelligence and National Security 31, no. 7 (2016): 933–948.
Dhami, Mandeep K., Ian K. Belton, and Kathryn E. Careless. "Critical Review of Analytic Techniques." Intelligence and Security Informatics Conference (EISIC), 2016 European, 152–155.
Fishbein, Warren, and Gregory Treverton. Rethinking 'Alternative Analysis' to Address Transnational Threats. Central Intelligence Agency, 2004.
Friedman, Jeffrey A., and Richard Zeckhauser. "Why Assessing Estimative Accuracy is Feasible and Desirable." Intelligence and National Security 31, no. 2 (2016): 178–200.
Gilovich, Thomas. How We Know What Isn't So: The Fallibility of Human Reason in Everyday Life. London: Simon and Schuster, 1993.
Greenwood, John D. "Two Dogmas of Neo-Empiricism: The 'Theory-Informity' of Observation and the Quine-Duhem Thesis." Philosophy of Science 57, no. 4 (1990): 553–574.
Griffin, Dale, and Amos Tversky. "The Weighing of Evidence and the Determinants of Confidence." Cognitive Psychology 24, no. 3 (1992): 411–435.
Hammond, Kenneth R. "How Convergence of Research Paradigms Can Improve Research on Diagnostic Judgment." Medical Decision Making 16, no. 3 (1996): 281–287.
Haran, Uriel, Ilana Ritov, and Barbara A. Mellers. "The Role of Actively Open-Minded Thinking in Information Acquisition, Accuracy, and Calibration." Judgment and Decision Making 8, no. 3 (2013): 188.
Harris, Douglas H., and V. Alan Spiker. "Critical Thinking Skills for Intelligence Analysis." In Ergonomics – A Systems Approach, edited by Isabel L. Nunes. Zagreb, Croatia: InTech Open Access Publisher, 2012.
Heuer, Richards J. Psychology of Intelligence Analysis. Washington, DC: U.S. Central Intelligence Agency, Center for the Study of Intelligence, 1999.
Heuer, Richards J. "Taxonomy of Structured Analytic Techniques." International Studies Association Annual Convention, 2008: 1–6.
Heuer Jr, Richards J. "The Evolution of Structured Analytic Techniques." Presentation to the National Academy of Science, National Research Council Committee on Behavioral and Social Science Research to Improve Intelligence Analysis for National Security, 2009, 529–545.
Ho, Emily H., David V. Budescu, Mandeep K. Dhami, and David R. Mandel. "Improving the Communication of Uncertainty in Climate Science and Intelligence Analysis." Behavioral Science & Policy 1, no. 2 (2015): 43–55.
Hoffman, Paul J., Paul Slovic, and Leonard G. Rorer. "An Analysis-of-Variance Model for the Assessment of Configural Cue Utilization in Clinical Judgment." Psychological Bulletin 69, no. 5 (1968): 338–349.
Jervis, Robert. System Effects: Complexity in Political and Social Life. Princeton, NJ: Princeton University Press, 1998.
Johnston, Rob. Analytic Culture in the US Intelligence Community: An Ethnographic Study. Washington, DC: U.S. Central Intelligence Agency, Center for the Study of Intelligence, 2005.
Kahneman, Daniel, and Amos Tversky. "On the Psychology of Prediction." Psychological Review 80, no. 4 (1973): 237.
Kahneman, Daniel, and Amos Tversky. "On the Study of Statistical Intuitions." Cognition 11, no. 2 (1982): 123–141.
Kahneman, Daniel, and Amos Tversky. "On the Reality of Cognitive Illusions." Psychological Review 103, no. 3 (1996): 582–591.
Kahneman, Daniel, Andrew M. Rosenfield, Linnea Gandhi, and Tom Blaser. "Noise: How to Overcome the High, Hidden Cost of Inconsistent Decision Making." Harvard Business Review 94, no. 10 (2016): 38–46.
Khalsa, Sundri. "The Intelligence Community Debate over Intuition versus Structured Technique: Implications for Improving Intelligence Warning." Journal of Conflict Studies 29 (2009): 112.
Lehner, Paul Edward, Leonard Adelman, Brant A. Cheikes, and Mark J. Brown. "Confirmation Bias in Complex Analyses." IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans 38, no. 3 (2008): 584–592.
Lichtenstein, Sarah, and Baruch Fischhoff. "Training for Calibration." Organizational Behavior and Human Performance 26, no. 2 (1980): 149–171.
Mandel, David R. "Are Risk Assessments of a Terrorist Attack Coherent?" Journal of Experimental Psychology: Applied 11, no. 4 (2005): 277.
Mandel, David R. "Violations of Coherence in Subjective Probability: A Representational and Assessment Processes Account." Cognition 106, no. 1 (2008): 130–156.
Mandel, David R. Commissioned report to the Committee on Field Evaluation of Behavioral and Cognitive Sciences-Based Methods and Tools for Intelligence and Counter-intelligence, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies, 2009.
Mandel, David R. "Accuracy of Intelligence Forecasts from the Intelligence Consumer's Perspective." Policy Insights from the Behavioral and Brain Sciences 2, no. 1 (2009): 111–120.
Mandel, David R. "Instruction in Information Structuring Improves Bayesian Judgment in Intelligence Analysts." Frontiers in Psychology 6, no. 387 (2015): 1–12.
Mandel, David R., and Alan Barnes. "Accuracy of Forecasts in Strategic Intelligence." Proceedings of the National Academy of Sciences 111, no. 30 (2014): 10984–10989.
Mandel, David R., Alan Barnes, and Karen Richards. A Quantitative Assessment of the Quality of Strategic Intelligence Forecasts. No. 2013-036. Technical Report, 2014.
Mandel, David R., and Philip E. Tetlock. "Debunking the Myth of Value-Neutral Virginity: Toward Truth in Scientific Advertising." Frontiers in Psychology 7, no. 451 (2016): 1–5.
Marchio, James. "How Good is Your Batting Average? Early IC Efforts to Assess the Accuracy of Estimates." Studies in Intelligence 60, no. 4 (2016): 3–13.
Marrin, Stephen. "Training and Educating U.S. Intelligence Analysts." International Journal of Intelligence and CounterIntelligence 22, no. 1 (2009): 131–146.
Marrin, Stephen. "Evaluating the Quality of Intelligence Analysis: By What (Mis) Measure?" Intelligence and National Security 27, no. 6 (2012): 896–912.
Marrin, Stephen. Improving Intelligence Analysis: Bridging the Gap between Scholarship and Practice. London: Routledge, 2012.
Meixner, Wilda F., and Robert B. Welker. "Judgment Consensus and Auditor Experience: An Examination of Organizational Relations." Accounting Review (1988): 505–513.
Miller, C. Chet, and R. Duane Ireland. "Intuition in Strategic Decision Making: Friend or Foe in the Fast-Paced 21st Century?" The Academy of Management Executive (1993–2005) 19, no. 1 (2005): 19–30.
Moore, Don A., and Paul J. Healy. "The Trouble with Overconfidence." Psychological Review 115, no. 2 (2008): 502.
National Research Council. Intelligence Analysis: Behavioral and Social Scientific Foundations. Washington, DC: National Academies Press, 2011.
Nickerson, Raymond S. "Confirmation Bias: A Ubiquitous Phenomenon in Many Guises." Review of General Psychology 2, no. 2 (1998): 175.
Ordóñez, Lisa, and Lehman Benson. "Decisions under Time Pressure: How Time Constraint Affects Risky Decision Making." Organizational Behavior and Human Decision Processes 71, no. 2 (1997): 121–140.
Phillips, Lawrence D., and Ward Edwards. "Conservatism in a Simple Probability Inference Task." Journal of Experimental Psychology 72, no. 3 (1966): 346.
Piercey, M. "Motivated Reasoning and Verbal vs. Numerical Probability Assessment: Evidence from an Accounting Context." Organizational Behavior and Human Decision Processes 108, no. 2 (2009): 330–341.
Rieber, Steven. "Intelligence Analysis and Judgmental Calibration." International Journal of Intelligence and CounterIntelligence 17, no. 1 (2004): 97–112.
Spielmann, Karl. "I Got Algorithm: Can There Be a Nate Silver in Intelligence?" International Journal of Intelligence and CounterIntelligence 29, no. 3 (2016): 525–544.
Tetlock, Philip E., and Dan Gardner. Superforecasting: The Art and Science of Prediction. New York: Random House, 2015.
Tetlock, Philip E., and Barbara A. Mellers. "Intelligent Management of Intelligence Agencies: Beyond Accountability Ping-Pong." American Psychologist 66, no. 6 (2011): 542.
Tikuisis, Peter, and David R. Mandel. "Is the World Deteriorating?" Global Governance: A Review of Multilateralism and International Organizations 21, no. 1 (2015): 9–14.
Tversky, Amos, and Daniel Kahneman. "Extensional versus Intuitive Reasoning: The Conjunction Fallacy in Probability Judgment." Psychological Review 90, no. 4 (1983): 293.
United States Government. "Intelligence Reform and Terrorism Prevention Act of 2004." Public Law 108-458 (2004).
Walton, Timothy. Challenges in Intelligence Analysis: Lessons from 1300 BCE to the Present. New York: Cambridge University Press, 2010.
Wegener, Duane T., and Richard E. Petty. "The Flexible Correction Model: The Role of Naïve Theories of Bias in Bias Correction." Advances in Experimental Social Psychology 29 (1997): 141–208.
West, Richard F., Maggie E. Toplak, and Keith E. Stanovich. "Heuristics and Biases as Measures of Critical Thinking: Associations with Cognitive Ability and Thinking Dispositions." Journal of Educational Psychology 100, no. 4 (2008): 930.


DOCUMENT CONTROL DATA (Security markings for the title, abstract and indexing annotation must be entered when the document is Classified or Designated)

1. ORIGINATOR (The name and address of the organization preparing the document. Organizations for whom the document was prepared, e.g., Centre sponsoring a contractor's report, or tasking agency, are entered in Section 8.)

DRDC – Toronto Research Centre Defence Research and Development Canada 1133 Sheppard Avenue West P.O. Box 2000 Toronto, Ontario M3M 3B9 Canada

2a. SECURITY MARKING (Overall security marking of the document including special supplemental markings if applicable.)

CAN UNCLASSIFIED

2b. CONTROLLED GOODS

NON-CONTROLLED GOODS DMC A

3. TITLE (The complete document title as indicated on the title page. Its classification should be indicated by the appropriate abbreviation (S, C or U) in parentheses after the title.)

Restructuring structured analytic techniques in intelligence

4. AUTHORS (last name, followed by initials – ranks, titles, etc., not to be used)

Welton Chang, Elissabeth Berdini, David R. Mandel, Philip E. Tetlock

5. DATE OF PUBLICATION (Month and year of publication of document.)

November 2017

6a. NO. OF PAGES (Total containing information, including Annexes, Appendices, etc.)

2

6b. NO. OF REFS (Total cited in document.)

0

7. DESCRIPTIVE NOTES (The category of the document, e.g., technical report, technical note or memorandum. If appropriate, enter the type of report, e.g., interim, progress, summary, annual or final. Give the inclusive dates when a specific reporting period is covered.)

External Literature (P)

8. SPONSORING ACTIVITY (The name of the department project office or laboratory sponsoring the research and development – include address.)

DRDC – Toronto Research Centre Defence Research and Development Canada 1133 Sheppard Avenue West P.O. Box 2000 Toronto, Ontario M3M 3B9 Canada

9a. PROJECT OR GRANT NO. (If appropriate, the applicable research and development project or grant number under which the document was written. Please specify whether project or grant.)

9b. CONTRACT NO. (If appropriate, the applicable number under which the document was written.)

10a. ORIGINATOR’S DOCUMENT NUMBER (The official document number by which the document is identified by the originating activity. This number must be unique to this document.)

DRDC-RDDC-2017-P113

10b. OTHER DOCUMENT NO(s). (Any other numbers which may be assigned this document either by the originator or by the sponsor.)

11a. FUTURE DISTRIBUTION (Any limitations on further dissemination of the document, other than those imposed by security classification.)

Public release

11b. FUTURE DISTRIBUTION OUTSIDE CANADA (Any limitations on further dissemination of the document, other than those imposed by security classification.)


12. ABSTRACT (A brief and factual summary of the document. It may also appear elsewhere in the body of the document itself. It is highly desirable that the abstract of classified documents be unclassified. Each paragraph of the abstract shall begin with an indication of the security classification of the information in the paragraph (unless the document itself is unclassified) represented as (S), (C), (R), or (U). It is not necessary to include here abstracts in both official languages unless the text is bilingual.)

Structured analytic techniques (SATs) are intended to improve intelligence analysis by checking the two canonical sources of error: systematic biases and random noise. Although both goals are achievable, no one knows how close the current generation of SATs comes to achieving either of them. We identify two root problems: (1) SATs treat bipolar biases as unipolar. As a result, we lack metrics for gauging possible overshooting, and have no way of knowing when SATs that focus on suppressing one bias (e.g., overconfidence) are triggering the opposing bias (e.g., under-confidence); (2) SATs tacitly assume that problem decomposition (e.g., breaking reasoning into rows and columns of matrices corresponding to hypotheses and evidence) is a sound means of reducing noise in assessments. But no one has ever actually tested whether decomposition is adding or subtracting noise from the analytic process, and there are good reasons for suspecting that decomposition will, on balance, degrade the reliability of analytic judgment. The central shortcoming is that SATs have not been subject to sustained scientific scrutiny of the sort that could reveal when they are helping or harming the cause of delivering accurate assessments of the world to the policy community.

13. KEYWORDS, DESCRIPTORS or IDENTIFIERS (Technically meaningful terms or short phrases that characterize a document and could be helpful in cataloguing the document. They should be selected so that no security classification is required. Identifiers, such as equipment model designation, trade name, military project code name, geographic location may also be included. If possible keywords should be selected from a published thesaurus, e.g., Thesaurus of Engineering and Scientific Terms (TEST) and that thesaurus identified. If it is not possible to select indexing terms which are Unclassified, the classification of each should be indicated as with the title.)

structured analytic techniques, intelligence analysis, scientific method