
Theory-Based Causal Induction

Thomas L. Griffiths, University of California, Berkeley

Joshua B. Tenenbaum, Massachusetts Institute of Technology

Inducing causal relationships from observations is a classic problem in scientific inference, statistics, and machine learning. It is also a central part of human learning, and a task that people perform remarkably well given its notorious difficulties. People can learn causal structure in various settings, from diverse forms of data: observations of the co-occurrence frequencies between causes and effects, interactions between physical objects, or patterns of spatial or temporal coincidence. These different modes of learning are typically thought of as distinct psychological processes and are rarely studied together, but at heart they present the same inductive challenge—identifying the unobservable mechanisms that generate observable relations between variables, objects, or events, given only sparse and limited data. We present a computational-level analysis of this inductive problem and a framework for its solution, which allows us to model all these forms of causal learning in a common language. In this framework, causal induction is the product of domain-general statistical inference guided by domain-specific prior knowledge, in the form of an abstract causal theory. We identify 3 key aspects of abstract prior knowledge—the ontology of entities, properties, and relations that organizes a domain; the plausibility of specific causal relationships; and the functional form of those relationships—and show how they provide the constraints that people need to induce useful causal models from sparse data.

    Keywords: causal induction, intuitive theories, rational analysis, Bayesian modeling

In 1695, Sir Edmond Halley was computing the orbits of a set of comets for inclusion in Newton’s Principia Mathematica when he noticed a surprising regularity: The comets of 1531, 1607, and 1682 took remarkably similar paths across the sky, and visited the Earth approximately 76 years apart. Newton had already shown that comets should follow orbits corresponding to conic sections—parabolas, hyperbolas, and ellipses—although no elliptical orbits had yet been observed. Halley inferred that the sightings of these comets were not three independent events, but three consequences of a single common cause: a comet that had visited the Earth three times, travelling in an elliptical orbit. He went on to predict that it would return along the same orbit in 1758. The comet returned as predicted, and has continued to visit the Earth approximately every 76 years since, providing a sensational confirmation of Newton’s physics.

Halley’s discovery is an example of causal induction: inferring causal structure from data. Explaining this discovery requires appealing to two factors: abstract prior knowledge, in the form of a causal theory, and statistical inference. The prior knowledge that guided Halley was the mathematical theory of physics laid out by Newton. This theory identified the entities and properties relevant to understanding a physical system, formalizing notions such as velocity and acceleration, and characterized the relations that can hold among these entities. Using this theory, Halley could generate a set of hypotheses about the causal structure responsible for his astronomical observations: They could have been produced by three different comets, each travelling in a parabolic orbit, or by one comet, travelling in an elliptical orbit. Choosing between these hypotheses required the use of statistical inference. While Halley made no formal computations of the probabilities involved, the similarity in the paths of the comets and the fixed interval between observations convinced him that “it was highly probable, not to say demonstrative, that these were but one and the same Comet” (from the Journal Book of the Royal Society, July 1696, reproduced in Hughes, 1990, p. 353).

Causal induction is not just a problem faced by scientists. The capacity to reason about the causes of events is an essential part of cognition from early in life, whether we are inferring the forces involved in physical systems (e.g., Shultz, 1982b), the mental states of others (e.g., Perner, 1991), or the essential properties of natural kinds (e.g., S. A. Gelman & Wellman, 1991). Often, these causal relationships need to be inferred from data. Explaining how people make these inferences is not just a matter of explaining how causation is identified from correlation, but of accounting for how complex causal structure is inferred in the absence of (statistically significant) correlation. People can infer causal relationships from samples too small for any statistical test to produce significant results (e.g., Gopnik, Sobel, Schulz, & Glymour, 2001) and solve problems like inferring hidden causal structure (e.g., Kushnir, Gopnik, Schulz, & Danks, 2003) that still pose a major challenge for statisticians and computer scientists. Human causal induction is not always on target: sometimes we miss causal connections that would be most valuable to exploit, or see connections that do not in fact exist. Yet the successes stand out. No conventional statistical recipe or computer learning algorithm can compete with a young child’s capacity to discover the causal structure underlying everyday experience—or at least, to come close enough to causal “ground truth” with knowledge that supports such flexible prediction, planning, and action in the world.

Thomas L. Griffiths, Department of Psychology and Program in Cognitive Science, University of California, Berkeley; Joshua B. Tenenbaum, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology.

While completing this work, we were supported by the James S. McDonnell Causal Learning Research Collaborative, Air Force Office of Scientific Research Grants FA9550-07-1-0351 (to Thomas L. Griffiths) and FA9550-07-1-0075 (to Joshua B. Tenenbaum), and Army Research Office MURI Grant W911NF-08-1-0242 (to Joshua B. Tenenbaum).

We thank Elizabeth Bonawitz, Tevye Krynski, Tania Lombrozo, Sourabh Niyogi, Laura Schulz, and David Sobel for comments on the article and related discussions, Marc Buehner for providing the data from Buehner and Cheng (1997), and Hongjing Lu for providing a version of the datasets considered by Perales and Shanks (2007) that included the valence of the causal relationship. We are grateful to Elizabeth Bonawitz and Andreas Stuhlmuller for their assistance in data collection.

Correspondence concerning this article should be addressed to Thomas L. Griffiths, Department of Psychology, 3210 Tolman Hall, MC 1650, University of California, Berkeley, CA 94720-1650. E-mail: tom_griffiths@berkeley.edu

Psychological Review, 2009, Vol. 116, No. 4, 661–716. © 2009 American Psychological Association. 0033-295X/09/$12.00 DOI: 10.1037/a0017201

In this article, we present a formal framework for explaining how human causal learning works across a wide range of contexts and information sources. Our goal here is not a mechanistic explanation in terms of psychological processing steps or neural machinery. Rather we want to explain how human learners can successfully infer such rich causal models of the world given that the data they observe are so sparse and limited. Our explanations of human causal induction take the form of computational theories, in the sense introduced by Marr (1982) and pursued by Shepard (1987) and Anderson (1990), among others (see Oaksford & Chater, 1998): We identify the abstract computational problem addressed by a cognitive capacity, derive an optimal solution to that problem, and use that solution to explain human behavior.

In our analysis of the computational problem underlying everyday causal induction, the two factors of prior knowledge and statistical inference that we identified in Halley’s famous discovery both play central roles. Prior knowledge, in the form of an abstract theory, generates hypotheses about the candidate causal models that can apply in a given situation. Principles of Bayesian inference generate weights for these hypotheses in light of observed data and thus predictions about which causal relations are likely to hold, and which patterns of future events are likely to be observed. To the extent that the prior knowledge is veridical—when people’s abstract intuitive theories reflect the way causal systems in their environment tend to work—our rational framework explains how people’s inferences about the structure of specific causal systems can be correct, even given very little data. Yet our framework is not strictly normative: In cases where people hold the wrong abstract theories, rational statistical inference may lead them to incorrect beliefs about a novel system even given extensive experience.

The idea that causal induction draws on prior knowledge is not novel—it has been noted in many influential theories (e.g., Cheng, 1997; Lien & Cheng, 2000), discussed in the context of rational statistical inference (Alloy & Tabachnik, 1984), and explored in two separate research programs (Koslowski, 1996; Waldmann, 1996; Waldmann, Hagmayer, & Blaisdell, 2006). However, previous formal models have focused mostly on the effects of specific forms of prior knowledge, such as the plausibility of a causal relationship, painting a relatively simple picture of the role this knowledge plays in learning. Our contribution is a formal framework that provides a way to systematically identify the aspects of prior knowledge that can influence causal induction, to describe this knowledge precisely, and to explain how it is combined with rational mechanisms of statistical inference. We call this framework theory-based causal induction. We propose that three aspects of prior knowledge are central in generating hypotheses for causal induction—the ontology of entities, properties, and relations that organizes a domain; the plausibility of specific causal relationships; and the functional form of those relationships—and that these three aspects are the key constituents of people’s intuitive causal theories. Mathematical models of causal induction in specific settings can be derived by performing Bayesian inference over the hypothesis spaces generated by appropriate theories of this sort, and they illustrate how relatively complex interactions between prior knowledge and data can emerge.

By viewing causal induction as the result of domain-general statistical inference guided by domain-specific causal theories, our framework provides a unified account of a set of phenomena that have traditionally been viewed as distinct. Different aspects of causal learning have tended to be explained in different ways. Theories of causal learning from contingency data, in which people are provided with information about the frequency with which cause and effect co-occur and are asked to evaluate the underlying relationship, emphasize statistical learning from covariation between cause and effect (e.g., Cheng, 1997; Jenkins & Ward, 1965; Shanks, 1995b). In contrast, analyses of learning about the causal relationships that govern physical systems, such as simple machines, tend to focus on the role of domain-specific knowledge about the nature of possible causal mechanisms (e.g., Bullock, Gelman, & Baillargeon, 1982; Shultz, 1982b). Finally, inferences about causal relationships based on spatial and temporal dimensions of dynamic events—most famously illustrated in Michotte’s (1963) classic studies of perceived causality in collisions—are often viewed as the product of a modular, automatic perceptual mechanism, distinct from a general cognitive capacity for statistical inference thought to underlie causal learning from contingency data (e.g., Leslie, 1986; Schlottmann & Shanks, 1992).

From the perspective of theory-based causal induction, these apparently disparate phenomena are not discrete cases requiring separate explanations but rather points on a continuum, where the strength of the constraints provided by prior knowledge gradually increases, and the amount of information required in order to make a causal inference decreases accordingly. Standard experiments on causal induction from covariational data tap into relatively weak prior knowledge and hence require a relatively large number of experienced events (typically, tens of data points) for learners to reach confident causal conclusions. Causal learning in simple physical systems draws on richer knowledge about causal mechanisms, allowing confident causal inferences from only a handful of examples. When even richer (if implicit) knowledge about the spatiotemporal dynamics of physical interactions is involved, as in standard cases of perceptual causality, confident inferences can be made from just a single observed event—a “suspicious coincidence”—with the appropriate spatiotemporal structure.

The plan of the article is as follows. In the next section, we summarize previous work illustrating how different aspects of prior knowledge can influence causal learning. We then discuss the goals of a computational-level analysis of causal induction, focusing on a description of the central inductive problems to be solved by the learner. This description introduces causal graphical models, a formalism for representing and reasoning about causal relationships which has been the basis for previous accounts of human causal induction, but which crucially does not have the ability to express the forms of abstract prior knowledge that guide human causal learning along such different trajectories in different domains and contexts. We then introduce our framework of theory-based causal induction, building on the foundation of causal graphical models but making explicit the structure and function of the learner’s prior knowledge. The bulk of the article consists of the application of this framework to the settings mentioned above: causal induction from contingency data, learning about the properties of physical systems, and inferring causal relationships from coincidences in space and time. In considering these phenomena, we focus on the importance of the two key components of our approach—statistical inference and prior knowledge expressed in the form of a causal theory—in explaining how people can learn about causal relationships from limited data.

    How Does Prior Knowledge Influence Causal Induction?

The study of causal induction has a long history, in both philosophy (e.g., Hume, 1739/1978) and psychology (e.g., Inhelder & Piaget, 1958). Detailed reviews of some of this history are provided by Shultz (1982b; Shultz & Kestenbaum, 1985) and White (1990, 1995). This history is marked by a tension between statistical learning and abstract prior knowledge about causality as accounts of human causal induction. Psychological theories about causal induction have tended to emphasize one of these two factors over the other (Cheng, 1997; Newsome, 2003; Shultz, 1982b): In the tradition of Hume (1739/1978), covariation-based approaches characterize human causal induction as the consequence of a domain-general statistical sensitivity to covariation between cause and effect (e.g., Cheng & Novick, 1990, 1992; Shanks & Dickinson, 1987), whereas, in a tradition often traced to Kant (1781/1964; see Shultz, 1982b, for an account of the connection), mechanism-based approaches focus on the role of prior knowledge about the mechanisms by which causal force can be transferred (e.g., Ahn & Kalish, 2000; Shultz, 1982b; White, 1995).

Recently, explanations of human causal learning have begun to explore a middle ground between these positions, looking at how mechanism knowledge might influence learning from covariation between cause and effect (e.g., Lagnado & Sloman, 2004; Lagnado, Waldmann, Hagmayer, & Sloman, 2007; Waldmann, 1996; Waldmann et al., 2006). These accounts are based on a range of results indicating the importance of both of these factors. The analysis that we present in this article can be viewed in part as an attempt to develop a formal framework that can capture the kind of knowledge needed to explain these effects, providing the tools required to define computational models of this “knowledge-based” approach to causal induction (Waldmann, 1996). As a first step toward developing such a framework, we need to identify exactly what aspects of prior knowledge are relevant to causal induction. In this section, we briefly review work that has explored this question, using the example of Halley’s discovery for illustration. Following this example, we divide the kind of prior knowledge that is relevant to causal induction into three categories: information about the types of entities, properties, and relations that arise in a domain (the ontology); constraints on the plausible relations among these entities; and constraints on the functional form of such relations.1

1 Although we refer to expectations about ontologies, plausible relations, and functional form as prior knowledge, we mean this to be interpreted as indicating that the knowledge is available prior to a specific instance of causal induction. We are not claiming that this knowledge is innate, and anticipate that in almost all cases it is acquired through experience with the world and in particular through other instances of causal induction, a point that we return to in our discussion of learning causal theories.

    Ontology

Newton’s theory of physics picked out the critical variables for thinking about the motion of objects—their mass, velocity, and acceleration. The question of how entities are differentiated on the basis of their causal properties has been thoroughly explored in developmental psychology, through consideration of the ontological commitments reflected in the behavior of infants and young children. Both infants and young children have strong expectations about the behavior of physical objects, and these expectations are quite different from those for intentional agents (Saxe, Tenenbaum, & Carey, 2005; Shultz, 1982a; Spelke, Phillips, & Woodward, 1995). Similarly, children have different expectations about the properties of biological and nonbiological entities (e.g., Springer & Keil, 1991). Gopnik et al. (2001) have shown that children use the causal properties of entities to determine whether they belong to a novel type—objects that differed in appearance but both activated a “detector” were more likely to both be considered “blickets” than objects with similar appearance that differed in their causal properties.

Research with adults has also explored how the types of entities influence causal inferences. For example, Lien and Cheng (2000) conducted several experiments examining the circumstances under which causal properties are generalized across the members of a category. In a typical experiment, people learned about the tendency of 15 chemicals to produce blooming in a plant. The chemicals could be divided into groups on the basis of their color and shape. Lien and Cheng explored how people used information about color and shape, which provided a basis for identifying different types of chemicals in causal learning. Their conclusion was that people used these types in learning causal relationships: People formed the generalization that chemicals of the type that maximized the strength of the resulting relationship were those that caused the plant to bloom. In related work, Waldmann and Hagmayer (2006) examined how intuitive theories influence whether previously learned categories are transferred to novel causal learning problems. Tenenbaum and Niyogi (2003) also showed that people spontaneously organize objects into types on the basis of their causal properties, forming abstract categories of schematic blocks that cause one another to light up in a computer simulation, and Kemp, Goodman, and Tenenbaum (2007) showed that such categories could carry with them expectations about the strength of causal relationships.

    Plausible Relations

Knowledge of the types of entities in a domain can provide quite specific information about the plausibility of causal relationships. For example, Newton precisely laid out the kinds of forces by which the properties of one object can influence those of another. Explorations of how plausibility influences causal induction have examined mainly how children learn about the structure of physical systems (e.g., Shultz, 1982b), although even accounts of causal induction from contingency data that emphasize the importance of covariation between cause and effect recognize a role for top-down knowledge (e.g., Cheng, 1993, 1997). In one classic study, Shultz (1982b) demonstrated that young children have strong expectations about the plausibility of different kinds of causal relationships, in part derived from their experience with the properties of these objects in the course of the experiment. For example, he found that children used the knowledge that a lamp is more likely than a fan to produce a spot of light, that a fan is more likely than a tuning fork to blow out a candle, and that a tuning fork is more likely than a lamp to produce resonance in a box.

On the basis of examples like those provided by Shultz (1982b), several authors have equated the plausibility of a causal relationship with the existence of a potential mechanism by which the cause could influence the effect (e.g., Ahn & Kalish, 2000; Schlottmann, 1999). Koslowski and colleagues (Koslowski, 1996; Koslowski & Okagaki, 1986; Koslowski, Okagaki, Lorenz, & Umbach, 1989) have conducted a series of experiments investigating this claim, finding that people consider causal relationships more plausible when supplied with a potential mechanism and less plausible when the most likely mechanisms are ruled out.

Recent work examining causal learning in adults has also noted the importance of prior expectations about the direction of causal relationships, particularly when people are simultaneously learning about multiple relationships. Waldmann and colleagues (Waldmann, 1996, 2000; Waldmann & Holyoak, 1992; Waldmann, Holyoak, & Fratianne, 1995) have conducted a number of studies that suggest that people’s expectations about the causal structure among a set of variables can determine how covariational evidence affects their beliefs. For example, Waldmann (2000) gave people information that suggested that the relationship among a set of variables was either a “common cause” relationship, with one variable causing several others, or a “common effect” relationship, with several variables all producing a single effect. People’s beliefs about the underlying causal structure influenced their interpretation of the pattern of covariation among the variables: Only those who believed in the common effect structure took into account competition between causes when evaluating their strength.

    Functional Form

In physics, the functional form of causal relationships, such as how the velocity of one object depends on its mass and the mass and velocity of another object with which it collides, can be laid out precisely. The knowledge that guides most causal inferences is less precise, but even in the most basic cases of causal induction we draw on expectations as to whether the effects of one variable on another are positive or negative, whether multiple causes interact or are independent, and what type of events (binary, continuous, or rates) are relevant to evaluating causal relationships (Cheng, 1997; Novick & Cheng, 2004).

One setting in which questions about functional form have arisen explicitly is in examining how causes should be assumed to combine. Many theories of animal learning assume that multiple causes of a single effect combine additively, each making a constant contribution to the effect (e.g., Rescorla & Wagner, 1972). A number of researchers, including Shanks, Wasserman, and their colleagues, have advocated these linear models as accounts of human causal learning (e.g., López, Cobos, Caño, & Shanks, 1998; Shanks, 1995a, 1995b; Shanks & Dickinson, 1987; Wasserman, Elek, Chatlosh, & Baker, 1993). However, whether this assumption is appropriate for modeling human judgments seems to be affected by people’s beliefs about what aspect of the causes produces the effect. Waldmann (2007) presented a study in which participants were told about a hypothetical experiment that found that drinking a yellow liquid increased the heart rate of animals by 3 points, while drinking a blue liquid increased the heart rate by 7 points. The participants were asked to predict the consequences of drinking a mixture of the two liquids. The results depended upon whether the participants were told that the effect of the drink was a consequence of its taste, or of its strength. More people produced predictions consistent with a weighted average of the effects if they believed the effect was modulated by strength, for which a linear functional form is more appropriate. Recent work has also shown that the magnitude of some effects that assume additivity is affected by the extent to which people believe causes combine additively (Beckers, De Houwer, Pineño, & Miller, 2005; Lovibond, Been, Mitchell, Bouton, & Frohardt, 2003).

Perhaps the most comprehensive attempt to characterize the possible ways in which causes could combine is that of Kelley (1973), who suggested that causal induction from small numbers of observations may be guided by causal schemas. Kelley distinguished between generative and preventive causes, and he identified three schemas describing the interaction between generative causes: multiple sufficient causes, multiple necessary causes, and compensatory causes. Under the multiple sufficient causes schema, the effect occurs in the presence of any one of the causes (the equivalent of a logical OR function). In the multiple necessary causes schema, the effect occurs only if all of the causes are present (the equivalent of a logical AND function). In the compensatory causes schema, increasing the strength of each cause increases the tendency for the effect to be expressed. All of these schemas constitute different assertions about the functional form of the relationship between cause and effect, and knowing which of these schemas is relevant in a particular situation can facilitate evaluating whether a particular causal relationship exists.
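To make the contrast between these schemas concrete, the following sketch (our illustration, not part of Kelley's proposal or of the framework developed in this article) expresses each schema as a simple combination function; the saturating form used for the compensatory schema is an assumption chosen purely for illustration.

```python
# Illustrative sketch of Kelley's (1973) three generative-cause schemas as
# combination functions. The graded "compensatory" rule below is an assumed,
# hypothetical form, not a specification from the article.

def multiple_sufficient(causes):
    """Effect occurs if any single cause is present (logical OR)."""
    return 1 if any(causes) else 0

def multiple_necessary(causes):
    """Effect occurs only if every cause is present (logical AND)."""
    return 1 if all(causes) else 0

def compensatory(strengths):
    """Tendency toward the effect grows with total cause strength;
    a simple saturating form is assumed here for illustration."""
    total = sum(strengths)
    return total / (1.0 + total)

print(multiple_sufficient([1, 0]))          # 1: one sufficient cause is enough
print(multiple_necessary([1, 0]))           # 0: a necessary cause is missing
print(round(compensatory([0.5, 1.0]), 2))   # graded response to combined strength
```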

The functional form of causal relationships becomes most important when dealing with causal inferences in physical systems, where the ways in which one object influences another can be quite complex. Developmental psychologists have extensively investigated how well children understand the functional relationships that hold in physical systems. Shultz and Kestenbaum (1985) provided a review of some of this work. One interesting example of this project is provided by Zelazo and Shultz (1989), who investigated whether children understood the different functional relationships between the potency of a cause and the resistance of the effect in two systems: a balance beam, where one object was weighed against another, and a ramp, where one object slid down to displace another. For the balance beam, the magnitude of the effect depends upon the difference in the masses of the two objects, whereas for the ramp, it depends upon the ratio. Zelazo and Shultz (1989) found that although adults were sensitive to this difference, 5-year-olds tended to use a single functional form for both systems.

The functional form of a causal relationship can also determine the temporal coupling between cause and effect. The time between the occurrence of a potential cause and the occurrence of an effect is a critical variable in many instances of causal induction. Several studies have explored covariation and temporal proximity as cues to causality in children, typically finding that the event that immediately precedes an effect is most likely to be perceived as the cause, even if there is covariational evidence to the contrary (e.g., Shultz, Fisher, Pratt, & Rulf, 1986). Hagmayer and Waldmann (2002) presented an elegant series of studies that showed that different assumptions about the delay between cause and effect could lead to different interpretations of the same set of events, determining which events were assumed to be related. Similar phenomena have recently been investigated in detail by Lagnado and Sloman (2006) and Buehner and colleagues (Buehner & May, 2002, 2003; Greville & Buehner, 2007). Finally, Anderson (1990) provided a computational analysis of data involving the interaction between spatial separation and temporal contiguity in causal induction.

    Summary

The three aspects of prior knowledge identified in this section can support strong expectations about possible causal relationships. Having an ontology, knowing the plausibility of relationships among the entities identified within that ontology, and knowing the functional form of those relationships provides information that makes it possible to generalize about the causal relationships among completely new variables. The research we have summarized in this section makes a compelling case for an influence of prior knowledge on causal induction but raises the question of exactly how this knowledge should be combined with the evidence provided by the data observed by learners. Answering this question is the project undertaken in the remainder of the article. Our next step toward obtaining an answer is to understand the computational problem underlying causal induction, which is the focus of the next section.

    A Computational-Level Analysis of Causal Induction

The aim of this article is to provide a computational-level analysis of causal induction, in the sense introduced by Marr (1982). This section begins with a discussion of what such an analysis means, clarifying our motivation and methodology. We then turn to the question of how to formulate the computational problem underlying causal induction. Our formulation of this problem makes use of causal graphical models, a formalism for representing, reasoning with, and learning about causal relationships developed in computer science and statistics (Pearl, 2000; Spirtes, Glymour, & Scheines, 1993). We introduce this formalism and use it to clearly state the problem faced by causal learners. We then consider existing rational solutions to this problem, arguing that while they feature one of the two factors that are necessary to explain human causal induction—statistical inference—they do not incorporate the kind of prior knowledge described in the previous section. Reflecting upon the nature of this knowledge leads us to argue that a level of representation that goes beyond causal graphical models will be required.

    Analyzing Causal Induction at the Computational Level

Marr (1982) distinguished between three levels at which an information processing system can be analyzed: the levels of computational theory, representation and algorithm, and hardware implementation. Analyses at the first of these levels answer the question “What is the goal of the computation, why is it appropriate, and what is the logic of the strategy by which it can be carried out?” (Marr, 1982, p. 25). This is a question about the abstract problem that the information processing system is trying to solve and what solutions to that problem might look like. One part of a computational-level analysis is thus considering the form of rational solutions to a problem faced by the learner, a strategy that is also reflected in Shepard’s (1987) search for universal laws of cognition—laws that must hold true for any information-processing system due to the structure of the problem being solved—and Anderson’s (1990) formulation of rational analysis.

In the context of causal induction, a computational-level analysis seeks to identify the abstract problem being solved when people are learning about causal relationships and to understand the logic that makes it possible to solve this problem. Since our aim is to provide a unifying account of causal induction across a range of settings, we want to define the underlying computational problem in as broad a way as possible, highlighting the fact that a single solution can be applied across these domains. A major challenge of this approach is finding a formal framework that can provide a solution to the underlying computational problem, compounded by the fact that we want more than just any solution: We want a solution that is optimal for the problem being posed.

Developing an account of causal induction that is an optimal solution to the underlying computational problem is attractive not just as a unifying account, but as a way of answering three kinds of questions about human cognition. The first kind of question is a “How possibly?” question: How could people possibly solve the problem of inferring causal relationships from observational data? Many aspects of human learning, including causal induction, seem far better than those of any automated systems. To repeat an example from the introduction, it is commonplace for people to draw correct conclusions about causal relationships from far less data than we might need to do a statistical test. Understanding how we might explain such inferences in rational terms helps us understand how it is that people are so good at them, and what factors—such as domain knowledge—play a role in this success.

The second kind of question we can answer using this kind of analysis is a “How should it be done?” question: Given the computational problems that people face, what should they be doing to solve those problems? Optimal solutions tell us something about the properties that we might expect to see in the behavior of intelligent organisms, and can thus tell us which aspects of that behavior might be purely a consequence of the nature of the problems being solved. This kind of strategy is common in vision science, where “ideal observer” models have helped reveal how much of human perception might be explained as an optimal response to the structure of the environment (Yuille & Kersten, 2006). In the case of causal induction, a critical issue is combining statistical evidence with prior knowledge, and a rational account can indicate how this should be done, and what the consequences should be for the inferences that people make. Although research in judgment and decision-making has illustrated that people often deviate from the predictions of rational models (e.g., Tversky & Kahneman, 1974), the revelations that this work has made about the psychological mechanisms involved were partially made possible by the existence of a well-developed account of how a rational agent should make decisions. For other complex problems such as causal induction, we are only just beginning to develop these rational accounts, and understanding what people should do when making causal inferences will be a valuable tool in determining how people actually solve this problem.

Finally, a third, related, question we can answer is a “What is necessary?” question: What knowledge or other constraints on hypotheses would an ideal learner need in order to reach the same conclusions as people? Since an ideal learner makes the best use of the available data, the answer to this question places a lower bound on the kind of constraints that human learners might use. Understanding the impact of different kinds of prior knowledge on causal induction by analyzing their effects on an ideal learner gives us a way to predict the role that these kinds of knowledge might play in human causal induction.

Answering these three questions requires not just defining a problem and deriving a solution, but arguing that this problem and solution connect to human causal learning. This connection can be established only by comparing the predictions of models developed within our formal framework to the results of experiments with human participants. The empirical results that will be relevant to this argument are those that are framed at the same level of abstraction as our analysis: results that indicate what conclusions people reach given particular data. As a consequence, we focus mainly on static measurements of beliefs about causal relationships, rather than capturing the dynamics of human learning, although we have explored this topic in the past (Danks, Griffiths, & Tenenbaum, 2003) and view it as an important direction for future research. The goal of our models is to produce the same conclusions from the same data, and our framework will be successful if it allows us to define models that incorporate the kinds of knowledge that make this possible.

In pursuing a computational-level analysis, we are not trying to make claims about the other levels at which causal induction might be analyzed. In particular, we are not asserting that particular representations or algorithms are necessary, or making other commitments as to the mechanisms or the psychological processes involved. Marr (1982) argued that different levels of analysis will provide constraints on one another, with the computational level indicating what kinds of representations and algorithms will be appropriate for solving a problem. In the context of causal induction, we anticipate that many different psychological mechanisms could result in behavior similar to the predictions made by specific models we consider, with associative learning, heuristics, or explicit hypothesis testing being good strategies for individual tasks, and we briefly outline some possible psychological mechanisms in the Discussion. Our general aim, however, is to provide a unifying account at the more abstract level of the underlying problem and its solution, ultimately helping to explain why particular representations and algorithms might be appropriate in a particular task.

Finally, our aim of providing a computational-level account of causal induction also influences the kinds of models that we use for comparison. In this article, our emphasis is on comparison of the predictions of our account to those of other rational models. These models all use the same formal ideas and operate at the same level of analysis, but they differ in their assumptions about the knowledge that informs causal induction or the nature of statistical learning. Comparison with these other rational models thus helps to highlight which components of our framework are relevant to explaining behavior on a given task. We do not doubt that it is possible to define better models of specific tasks, since presumably an accurate model of the actual mechanisms people use to solve these problems will make better predictions than the abstract kind of analyses obtained from our framework. Ultimately, we see the key criterion for the success of our approach to be its usefulness in capturing the effects of prior knowledge on causal induction across a wide range of settings, and it is this criterion that we have in mind when we evaluate the performance of individual models. In this way, we expect that our framework will be evaluated in the same fashion as other general approaches that can be used to define a variety of computational models, such as parallel distributed processing (e.g., McClelland & Rumelhart, 1986) or production systems (e.g., Anderson, 1993).

    Causal Graphical Models

Having introduced our motivation and methodology, we now turn to the question of how to formulate the computational problem posed by causal induction. We will do this using causal graphical models, also known as Bayesian networks or Bayes nets. Causal graphical models have recently begun to be used in psychological accounts of causality (e.g., Danks & McKenzie, 2009; Glymour, 1998, 2001; Gopnik et al., 2004; Griffiths & Tenenbaum, 2005; Lagnado & Sloman, 2002; Lu, Yuille, Liljeholm, Cheng, & Holyoak, 2006, 2007, 2008; Rehder, 2003; Steyvers, Tenenbaum, Wagenmakers, & Blum, 2003; Tenenbaum & Griffiths, 2001, 2003; Waldmann & Martignon, 1998). In this article, we highlight only the elements of causal graphical models that are relevant to our account. More detailed introductions are provided by Pearl (2000), Heckerman (1998), Glymour (2001), and Sloman (2005).

A causal graphical model has three components: a set of variables, a causal structure defined upon those variables, and a set of assumptions about the functional form of the relationships indicated by this structure. The variables are represented by nodes in a graph. These nodes are connected by arrows, indicating the direction of causal dependencies among the variables. Assumptions about the functional form of causal relationships make it possible to use this graphical structure to reason about the probabilities of different kinds of events. The functional form defines a probability distribution for each variable conditioned on its causes, which is referred to as the parameterization of the nodes.

Causal graphical models can be used to compute the probability of observing particular values for the variables and the consequences of interventions. An intervention is an event in which a variable is forced to hold a value, independent of any other variables on which it might depend. Following Pearl (2000), we denote an intervention that sets a variable X to value x with do(x), and in general use uppercase letters to indicate variables and lowercase letters to indicate their values. Probabilistic inference on a modified graph, in which incoming edges to X are removed, can be used to assess the consequences of intervening on X (Pearl, 2000; Spirtes et al., 1993).
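As an illustration of these components, the following sketch builds a two-variable model with an edge from C to E, uses a noisy-OR parameterization (introduced more formally below), and contrasts passive observation with an intervention modeled by graph surgery. The structure and parameter values are assumed purely for illustration.

```python
# Minimal sketch of a causal graphical model over two binary variables, C -> E,
# with a noisy-OR parameterization. Structure and parameter values are assumed.

P_C = 0.3          # probability that the cause C is present
W0, W1 = 0.1, 0.8  # background strength and strength of C (noisy-OR parameters)

def p_e_given_c(c):
    """Noisy-OR: P(e+ | c) = 1 - (1 - w0) * (1 - w1)^c."""
    return 1.0 - (1.0 - W0) * (1.0 - W1) ** c

def p_e_observational():
    """P(e+) under passive observation, summing over the values of C."""
    return sum(p_c * p_e_given_c(c)
               for c, p_c in [(1, P_C), (0, 1.0 - P_C)])

def p_e_do(c):
    """P(e+ | do(c)): graph surgery removes incoming edges to C (trivial here,
    since C has no parents) and clamps C to the chosen value."""
    return p_e_given_c(c)

print(round(p_e_observational(), 3))  # P(e+) from observation alone
print(round(p_e_do(1), 3))            # P(e+ | do(c+)), forcing C to be present
print(round(p_e_do(0), 3))            # P(e+ | do(c-)), forcing C to be absent
```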

    The Computational Problem and Existing Solutions

Causal graphical models provide us with the tools to give a precise definition of the computational problem underlying causal induction. We take the problem of causal induction as that of identifying the causal graphical model—including both structure and parameters—responsible for generating the observed data D. This problem has been extensively explored in the literature on causal graphical models in computer science and statistics, and it is typically divided into two parts (e.g., Griffiths & Tenenbaum, 2005; Heckerman, 1998): structure learning and parameter estimation. We discuss these parts in turn, highlighting connections to existing proposals about human causal induction.

Structure learning. Learning the causal structure that relates a large number of variables is a difficult computational problem, as the number of possible structures increases exponentially with the number of variables. Research in computer science and statistics has focused on two strategies for solving this problem. Constraint-based algorithms attempt to identify causal structure on the basis of the patterns of dependency exhibited by a set of variables, whereas Bayesian methods evaluate the probability that a particular structure generated the observed data.

Constraint-based algorithms for structure learning (e.g., Pearl, 2000; Spirtes et al., 1993) proceed in two steps. First, standard statistical tests such as Pearson’s $\chi^2$ test are used to identify which variables are dependent and independent. Since different causal structures should result in different patterns of dependency among variables, the observed dependencies provide constraints on the set of possible causal structures. The second step of the algorithms identifies this set, reasoning deductively from the pattern of dependencies. The result is one or more causal structures that are consistent with the dependencies exhibited by the data. By not making any commitments about the consequences of causal relationships other than statistical dependency, constraint-based algorithms provide a general-purpose tool for causal induction that can be applied easily across many domains. This generality is part of the appeal of these algorithms as psychological theories, as they provide a way to explain the acquisition of causal knowledge without recourse to domain-specific mechanisms (e.g., Gopnik & Glymour, 2002; Gopnik et al., 2004).
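A minimal sketch of the first, statistical step of such an algorithm appears below: a chi-square test of independence applied to a hypothetical contingency table, whose outcome would then constrain the deductive second step. The counts and significance threshold are illustrative assumptions.

```python
# Sketch of the first step of a constraint-based algorithm: testing whether two
# binary variables are statistically dependent with Pearson's chi-square test.
# The contingency counts are hypothetical.
from scipy.stats import chi2_contingency

# Rows: cause absent / present; columns: effect absent / present.
counts = [[30, 10],
          [12, 28]]

chi2, p_value, dof, expected = chi2_contingency(counts)
dependent = p_value < 0.05  # conventional significance threshold

print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}, dependent: {dependent}")
```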

The Bayesian approach to structure learning (Cooper & Herskovits, 1992; see Heckerman, 1998) treats causal induction as a special case of the more general statistical problem of identifying the statistical model most likely to have generated observed data. Bayesian inference provides a solution to this problem. The heart of this solution is Bayes’ rule, which can be used to evaluate the probability that a hypothetical model h was responsible for generating data D. The posterior distribution, $P(h \mid D)$, is evaluated by combining prior beliefs about the probability that h might generate any data, encoded in the distribution $P(h)$, with the probability of D under the model h, $P(D \mid h)$, typically referred to as the likelihood. Bayes’ rule stipulates how these probabilities should be combined, giving

$$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{\sum_{h' \in H} P(D \mid h')\,P(h')}, \qquad (1)$$

where H is the hypothesis space, the set of all models that could possibly have produced D.

As with any Bayesian inference, this approach requires specifying a prior probability and a likelihood for every hypothesis within a hypothesis space, H. In typical applications of this method, H consists of all directed graphs defined over the available variables. The data D consist of the values that those variables assume as the result of observation and intervention. Standard Bayesian structure-learning algorithms define $P(D \mid h)$ in a way that makes very weak assumptions about the functional form of the relationship between causes and effects. A separate parameter is used to express the probability of the effect for each configuration of its causes, meaning that the causes can have essentially any kind of influence on the effect—generative or preventive, large or small—and can combine in any imaginable way. $P(D \mid h)$ is evaluated by defining a distribution over these parameters, and then integrating over the specific values the parameters take on (e.g., Cooper & Herskovits, 1992). This makes it possible to compute the probability of the data given a particular graphical structure without committing to a particular choice of parameter values. The prior over graph structures, $P(h)$, is typically either uniform (giving equal probability to all graphs), or gives lower probability to more complex structures. Algorithms that use these principles differ in whether they then proceed by searching the space of structures to find that with the highest posterior probability (Friedman, 1997), or evaluate particular causal relationships by integrating over the posterior distribution over graphs (Friedman & Koller, 2000). Tenenbaum and Griffiths (2001; Griffiths & Tenenbaum, 2005) developed a model of human causal induction based on the principles of Bayesian structure learning, which we discuss in more detail later in the article.
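The following sketch illustrates Bayesian structure scoring in the simplest possible case: two candidate structures for a single cause-effect pair, one with an edge from C to E (given a noisy-OR parameterization) and one without, with parameters marginalized numerically over a uniform grid. The counts, priors, and grid resolution are illustrative assumptions rather than a specification of any particular model discussed here.

```python
# Sketch of Bayesian structure scoring for a single candidate cause C and effect E.
# Two hypotheses: h1, a graph with an edge C -> E (noisy-OR with strengths w0, w1),
# and h0, a graph with no edge (background w0 only). Parameters are marginalized
# numerically under uniform priors; counts and grid size are illustrative.
import numpy as np

# Contingency counts: n[(c, e)] = number of trials with C=c, E=e (hypothetical data).
n = {(0, 0): 12, (0, 1): 4, (1, 0): 3, (1, 1): 13}

grid = np.linspace(0.005, 0.995, 100)  # grid over parameter values in (0, 1)

def likelihood_h1(w0, w1):
    """P(D | w0, w1, h1) under a noisy-OR parameterization."""
    p1 = 1 - (1 - w0) * (1 - w1)   # P(e+ | c+)
    p0 = w0                        # P(e+ | c-)
    return (p1 ** n[(1, 1)] * (1 - p1) ** n[(1, 0)] *
            p0 ** n[(0, 1)] * (1 - p0) ** n[(0, 0)])

def likelihood_h0(w0):
    """P(D | w0, h0): the effect occurs with probability w0 regardless of C."""
    k = n[(0, 1)] + n[(1, 1)]      # trials with the effect present
    m = n[(0, 0)] + n[(1, 0)]      # trials with the effect absent
    return w0 ** k * (1 - w0) ** m

# Marginal likelihoods: average over uniform priors on the parameter grid.
ml_h1 = np.mean([likelihood_h1(w0, w1) for w0 in grid for w1 in grid])
ml_h0 = np.mean([likelihood_h0(w0) for w0 in grid])

# Posterior over structures with a uniform prior P(h1) = P(h0) = 0.5 (Equation 1).
posterior_h1 = ml_h1 / (ml_h1 + ml_h0)
print(f"P(h1 | D) = {posterior_h1:.3f}")
```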

Parameter estimation. Parameter estimation assumes a fixed causal structure and aims to identify the parameters that specify the probability of a variable given the values of the variables that influence it. The simplest way to parameterize a causal graphical model is to use a separate parameter for the probability of each value of a variable given the values of its causes—something that we refer to as the generic parameterization. This generic parameterization is the one typically used in Bayesian structure learning in computer science, as discussed above. An alternative is to make a stronger set of assumptions about the way in which causes combine to produce their effects. One such set of assumptions yields the noisy-OR function, which is widely used in computer science and statistics (Pearl, 1988) and is a key part of a prominent model of human causal induction (Cheng, 1997).

The noisy-OR function results from a natural set of assumptions about the relationship between cause and effect: that causes are generative, increasing the probability of the effect, that the effect occurs in the absence of any causes with a constant probability $w_0$, that each cause produces the effect with a constant probability $w_i$, and that the opportunities for the causes to produce the effect are independent (Cheng, 1997). For example, if we had an effect variable E and a cause variable C, then the conditional probability of E given C would be

$$P(e^+ \mid c;\, w_0, w_1) = 1 - (1 - w_0)(1 - w_1)^c, \qquad (2)$$

where $w_1$ is a parameter associated with the strength of C and c takes on values $c^+ = 1$ in the presence of the cause or $c^- = 0$ in its absence. This expression gives $w_0$ for the probability of E in the absence of C, and $w_0 + w_1 - w_0 w_1$ for the probability of E in the presence of C. This parameterization is called a noisy-OR because if $w_0$ and $w_1$ are both 1, Equation 2 reduces to the logical OR function: The effect occurs if and only if either some background factor or C is present. With $w_0$ and $w_1$ in the range [0, 1] it generalizes this function to allow probabilistic causal relationships.


If E had multiple parents $X_1, \ldots, X_n$, we could associate a separate strength $w_i$ with each parent, and the noisy-OR parameterization would give

$$P(e^+ \mid x_1, \ldots, x_n;\, w_0, w_1, \ldots, w_n) = 1 - (1 - w_0) \prod_i (1 - w_i)^{x_i}, \qquad (3)$$

where again $x_i = 1$ if $X_i$ is present, and 0 if $X_i$ is absent.
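Equations 2 and 3 translate directly into a short function; the parameter values in the sketch below are illustrative.

```python
# Noisy-OR of Equations 2 and 3:
# P(e+ | x1..xn; w0, w1..wn) = 1 - (1 - w0) * prod_i (1 - wi)^xi.
# The example values are illustrative.

def noisy_or(x, w, w0):
    """x: 0/1 indicators for each cause; w: per-cause strengths; w0: background."""
    prob_no_effect = 1.0 - w0
    for xi, wi in zip(x, w):
        prob_no_effect *= (1.0 - wi) ** xi
    return 1.0 - prob_no_effect

# Single cause (Equation 2): present vs. absent.
print(noisy_or([1], [0.8], w0=0.1))       # 1 - (0.9)(0.2) = 0.82
print(noisy_or([0], [0.8], w0=0.1))       # background only: 0.1

# Two causes, both present (Equation 3).
print(noisy_or([1, 1], [0.8, 0.5], w0=0.1))  # 1 - (0.9)(0.2)(0.5) = 0.91
```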

A simple solution to the problem of estimating the parameters of a causal graphical model is to use maximum-likelihood estimation, choosing the values of the parameters that maximize the probability of the observed data. For the case of the noisy-OR function with a single cause, the maximum-likelihood estimate of $w_1$ is

$$\text{power} = \frac{P(e^+ \mid c^+) - P(e^+ \mid c^-)}{1 - P(e^+ \mid c^-)}, \qquad (4)$$

where $P(e^+ \mid c^+)$ is the empirical conditional probability of the effect given the presence of the cause. We have labeled this quantity power as it corresponds to Cheng’s (1997) definition of causal power, proposed as a rational model of human causal induction. Glymour (1998) pointed out that the assumptions Cheng (1997) used in deriving this model are equivalent to those underlying the noisy-OR parameterization, and Tenenbaum and Griffiths (2001; Griffiths & Tenenbaum, 2005) showed that causal power is a maximum-likelihood estimator of $w_1$.

The numerator of Equation 4 has also been proposed in its own right as a model of human causal induction, being known as $\Delta P$. This quantity,

$$\Delta P = P(e^+ \mid c^+) - P(e^+ \mid c^-),$$

reflects the change in the probability of the effect occurring as a consequence of the occurrence of the cause. This measure was first suggested by Jenkins and Ward (1965), was subsequently explored by Allan (1980, 1993; Allan & Jenkins, 1983), and has appeared in various forms in both psychology and philosophy (Cheng & Holyoak, 1995; Cheng & Novick, 1990, 1992; Melz, Cheng, Holyoak, & Waldmann, 1993; Salmon, 1980). $\Delta P$ can also be shown to be a rational solution to the problem of estimating the strength of a causal relationship, assuming that causes combine linearly (Griffiths & Tenenbaum, 2005; Tenenbaum & Griffiths, 2001).
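Both causal power (Equation 4) and ΔP can be computed directly from the empirical conditional probabilities; the sketch below uses hypothetical counts.

```python
# Causal power (Equation 4) and Delta-P computed from empirical conditional
# probabilities. The contingency counts are hypothetical.

def conditional_probs(n_e_c, n_c, n_e_notc, n_notc):
    """Empirical P(e+|c+) and P(e+|c-) from effect counts and trial totals."""
    return n_e_c / n_c, n_e_notc / n_notc

def delta_p(p_e_c, p_e_notc):
    return p_e_c - p_e_notc

def causal_power(p_e_c, p_e_notc):
    """Maximum-likelihood estimate of the noisy-OR strength w1 (Equation 4)."""
    return (p_e_c - p_e_notc) / (1.0 - p_e_notc)

p_e_c, p_e_notc = conditional_probs(n_e_c=16, n_c=20, n_e_notc=4, n_notc=20)
print(f"Delta-P      = {delta_p(p_e_c, p_e_notc):.2f}")      # 0.80 - 0.20 = 0.60
print(f"Causal power = {causal_power(p_e_c, p_e_notc):.2f}")  # 0.60 / 0.80 = 0.75
```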

    Two Challenges for a Formal Framework

This brief survey of existing methods for solving the problem of identifying the causal graphical model that generated observed data—and corresponding rational models of human causal induction—highlights two challenges for the kind of formal framework we aim to develop. First, this framework should naturally capture the effects of knowledge on causal induction. Existing approaches make either weak or generic assumptions about the nature of the knowledge that people use in evaluating causal relationships and are consequently limited in their ability to account for the effects of ontology, plausibility, and functional form outlined in the previous section. Second, the framework should be broad enough to encompass learning of both causal structure and the parameters that describe a given causal relationship. We now discuss these issues in turn, arguing that both can be addressed by adopting a more general Bayesian framework.

Capturing the effects of prior knowledge. The approaches to structure learning and parameter estimation outlined above all make either weak (in the case of structure learning) or general-purpose (in the case of parameter estimation) assumptions about the nature of causal relationships. These assumptions are incompatible with the richness of human knowledge about causal relationships and the corresponding flexibility of human causal induction exhibited in the examples discussed in the previous section. In part, this is a consequence of the context in which these approaches were developed. In statistics and computer science, developing algorithms that make minimal assumptions about the nature of causal relationships maximizes the number of settings in which those algorithms can be used. Psychological models of causal induction have justified making general-purpose assumptions about the nature of causal relationships through the expectation that it will be relatively straightforward to integrate the effects of prior knowledge into the resulting models. For example, Cheng (1997, p. 370) stated:

The assumption that causal induction and the influence of domain-specific prior causal knowledge are separable processes is justified by numerous experiments in which the influence of such knowledge can be largely ignored . . . . The results of these experiments demonstrate that the induction component can indeed operate independently of prior causal knowledge.

In the few cases where formal accounts of the integration of prior knowledge and data have been explored (e.g., Alloy & Tabachnik, 1984; Lien & Cheng, 2000), these accounts have focused on just one aspect of prior knowledge, such as the plausibility of causal relationships or the level of the ontology at which those relationships should be represented.

Constraint-based structure-learning algorithms are particularly limited in their use of prior knowledge.² Again, this is partly by design, being a result of the data-driven, bottom-up approach to causal induction that these algorithms instantiate in a particularly clear way. As these algorithms are defined, they use only a weak form of prior knowledge—the knowledge that particular causal relationships do or do not exist (e.g., Spirtes et al., 1993). They do not use prior knowledge concerning the underlying ontology, the plausibility of relationships, or their functional form. This insensitivity to prior knowledge has previously been pointed out by some critics of constraint-based algorithms in computer science and statistics (Humphreys & Freedman, 1996; Korb & Wallace, 1997). Prior knowledge provides essential guidance to human inferences, making it possible to infer causal relationships from very small samples. Without it, constraint-based algorithms require relatively large amounts of data in order to detect a causal relationship—enough to obtain statistically significant results from a statistical significance test.

² It might not be impossible to develop a more global constraint-based framework for causal induction, defined over richer representations of prior knowledge and integrating both bottom-up and top-down information in a more holistic style of inference. However, this would be a major departure from how constraint-based approaches have traditionally been developed (Spirtes et al., 1993; Glymour, 2001).


The need for relatively large amounts of data is compounded by the fact that constraint-based algorithms cannot combine weak sources of evidence or maintain graded degrees of belief. This is a direct consequence of the policy of first conducting statistical tests, then reasoning deductively from the results. Statistical tests impose an arbitrary threshold on the evidence that data provide for a causal relationship. Using such a threshold is a violation of what Marr (1982) termed the principle of least commitment, making it hard to combine multiple weak sources of evidence. The binarization of evidence is carried forward by deductively reasoning from the observed patterns of dependency. Such a process means that a particular causal structure can be identified only as consistent or inconsistent with the data, admitting no graded degrees of belief that might be updated through the acquisition of further evidence.

Although Bayesian structure learning can deal with weak evidence and graded degrees of belief, the standard assumptions about priors, likelihoods, and hypothesis spaces mean that this approach is just as limited in its treatment of prior knowledge as constraint-based algorithms. However, it is relatively straightforward to modify this approach to incorporate the effects of prior knowledge. Different assumptions about the functional form of causal relationships can be captured by including models with different parameterizations in the hypothesis space, and the plausibility of causal relationships can be used in defining the prior probability of different graph structures. Recent work in computer science has begun to explore methods that use more complex ontologies, with each type of entity being characterized by a particular pattern of causal relationships with a particular functional form (e.g., Segal, Pe'er, Regev, Koller, & Friedman, 2003). This work is motivated by problems in bioinformatics that, as in many of the settings for human causal induction, require learning complex structures from limited data (e.g., Segal, Shapira, et al., 2003).

A similar strategy can be used to incorporate the effects of prior knowledge in parameter estimation, allowing the expectations of learners to influence their inferences about the strength of causal relationships. Maximum-likelihood estimation finds values for parameters based purely on the information contained in the data. This makes it hard for these models to incorporate the knowledge of learners into the resulting estimates. Bayesian estimation techniques provide a way to combine existing knowledge with data, through a prior distribution on the parameters. For example, when estimating the strength of a cause using the noisy-OR function, we might have a prior expectation that causal relationships will tend to be strong if they exist at all, corresponding to a prior distribution favoring large values of w1. Lu et al. (2007, 2008) have developed a model of human causal learning based on Bayesian parameter estimation, using a general-purpose prior distribution favoring strong causal relationships.
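To illustrate the general idea, the sketch below computes a grid approximation to the posterior over w1 under a noisy-OR likelihood with a prior that favors strong causes. The fixed background rate, the Beta(5, 1) prior, and the grid resolution are simplifying assumptions made for this example and are not the specific prior used by Lu et al. (2007, 2008).

```python
import numpy as np

def posterior_w1(counts, w0=0.2, prior_a=5.0, prior_b=1.0, grid_size=101):
    """Grid approximation to the posterior over the causal strength w1,
    assuming a noisy-OR likelihood with a fixed background rate w0 and an
    (unnormalized) Beta(prior_a, prior_b) prior on w1. Beta(5, 1), used
    here purely for illustration, places most of its mass on strong
    causal relationships."""
    w1 = np.linspace(1e-6, 1 - 1e-6, grid_size)
    log_post = (prior_a - 1) * np.log(w1) + (prior_b - 1) * np.log(1 - w1)
    p_e_c1 = w0 + w1 - w0 * w1                    # P(e+ | c+) under noisy-OR
    log_post += counts[(1, 1)] * np.log(p_e_c1)
    log_post += counts[(1, 0)] * np.log(1 - p_e_c1)
    log_post += counts[(0, 1)] * np.log(w0)       # P(e+ | c-) = w0
    log_post += counts[(0, 0)] * np.log(1 - w0)
    post = np.exp(log_post - log_post.max())
    return w1, post / post.sum()

# Hypothetical contingency data, as in the earlier sketches.
w1_grid, post = posterior_w1({(1, 1): 7, (1, 0): 3, (0, 1): 2, (0, 0): 8})
print(w1_grid[np.argmax(post)])   # posterior mode of the strength estimate
```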

Learning both causal structures and parameter values. The distinction between structure learning and parameter estimation is valuable when examining the assumptions behind different models of causal induction, but it is clear that both processes are key components of human learning. In previous work (Griffiths & Tenenbaum, 2005) we emphasized the importance of structure learning, in part because it was a component of causal induction that was not reflected in existing models, but we do not deny that people are capable of learning causal strength and that certain tasks are more likely to tap this ability than others. We provide a more detailed discussion of this point when we consider causal induction from contingency data, where the relevant phenomena are perhaps clearest. However, the framework that we develop needs to be sufficiently general that it can capture both of these aspects of human causal induction.

The work of Lu et al. (2007, 2008) illustrates how a Bayesian approach can be applied to the problem of estimating the strength of a causal relationship. This analysis casts the problem in a shared formal language with that of Bayesian structure learning, providing a simple way to develop a unifying framework. In Bayesian parameter estimation, the hypothesis space is the set of values for the parameters of a fixed causal structure. In Bayesian structure learning, the hypothesis space is the set of possible causal structures, evaluated by summing over the parameters. We can define a single framework in which both kinds of inferences can be made by defining our hypothesis space to consist of fully specified causal graphical models, each with both a structure and a full set of parameters. Using this hypothesis space, we can estimate the strength of a relationship by conditioning on a given structure and using the posterior distribution on the strength parameter, as is done by Lu et al. (2007, 2008). We can also answer a question about whether a particular causal relationship exists by summing over all hypotheses—including structure and parameters—and evaluating the probability of those hypotheses consistent with the existence of the relationship, similar to the approach taken by Griffiths and Tenenbaum (2005).
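The sketch below illustrates this kind of unified inference for a single candidate cause: it approximates the marginal likelihood of contingency data under the structures with and without a link from C to E by averaging a noisy-OR likelihood over strengths drawn from uniform priors, and then weights the two structures by a prior probability p that the relationship exists. The Monte Carlo approximation, the uniform priors, and the illustrative value of p are our own simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_likelihood(counts, w0, w1):
    """Log-likelihood of a contingency table under a noisy-OR model with
    background strength w0 and causal strength w1 (setting w1 = 0 gives
    the structure with no link from C to E)."""
    p1 = w0 + w1 - w0 * w1   # P(e+ | c+); P(e+ | c-) is just w0
    return (counts[(1, 1)] * np.log(p1) + counts[(1, 0)] * np.log(1 - p1)
            + counts[(0, 1)] * np.log(w0) + counts[(0, 0)] * np.log(1 - w0))

def prob_relationship(counts, p=0.5, n_samples=10000):
    """Posterior probability that the causal relationship exists, computed
    by Monte Carlo averaging of the likelihood over uniform priors on the
    strength parameters of each structure."""
    w0 = rng.uniform(1e-6, 1 - 1e-6, n_samples)
    w1 = rng.uniform(1e-6, 1 - 1e-6, n_samples)
    marginal_with_link = np.mean(np.exp(log_likelihood(counts, w0, w1)))
    marginal_without_link = np.mean(np.exp(log_likelihood(counts, w0, 0.0)))
    return (p * marginal_with_link
            / (p * marginal_with_link + (1 - p) * marginal_without_link))

# Hypothetical contingency data: the effect occurs for 7 of 10 cases with
# the cause present and 2 of 10 with it absent.
print(prob_relationship({(1, 1): 7, (1, 0): 3, (0, 1): 2, (0, 0): 8}))
```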

Summary. We have identified two challenges for a computational-level account of causal induction: incorporating prior knowledge and allowing both structure learning and parameter estimation. Both of these challenges seem to be something that can be addressed by adopting a more general Bayesian framework. Within this framework, the hypothesis space consists of a set of fully specified graphical models, each with a structure and a full set of parameters, and the knowledge of the learner is reflected in substantive assumptions about the prior probability of hypotheses, the predictions that hypotheses make about data that are instantiated in the likelihoods, and the selection of the hypotheses that compose the hypothesis space. This leaves us with a new problem: Where do the priors, likelihoods, and hypotheses that are used in making a particular inference come from? Or, more precisely: How can we formalize the knowledge about ontologies, plausible relations, and functional form that allows hypothesis spaces to be constructed? This is the question that we attempt to answer in the remainder of the article. First, however, we argue that this knowledge is something that cannot itself be captured in a causal graphical model.

    Beyond Causal Graphical Models

Formulating the problem of causal induction as a Bayesian decision as to which causal graphical model generated observed data provides a precise specification of how prior knowledge could guide this inference. Knowledge about the ontology, plausibility, and functional form of causal relationships should influence the prior, likelihood, and hypothesis space for Bayesian inference. However, expressing this knowledge requires going beyond the representational capacities of causal graphical models. Although this knowledge can be instantiated in a causal graphical model, it generalizes over a set of such models, and thus cannot be expressed in any one model.


Our inability to express prior knowledge relevant to causal learning in the form of a causal graphical model is partly because of an inherent limitation in the expressive capacity of graphical models. Causal graphical models are formally equivalent to a probabilistic form of propositional logic (e.g., Russell & Norvig, 2002). A causal graphical model can be used to encode any probabilistic logical rule that refers to the properties of specific entities in the domain. However, causal graphical models cannot capture the fact that there are different types of entities, or the way that the types of entities involved in a potential relationship influence our expectations about the plausibility and functional form of that relationship. Such notions require going beyond causal graphical models and considering richer probabilistic logics.

The knowledge that constrains causal learning is at a higher level of abstraction than specific causal structures, just as the principles that form the grammar for a language are at a higher level of abstraction than specific sentences (Tenenbaum, Griffiths, & Niyogi, 2007). The syntactic structure of a single sentence cannot express the grammar of a language, which makes statements about the syntactic structures of the set of sentences that compose that language. More generally, making statements about sets requires defining abstract variables that can be instantiated in a given member of the set and quantifying over the values of those variables. These higher level abstractions and generalizations require adopting a representation that goes beyond that used by any member of the set itself.

The development of probabilistic predicate logic remains an open problem in artificial intelligence research (Friedman, Getoor, Koller, & Pfeffer, 1999; Kersting & De Raedt, 2000; Koller & Pfeffer, 1997; Milch, Marthi, & Russell, 2004; Muggleton, 1997). In the next section, we outline how some of the ideas behind this research can be used to develop a different level of representation for causal knowledge: a set of principles that can be used to guide inferences about the causal structure that was most likely to have generated observed data.

    Theory-Based Causal Induction

So far, we have argued that human causal induction is affected by prior knowledge in the form of ontological assumptions, beliefs about the plausibility of causal relationships, and assertions about the functional form of those relationships. Causal graphical models provide us with a language in which we can express the computational problem underlying causal induction and embody a set of domain-general assumptions about the nature of causality (including, for example, the effects of intervening on a variable). However, causal graphical models are not sufficient to represent the domain-specific knowledge that guides human inferences. In this section, we develop a formal framework for analyzing how prior knowledge affects causal induction. First, we argue that the kind of knowledge that influences human causal induction fits the description of an intuitive theory, suggesting that the appropriate level of representation for capturing this knowledge is that of a causal theory. We then consider the function and content of such theories, arguing that theories can play the role of hypothesis space generators, and presenting a simple schema for causal theories that makes it easy to specify the information that is needed to generate a hypothesis space of causal graphical models.

    Prior Knowledge and Causal Theories

Many cognitive scientists have suggested that human cognition and cognitive development can be understood by viewing knowledge as organized into intuitive theories, with a structure analogous to scientific theories (Carey, 1985a; Gopnik & Meltzoff, 1997; Karmiloff-Smith, 1988; Keil, 1989; Murphy & Medin, 1985). This approach has been used to explain people's intuitions in the biological (Atran, 1995; Inagaki & Hatano, 2002; Medin & Atran, 1999), physical (McCloskey, 1983), and social (Nichols & Stich, 2003; Wellman, 1990) domains and suggests some deep and interesting connections between issues in cognitive development and the philosophy of science (Carey, 1985a; Gopnik, 1996).

Although there are no formal accounts of intuitive theories, there is consensus on what kind of knowledge they incorporate: an ontology, indicating the types of entities that can be encountered in a given domain, and a set of causal laws expressing the relations that hold among these entities. For example, Carey (1985b) stated that:

A theory consists of three interrelated components: a set of phenomena that are in its domain, the causal laws and other explanatory mechanisms in terms of which the phenomena are accounted for, and the concepts in terms of which the phenomena and explanatory apparatus are expressed. (p. 394)

When discussing causal theories, it is often productive to distinguish among different levels at which a theory might operate. In a philosophical work that has inspired much of the treatment of theories in cognitive development, Laudan (1977) made such a distinction, separating everyday scientific theory from higher level "research traditions." He characterizes a research tradition as consisting of

an ontology which specifies, in a general way, the types of fundamental entities which exist in the domain or domains within which the research tradition is embedded . . . . Moreover, the research tradition outlines the different modes by which these entities can interact. (p. 79)

This distinction between levels of theory has been carried over into research on cognitive development, where Wellman (1990) and Wellman and Gelman (1992) distinguished between "specific" and "framework" theories:

Specific theories are detailed scientific formulations about a delimited set of phenomena . . . framework theories outline the ontology and the basic causal devices for their specific theories, thereby defining a coherent form of reasoning about a particular set of phenomena. (p. 341)

All of these definitions draw upon the same elements—ontologies and causal laws.

The three aspects of prior knowledge that we have identified as playing a role in causal induction map loosely onto the content of intuitive theories identified in these definitions. The division of the entities in a domain into a set of different types is the role of an ontology, and causal laws identify which relationships are plausible and what form they take. This suggests that we might think of the knowledge that guides causal induction as being expressed in a causal theory. In particular, it is a theory that plays the role of a framework theory, providing a set of constraints that are used in discovering the causal graphical model that describes a system, the analogue of a specific theory. Causal theories thus constitute a level of representation above that of causal graphical models, answering our question of how knowledge that is instantiated in a set of causal graphical models might be expressed. However, making this connection does not solve all of our problems: In order to have a complete formal framework for modeling human causal induction, we need to give an account of the function and content of these causal theories.

    Theories as Hypothesis Space Generators

The Bayesian framework sketched in the previous section leaves us with the problem of specifying a hypothesis space, a prior on that space, and a likelihood for each hypothesis in that space. This problem can be solved by defining a probabilistic procedure for generating causal graphical models. Such a procedure needs to specify probability distributions from which the variables, structure, and parameterization of causal graphical models are drawn. The hypothesis space is the set of causal graphical models that can be generated by sampling from these distributions, the prior is the probability with which a given model is generated by this process, and the likelihood is determined by the parameterization of that model. By limiting which causal structures and parameterizations can be generated, it is possible to impose strong constraints on the hypotheses considered when reasoning about a causal system. We view this as the function of causal theories: They specify a recipe that can be used to generate hypothesis spaces for causal induction.

The commitments and consequences of this claim can be understood by extending the analogy between language comprehension and causal induction introduced in the previous section. Under this analogy, a theory plays the same role in solving the problem of causal induction that a grammar plays in language comprehension: Like a grammar, a theory generates the hypotheses used in induction. A schematic illustration of the correspondence between these two problems is shown in Figure 1. Under this view, the solution to the inductive problem of causal learning has the same character as identifying the syntactic structure of sentences: Just as grammars generate a space of possible phrase structures, theories generate a space of possible causal graphical models. Causal learning is thus a problem of "parsing" the states of the variables in a system with respect to a causal theory. If the theory provides strong enough constraints, such parsing can be done swiftly and easily, picking out the causal structure that is most likely to have generated the data. Just as recent work in computational linguistics has emphasized the value of probabilistic approaches in solving such parsing problems (e.g., Chater & Manning, 2006; Manning & Schütze, 1999), the assumption that theories generate hypotheses and hypotheses generate data means that we can view each of these levels of representation as specifying a probability distribution over the level below. The result is a hierarchical Bayesian model (Tenenbaum, Griffiths, & Kemp, 2006), supporting probabilistic inference at all of these levels.

    Formalizing the Content of Causal Theories

To specify the content of causal theories, we need to identify their basic constituents and explain how these are used to generate causal graphical models. When cognitive scientists appeal to an intuitive theory to explain the inferences that people make in a given domain, they typically mean a structured representation with causal content, similar in spirit to a scientific theory (e.g., Carey, 1985a). As discussed above, accounts in philosophy of science and cognitive development are more precise about the structure and content of such theories, seeing them as constructed from an ontology and causal laws (Carey, 1985b; Gopnik & Meltzoff, 1997; Wellman, 1990; Wellman & Gelman, 1992). Providing a formal treatment of causal theories that captures their richness and complexity, as well as the breadth of inferences that they are supposed to support, is a task that goes beyond the scope of this article. We formalize just the aspects of causal theories relevant to generating hypothesis spaces for causal induction. As a result, the theories we discuss are far less abstract than what is typically described in discussions of framework theories, being just one level above the observable variables. We consider the possibility of more abstract causal theories elsewhere (Tenenbaum et al., 2007).

The causal theories that we present in this article will have three components, corresponding to the three aspects of prior knowledge that influence causal induction identified above, and the three elements of the definition of a causal graphical model. These three components are an ontology, a set of principles that identify plausible relations, and a statement of the functional form of those relations. These three components of a theory each generate one part of a causal graphical model, being the variables, the causal structure, and the parameterization, respectively. We describe these components using a combination of probability statements and first-order logic. Since our aim is to produce a computational-level account of human causal induction, the specific choices we have made in using this formalism are not intended to be interpreted as assertions about the nature of the representations that people actually use when solving these problems, nor are the specific theories we present supposed to capture the full complexity of the information that people have available about these systems. However, we are committed to the level of representation (i.e., using a language that is richer than that of causal graphical models) and the constraints that are embodied in the theories, which are ultimately expressed as distributions over causal graphical models. In this spirit, we have used a variety of different formalisms for causal theories in other presentations of these ideas (Griffiths, Baraff, & Tenenbaum, 2004; Griffiths & Tenenbaum, 2007b; Tenenbaum & Griffiths, 2003; Tenenbaum & Niyogi, 2003).

Figure 1. Three levels of representation in (a) language comprehension and (b) causal induction. Each level generates the level below, and language comprehension and causal induction both involve inferring the middle level based upon data below and constraints from above. [Panel (a): Grammar, Syntactic structure (parse tree), Sentence. Panel (b): Causal theory, Causal structure (graphical model), Data.]

To provide some concreteness to our formalism for expressing causal theories, we consider a specific example throughout. Many empirical studies of causal induction (e.g., Buehner & Cheng, 1997; Buehner, Cheng, & Clifford, 2003; Lober & Shanks, 2000) use medical scenarios, such as evaluating the influence of chemicals on gene expression. These studies typically examine learning about a single causal relationship, such as whether injecting a particular chemical into mice causes them to express a particular gene. While simple, even these settings have enough structure that we can identify a nontrivial theory expressing prior knowledge that influences causal learning in this domain. We introduce our general framework in this simplest setting and then move to more complex settings, where richer prior knowledge allows learning from sparser data.

The first component of a theory, the ontology, identifies the types of entities that exist in a domain,³ the number of entities of each type (or a distribution over this number), and the predicates that can be used to describe these entities. Such an ontology is far more limited than the kinds of ontologies considered in philosophy or computer science but is sufficient to express constraints on causal induction. In our example, where participants discover whether chemicals cause genes to be expressed in mice, there are three types of entities: Chemical, Gene, and Mouse. Any entity in the domain must belong to one of these three types. The number of entities of each type can either be stipulated or treated as a random variable drawn from a specific distribution. For example, we might state that the number of chemicals (NC), the number of genes (NG), and the number of mice (NM) are drawn independently from distributions PC, PG, and PM, respectively, but leave these distributions undefined—in many cases, NC, NG, and NM will be apparent, and we need not be concerned about generating them. The predicates defined on these types state which properties and relations can take arguments of particular types, and what values those predicates can take on. In the example, these would include Injected(Chemical, Mouse), indicating that a particular chemical was injected into a particular mouse, and Expressed(Gene, Mouse), indicating that a particular gene was expressed in a particular mouse. Both of these predicates are Boolean, being either true or false. This ontology is summarized in Figure 2. The ontology required for this example is relatively simple, but the kind of knowledge that people have in other situations may be much more complex. For example, an ontology could be hierarchical, with objects belonging to types at multiple levels and predicates applying based upon the type at each of those levels (Griffiths & Tenenbaum, 2007b).

The second component of an intuitive theory is a set of rules that determine which causal relationships are plausible. These rules can be based upon the types of the entities involved or the predicates that apply to them. In the cases we consider, the rules will be based purely on types.⁴ In our example, the structure of the problem is such that injecting chemicals does not cause injections of other chemicals, and neither does gene expression. The only relationships with which we concern ourselves are those between chemicals and genes. Figure 2 states a rule by which the plausibility of such relationships might be expressed, assigning a probability p to the existence of a causal relationship between a particular chemical and a particular gene, regardless of the mouse involved. All other causal relationships have probability 0.

The final component of an intuitive theory is a statement of the functional form that causal relationships are expected to possess. This requires specifying a parameterization (or distribution over parameterizations) for each predicate identified in the ontology. For the example, we need to define the probability that a particular mouse receives an injection of a particular chemical. This probability will not influence any of our subsequent analyses and thus is not specified: The theory indicates that this is a Bernoulli event, being true with some probability, but does not give the probability. In contrast, Expressed(G,M) is identified as a Bernoulli event whose parameter is computed using a noisy-OR function, allowing each cause—in this case Injected(C,M) for some C—to have an independent opportunity to influence the effect with probability wi. The parameters wi are all assumed to be drawn from a uniform distribution, reflecting a lack of expectations about the strengths of the causes, and making our hypothesis space contain a continuum of causal graphical models in which the strength of the causal relationships varies between 0 and 1.
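A minimal rendering of these three components as data structures is sketched below, assuming illustrative names and an arbitrary value for p; it is intended only to make the kind of information summarized in Figure 2 concrete, not as a claim about how such theories are mentally represented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ontology: types of entities and typed, Boolean predicates.
TYPES = ["Chemical", "Gene", "Mouse"]
PREDICATES = {
    "Injected": ("Chemical", "Mouse"),   # Injected(Chemical, Mouse)
    "Expressed": ("Gene", "Mouse"),      # Expressed(Gene, Mouse)
}

# Plausible relations: injecting a chemical may cause a gene to be
# expressed, with probability p, regardless of the mouse involved;
# every other candidate relationship has prior probability 0.
P_RELATION = 0.3   # illustrative value of p

def plausibility(cause_predicate, effect_predicate):
    if (cause_predicate, effect_predicate) == ("Injected", "Expressed"):
        return P_RELATION
    return 0.0

# Functional form: Expressed(G, M) is a Bernoulli event whose parameter
# is a noisy-OR of its causes, with each strength (including the
# background strength w0) drawn from a uniform distribution on [0, 1].
def sample_strength():
    return rng.uniform(0.0, 1.0)
```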

    Generating a Hypothesis Space

The process by which a causal graphical model is generated from a theory is as follows:

1. Generate variables. Sample the number of entities of each type from the distribution specified in the Ontology. Generate the complete set of grounded predicates for these entities. This is the set of variables that form the nodes of the graph.

2. Generate structure. Sample links between nodes using the probabilistic procedure stated in the Plausible relations component of the theory.

3. Generate parameterization. For each node, sample a parameterization as specified in the Functional form component of the theory, including the values of the relevant parameters.

This generative process defines a hypothesis space of fully specified causal graphical models, together with a prior probability and, via the parameterization, a likelihood for each model in that space.

Figure 2. Theory for causal induction from contingency data in a medical setting.

³ The term type is used here in the technical sense associated with a typed or many-sorted logic (e.g., Enderton, 1972). Types restrict quantifiers and the application of predicates, with each predicate being applicable only to entities of particular types.

⁴ Defining the rules based purely on type results in simpler theories. More generally, we could allow predicates to play a role in determining whether causal relationships are plausible. In fact, this is done implicitly even when only type is used, since a typed logic can be reduced to standard propositional logic by introducing predicates that indicate type (e.g., Enderton, 1972). Pursuing this strategy requires distinguishing between predicates that participate in causal relationships and predicates that are used just to determine the plausibility of those relationships. The former are used to generate the variables of the causal graphical models, whereas the latter define the prior probability of each model (see Griffiths & Tenenbaum, 2007b, for an example of this).

We illustrate how this generative process works by using the theory given in Figure 2. We assume that the number of chemicals, genes, and mice involved in a particular experiment is known, and implicitly condition on this information. For example, we might have a single chemical c, a single gene g, and NM mice m1, . . . , mNM. The set of grounded predicates is constructed by substituting all possible entities for the arguments of each predicate in the ontology. In our case, this set consists of NM statements indicating whether Injected(c,mi) holds of mouse mi, and NM statements indicating whether Expressed(g,mi) holds of mouse mi. We then have to consider possible causal structures on these 2NM variables. Since the constraints on plausible relations are such that if Injected(c,mi) causes Expressed(g,mi) for some mouse mi, then it does so for all mice, we will simply draw an arrow between two variables C and E to indicate that the relationship between Injected(c,M) and Expressed(g,M) holds for all mice M. The constraints on plausible relations imply that the only possible causal relationship in this graphical model is that from Injected(c,mi) to Expressed(g,mi), and that this relationship holds with probability p. The hypothesis space H thus consists of two causal structures: one in which Injected(c,M) causes Expressed(g,M), which has prior probability p, and one in which Injected(c,M) does not cause Expressed(g,M), which has prior probability 1 – p. These are Graph 1 and Graph 0, respectively, shown at the top of Figure 3. We obtain our full hypothesis space of causal graphical models by augmenting these structures with parameters w0 and w1 drawn from the appropriate prior distribution.

The same procedure can be used to generate a hypothesis space of causal graphical models for any number of entities, as illustrated in Figure 3. For example, with two chemicals and two genes, the hypothesis space contains 16 causal graphical models, with the prior probabilities determined by the number of causal relationships expressed in the graph. The same causal theory can be used to define a hypothesis space for five chemicals and 10 genes, or 50 chemicals and a thousand genes, simply by applying the same abstract principles.
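The sketch below instantiates this three-step recipe for the medical example, assuming the same illustrative value of p and uniform priors on causal strengths. Enumerating all subsets of the candidate Chemical to Gene links recovers the two structures for one chemical and one gene and the 16 structures for two chemicals and two genes; drawing a single set of strengths per structure stands in for the continuum of fully specified models described in the text.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

def generate_hypothesis_space(chemicals, genes, p=0.3):
    # Candidate relationships: Injected(c, M) -> Expressed(g, M) for each
    # chemical c and gene g; mice instantiate the predicates but do not add
    # candidates, since a relationship that holds for one mouse holds for all.
    candidates = [(c, g) for c in chemicals for g in genes]
    hypotheses = []
    for pattern in product([0, 1], repeat=len(candidates)):
        links = frozenset(cand for cand, bit in zip(candidates, pattern) if bit)
        # Plausible-relations component: each candidate link is present
        # independently with probability p.
        prior = p ** len(links) * (1 - p) ** (len(candidates) - len(links))
        # Functional-form component: noisy-OR strengths (and a background
        # strength) drawn from a uniform distribution; one draw per structure
        # stands in for the continuum of parameter values.
        strengths = {"background": rng.uniform(0, 1),
                     **{link: rng.uniform(0, 1) for link in links}}
        hypotheses.append((links, prior, strengths))
    return hypotheses

space = generate_hypothesis_space(["c1", "c2"], ["g1", "g2"])
print(len(space))                           # 16 structures, as in Figure 3
print(sum(prior for _, prior, _ in space))  # structure priors sum to 1
```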

    Summary

In this section we have outlined some of the key components of causal theories and explained how they can be used to generate hypothesis spaces of causal graphical models. In the following sections, we present a series of case studies illustrating how these components vary in different settings and how the constraints that they provide are essential to understanding how people learn about causal relationships. These case studies help to illustrate how our theory-based approach can provide a unifying account of a broad range of phenomena related to causal induction: The basic framework of statistical inference informed by causal theories remains constant, being a domain-general strategy, but the knowledge contained within the theories varies, allowing us to explain domain-specific inferences. The influence of the relevant prior knowledge increases as we go through the examples, starting with causal induction from contingency data where constraints from prior knowledge are typically weak, but can nonetheless have interesting consequences.

    Causal Induction From Contingency Data

The aspect of human causal induction that has been studied most extensively by psychologists is that of inferring a single causal relationship from contingency data. Given information about the frequencies with which C and E co-occur, as summarized in Table 1, people are asked to assess the extent to which C causes E. A number of mathematical models have been proposed to explain how people use contingency data to evaluate causal relationships (e.g., Allan, 1980; Anderson, 1990; Anderson & Sheu, 1995; Cheng, 1997; Cheng & Novick, 1990, 1992; Jenkins & Ward, 1965; López et al., 1998; Shanks, 1995b). These models tend to downplay the role of prior knowledge, assuming that such knowledge serves to provide a set of candidate causes, but contingency data are used to evaluate those causes.

We provide an account of human causal induction from contingency data within our theory-based framework. Most experiments using contingency data select candidate causes and effects for which causal relationships are plausible. This uniformity of plausibility underlies claims about the separability of causal induction and prior knowledge and means that the aspect of causal theories that determines the plausibility of relationships will not be as relevant here as in other settings discussed later in the article. However, as our framework emphasizes, prior knowledge is not restricted to plausibility: It also determines assumptions about functional form. Our framework thus makes two claims about causal learning from contingency data: that variation in the assumed functional form should produce variation in human judgments and that causal inferences can be understood as Bayesian inferences about the causal models most likely to have generated the observed data.

The plan of this section is as follows. First, we analyze the problem of causal induction from contingency data using our theory-based framework. We then compare the results of this analysis with four rational models of human judgments, using this comparison to highlight the assumptions behind our analysis that are critical for predicting human judgments. We go on to test a prediction produced by this analysis, examining whether different tasks result in different treatment of the statistical problem underlying causal induction. Finally, we discuss how approaches formulating the problem of causal induction in terms o