We show that if any number of variables are allowed to be simultaneously and independently randomized in any one experiment, log₂(N) + 1 experiments are sufficient and in the worst case necessary to determine the causal relations among N ≥ 2 variables when no latent variables, no sample selection bias and no feedback cycles are present. For all K, 0 < K < …
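A minimal sketch of the combinatorial idea that makes a logarithmic number of multi-variable experiments plausible, under our own assumptions rather than the paper's construction. Each of the N variables gets a distinct binary code, and experiment k randomizes every variable whose k-th bit is 1. Any two variables differ in some bit, so some experiment randomizes exactly one member of every pair. The helper names are hypothetical, and the sketch does not model the additional experiment in the log₂(N) + 1 bound. The last printed column is N − 1, the number of one-variable-at-a-time experiments reported in another abstract in this listing, for comparison.

```python
import math
from itertools import combinations

def binary_code_design(n):
    """Hypothetical design: experiment k randomizes every variable whose index has bit k set."""
    n_experiments = max(1, math.ceil(math.log2(n)))
    return [{v for v in range(n) if (v >> k) & 1} for k in range(n_experiments)]

def splits_every_pair(design, n):
    """True if every pair of variables is separated by some experiment
    that randomizes exactly one member of the pair."""
    return all(any((x in exp) != (y in exp) for exp in design)
               for x, y in combinations(range(n), 2))

for n in (2, 5, 8, 100):
    design = binary_code_design(n)
    # columns: N, experiments used, every pair split?, N - 1 single-variable experiments
    print(n, len(design), splits_every_pair(design, n), n - 1)
```

By contrast, three single-variable experiments on eight variables leave five variables never randomized, so pairs among them are never split.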
We present an algorithm to infer causal relations between a set of measured variables on the basis of experiments on these variables. The algorithm assumes that the causal relations are linear, but is otherwise completely general: it provides consistent estimates when the true causal structure contains feedback loops and latent variables, while the experiments can involve surgical or ‘soft’ interventions on one or multiple variables at a time. The algorithm is ‘online’ in the sense that it combines the results from any set of available experiments, can incorporate background knowledge, and resolves conflicts that arise from combining results from different experiments. In addition, we provide a necessary and sufficient condition that (i) determines when the algorithm can uniquely return the true graph, and (ii) can be used to select the next best experiment until this condition is satisfied. We demonstrate the method by applying it to simulated data and to the flow cytometry data of Sachs et al. (2005).
By combining experimental interventions with search procedures for graphical causal models we show that under familiar assumptions, with perfect data, N − 1 experiments suffice to determine the causal relations among N > 2 variables when each experiment randomizes at most one variable. We show the same bound holds for adaptive learners, but does not hold for N > 4 when each experiment can simultaneously randomize more than one variable. This bound provides a type of ideal for the measure of success of heuristic approaches in active learning methods of causal discovery, which currently use less informative measures.
In Causation, Prediction, and Search (Spirtes, Glymour, and Scheines 1993), we undertook a three-part project. (Henceforth we will refer to Causation, Prediction, and Search as CPS.) First, we characterized when causal models are indistinguishable by population conditional independence relations under several different assumptions relating causality to probability. Second, we proposed a number of algorithms that take sample data and optional background knowledge as input, and output a class of causal models compatible with the data and the background knowledge; the algorithms (with the exception of the heuristic algorithm described in Chapter 11) were accompanied by proofs of their correctness given assumptions that were clearly stated in CPS, and that we will restate below. Finally, we offered a theory of how to predict the effects of interventions in causal structures, given only partial knowledge of causal structure. Freedman's objections are all directed against the causal inference algorithms we proposed. We do not have room here to discuss all of his criticisms, but we have answered his major points. With regard to the points we do not have room to discuss, the reader should be warned that Freedman is an unreliable interpreter of what we have written. For convenience, we have divided Freedman's objections into the following categories. 1.) Freedman questions some of the assumptions on which our correctness theorems are based. Some of his criticisms are based on covariance matrices that he constructed. None of the examples he constructed in sections 11.2, 11.3, or 12.3 are counterexamples to any theorem that we stated, nor are they even germane to the question of how probable the assumptions we make are. His examples only illustrate points discussed in detail in our book (particularly in the chapter on indistinguishability), in which we give similar examples. 2.)
The most serious charge that Freedman makes is that the algorithms do not compute what we say they do. …
There is now substantial agreement about the representational component of a normative theory of causal reasoning: Causal Bayes Nets. There is less agreement about a normative theory of causal discovery from data, either computationally or cognitively, and almost no work investigating how teaching the Causal Bayes Nets representational apparatus might help individuals faced with a causal learning task. Psychologists working to describe how naïve participants represent and learn causal structure from data have focused primarily on learning from single trials under a variety of conditions. In contrast, one component of the normative theory focuses on learning from a sample drawn from a population under some experimental or observational study regime. Through a virtual Causality Lab that embodies the normative theory of causal reasoning and which allows us to record student behavior, we have begun to systematically explore how best to teach the normative theory. In this paper we explain the overall project and report on pilot studies which suggest that students can quickly be taught to (appear to) be quite rational.
Linear structural equation models with latent (unmeasured) variables are used widely in sociology, psychometrics, and political science. When such models have a unidimensional …
There is a long tradition of representing causal relationships by directed acyclic graphs (Wright, 1934). Spirtes (1994), Spirtes et al. (1993) and Pearl & Verma (1991) describe procedures for inferring the presence or absence of causal arrows in the graph, even if there might be unobserved confounding variables and/or an unknown time order, that under weak conditions, for certain combinations of directed acyclic graphs and probability distributions, are asymptotically consistent in sample size. These results are surprising since they seem to contradict the standard statistical wisdom that consistent estimators of causal effects do not exist for nonrandomised studies if there are potentially unobserved confounding variables. We resolve the apparent incompatibility of these views by closely examining the asymptotic properties of these causal inference procedures. We show that the asymptotically consistent procedures are ‘pointwise consistent’, but ‘uniformly consistent’ tests do not exist. Thus, no finite sample size can ever be guaranteed to approximate the asymptotic results. We also show the nonexistence of valid, consistent confidence intervals for causal effects and the nonexistence of uniformly consistent point estimators. Our results make no assumption about the form of the tests or estimators. In particular, the tests could be classical independence tests, Bayes tests, or tests based on scoring methods. The implications of our results for observational studies are controversial and are discussed briefly in the last section of the paper. The results hinge on the following fact: it is possible to find, for each sample size n, distributions P and Q such that P and Q are empirically indistinguishable and yet P and Q correspond to different causal effects.
Researchers routinely face the problem of inferring causal relationships from large amounts of data, sometimes involving hundreds of variables. Often, it is the causal relationships between "latent" (unmeasured) variables that are of primary interest. The problem is how causal relationships between unmeasured variables can be inferred from measured data. For example, naval manpower researchers have been asked to infer the causal relations among psychological traits such as job satisfaction and job challenge from a data base in which neither trait is measured directly, but in which answers to interview questions are plausibly associated with each trait. By combining background knowledge with an algorithm that searches for causal structure among the unobserved variables, we have created a tool that can reliably extract useful causal information about latent variables from large data bases. In what follows we describe the class of causal models to which our …
The Carnegie Mellon Proof Tutor project was motivated by pedagogical concerns: we wanted to use a "mechanical" (i.e. computerized) tutor for teaching students …
Data analysis that merely fits an empirical covariance matrix or that finds the best least squares linear estimator of a variable is not of itself a reliable guide to judgements about policy, which inevitably involve causal conclusions. The policy implications of empirical data can be completely reversed by alternative hypotheses about the causal relations of variables, and the estimates of a particular causal influence can be radically altered by changes in the assumptions made about other dependencies. For these reasons, one of the common aims of empirical research in the …
Linear structural equation models (SEMs) are widely used in sociology, econometrics, biology, and other sciences. A SEM (without free parameters) has two parts: a probability distribution (in the Normal case specified by a set of linear structural equations and a covariance matrix among the “error” or “disturbance” terms), and an associated path diagram corresponding to the functional composition of variables specified by the structural equations and the correlations among the error terms. It is often thought that the path diagram is nothing more than a heuristic device for illustrating the assumptions of the model. However, in this paper, we will show how path diagrams can be used to solve a number of important problems in structural equation modelling.
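To make the link between structural equations and covariances concrete, here is a minimal sketch, assuming a recursive (acyclic) linear SEM with uncorrelated error terms; the function and variable names are illustrative. It computes the implied covariances variable by variable in causal order, then a first-order partial correlation. For a chain X → Y → Z, ρXZ.Y vanishes whatever the coefficients; for a collider X → Y ← Z, cov(X, Z) is zero but ρXZ.Y is not. Both constraints can be read off the path diagram.

```python
import math

def implied_covariance(order, coef, err_var):
    """Covariances implied by a recursive linear SEM with uncorrelated errors.

    order   -- variable names in a causal order (parents before children)
    coef    -- {(parent, child): coefficient of parent in child's equation}
    err_var -- {variable: variance of its error term}
    Returns a dict mapping (u, v) pairs to cov(u, v).
    """
    cov = {}
    for pos, child in enumerate(order):
        parents = {p: b for (p, c), b in coef.items() if c == child}
        # cov(child, k) = sum over parents p of b_p * cov(p, k), for each earlier k
        for k in order[:pos]:
            cov[(child, k)] = cov[(k, child)] = sum(
                b * cov[(p, k)] for p, b in parents.items())
        # var(child) = var(sum of b_p * p) + error variance
        cov[(child, child)] = sum(
            b1 * b2 * cov[(p1, p2)]
            for p1, b1 in parents.items()
            for p2, b2 in parents.items()) + err_var[child]
    return cov

def partial_corr(cov, x, y, z):
    """First-order partial correlation of x and y given z."""
    def r(u, v):
        return cov[(u, v)] / math.sqrt(cov[(u, u)] * cov[(v, v)])
    num = r(x, y) - r(x, z) * r(y, z)
    return num / math.sqrt((1 - r(x, z) ** 2) * (1 - r(y, z) ** 2))

unit = {"X": 1.0, "Y": 1.0, "Z": 1.0}
chain = implied_covariance(["X", "Y", "Z"], {("X", "Y"): 0.8, ("Y", "Z"): -1.5}, unit)
collider = implied_covariance(["X", "Z", "Y"], {("X", "Y"): 0.8, ("Z", "Y"): -1.5}, unit)
print(partial_corr(chain, "X", "Z", "Y"))     # ~0: the chain entails rho_XZ.Y = 0
print(partial_corr(collider, "X", "Z", "Y"))  # nonzero, although cov(X, Z) = 0
```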
It has been shown in Spirtes (1995) that X and Y are d-separated given Z in a directed graph associated with a recursive or non-recursive linear model without correlated errors if and only if the model entails that ρXY.Z = 0. This result cannot be directly applied to a linear model with correlated errors, however, because the standard graphical representation of a linear model with correlated errors is not a directed graph. The main result of this paper is to show how to associate a directed graph with a linear model L with correlated errors, and then use d-separation in the associated directed graph to determine whether L entails that a particular partial correlation is zero.
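d-separation itself can be tested mechanically. The sketch below uses the standard moralized-ancestral-graph criterion: restrict the graph to the ancestors of X ∪ Y ∪ Z, connect co-parents, drop edge directions, delete Z, and check whether X can still reach Y. It handles only directed acyclic graphs, uses our own dict-of-parent-sets representation, and does not implement the paper's construction for models with correlated errors. For a linear model without correlated errors, each True answer corresponds to a partial correlation the model entails to be zero.

```python
from collections import deque

def d_separated(parents, xs, ys, zs):
    """True if node sets xs and ys are d-separated given zs in a DAG.

    parents -- {node: set of its parents}; nodes without parents may be omitted.
    Uses the moralized ancestral graph criterion; xs, ys, zs are assumed disjoint.
    """
    # 1. Keep only xs, ys, zs and their ancestors.
    keep = set(xs) | set(ys) | set(zs)
    stack = list(keep)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in keep:
                keep.add(p)
                stack.append(p)
    # 2. Moralize: link each node to its parents, marry co-parents, drop directions.
    adj = {v: set() for v in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for i, p in enumerate(ps):
            adj[child].add(p)
            adj[p].add(child)
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # 3. Delete zs and search for an undirected path from xs to ys.
    targets, blocked = set(ys), set(zs)
    seen = set(xs)
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        if v in targets:
            return False
        for w in adj[v] - seen - blocked:
            seen.add(w)
            queue.append(w)
    return True

chain = {"Y": {"X"}, "Z": {"Y"}}   # X -> Y -> Z
collider = {"Y": {"X", "Z"}}       # X -> Y <- Z
print(d_separated(chain, {"X"}, {"Z"}, {"Y"}))     # True
print(d_separated(chain, {"X"}, {"Z"}, set()))     # False
print(d_separated(collider, {"X"}, {"Z"}, set()))  # True
print(d_separated(collider, {"X"}, {"Z"}, {"Y"}))  # False
```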
In Causation, Prediction, and Search (CPS hereafter), Peter Spirtes, Clark Glymour and I developed a theory of statistical causal inference. In his presentation at the Notre Dame conference (and in his paper, this volume), Glymour discussed the assumptions on which this theory is built, traced some of the mathematical consequences of the assumptions, and pointed to situations in which the assumptions might fail. Nevertheless, many at Notre Dame found the theory difficult to understand and/or assess. As a result I was asked to write this paper to provide a more intuitive introduction to the theory. In what follows I shun almost all formality and avoid the numerous and complicated qualifiers that typically accompany definitions or important philosophical concepts. They can all be found in Glymour's paper or in CPS, which are clear although sometimes dense. Here I attempt to fix intuitions by highlighting a few of the essential ideas and by providing extremely simple examples throughout.
Students can use an educational system’s help in unexpected ways. For example, they may bypass abstract hints in search of a concrete solution. This behavior has traditionally been labeled as a form of gaming or help abuse. We propose that some examples of this behavior are not abusive and that bottom-out hints can act as worked examples. We create a model that uses logged response times to distinguish good from bad student use of bottom-out hints. We show that this model not only predicts learning, but also captures behaviors related to self-explanation.
The Gibbs sampler can be used to obtain samples of arbitrary size from the posterior distribution over the parameters of a structural equation model (SEM) given covariance data and a prior distribution over the parameters. Point estimates, standard deviations and interval estimates for the parameters can be computed from these samples. If the prior distribution over the parameters is uninformative, the posterior is proportional to the likelihood, and asymptotically the inferences based on the Gibbs sample are the same as those based on the maximum likelihood solution, e.g., output from LISREL or EQS. In small samples, however, the likelihood surface is not Gaussian and in some cases contains local maxima. Nevertheless, the Gibbs sample comes from the correct posterior distribution over the parameters regardless of the sample size and the shape of the likelihood surface. With an informative prior distribution over the parameters, the posterior can be used to make inferences about the parameters of underidentified models, as we illustrate on a simple errors-in-variables model.
Practically, causation matters. Juries must decide, for example, whether a pregnant mother’s refusal to give birth by caesarean section was the cause of the death of one of her twins. Policy makers must decide whether violence on TV causes violence in life. Neither question can be coherently debated without some theory of causation. Fortunately (or not, depending on where one sits), a virtual plethora of theories of causation have been championed in the third of a century between 1970 and 2004.
Over the last two decades, philosophers, statisticians, and computer scientists have converged on the fundamental outline of a theory of causal representation and causal inference (Spirtes, Glymour, and Scheines, 2000; Pearl, 2000). Some conditions and assumptions under which reliable inference about the effects of manipulations is possible have been precisely characterized; other conditions and assumptions under which reliable inference about the effects of manipulation is impossible have also been characterized. However, the theory of inference about the effects of manipulations that has been developed does not consider the problem of “defined variables”. In causal modeling, sometimes variables are deliberately introduced as defined functions of other variables. More interestingly, sometimes two or more measured variables are deterministic functions of one another, not deliberately, but because of redundant measurements. In these cases, manipulation of an observed defined variable may actually be an ambiguous description of a manipulation of some underlying variables, although the manipulator does not know that this is the case. In this article we revisit the question of precisely characterizing conditions and assumptions under which reliable inference about the effects of manipulations is possible, even when the possibility of “ambiguous manipulations” is allowed.
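A toy illustration of why manipulating a defined variable can be ambiguous. The component names and coefficients are hypothetical, chosen only for illustration: an outcome depends linearly on two components with opposite-signed effects, while the measured variable is their sum. Raising the sum by the same amount through either component produces the same value of the defined variable but different effects on the outcome.

```python
# Hypothetical coefficients, chosen only for illustration (not estimates from any study).
COEF = {"a": -0.5, "b": 1.0}

def outcome(a, b):
    """Outcome as a linear function of the two underlying components."""
    return COEF["a"] * a + COEF["b"] * b

def total(a, b):
    """The defined variable: a deterministic function (here, the sum) of the components."""
    return a + b

def raise_total(a, b, amount, via):
    """One way of 'setting total to total + amount': change only the named component."""
    if via == "a":
        return a + amount, b
    if via == "b":
        return a, b + amount
    raise ValueError(f"unknown component: {via}")

base = (50.0, 100.0)
for via in ("a", "b"):
    new = raise_total(*base, amount=10.0, via=via)
    # Same new value of the defined variable, different change in the outcome.
    print(via, total(*new), outcome(*new) - outcome(*base))
```

"Increase the total by 10" thus names two different manipulations (outcome changes of −5.0 and 10.0 here), and nothing in the defined variable alone says which one occurred.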
More and more, judges and juries are being asked to handle torts and other cases in which establishing liability involves understanding large bodies of complex scientific evidence. When establishing causation is involved, the evidence can be diverse, can involve complicated statistical models, and can seem impenetrable to non-experts. Since the decision in Daubert v. Merrell Dow Pharms., Inc. in 1993, judges cannot simply admit expert testimony and other technical evidence and let jurors decide the verdict. Judges now must rule on which experts are admissible and which are inadmissible, and they must base their ruling at least partly on the status of the scientific evidence about which the expert will testify. This article is intended to provide judges with an accessible methodological overview of causal science.
Deciding matters of legal liability, in torts and other civil actions, requires deciding causation. The injury suffered by a plaintiff must be caused by an event or condition due to the defendant. The courts distinguish between cause-in-fact and proximate causation, where cause-in-fact is determined by the “but-for” test: the effect would not have happened, “but for” the cause. Proximate causation is a set of legal limitations on cause-in-fact.
The statistical evidence for the detrimental effect of exposure to low levels of lead on the cognitive capacities of children has been debated for several decades. In this paper I describe how two techniques from artificial intelligence and statistics help make the statistical evidence for the accepted epidemiological conclusion seem decisive. The first is a variable-selection routine in TETRAD III for finding causes, and the second a Bayesian estimation of the parameter reflecting the causal influence of Actual Lead Exposure, a latent variable, on the measured IQ score of middle class suburban children.
The past two decades have seen a dramatic growth in the use of statisticians and economists for the presentation of expert testimony in legal proceedings. In this paper, we describe a hypothetical case modeled on real ones and involving statistical testimony regarding the causal effect of lead on lowering the IQs of children who ingest lead paint chips. The data we use come from a well-known pioneering study on the topic and the analyses we describe as the expert testimony are similar to ones that can be found in major scientific journals. The battle of the experts in this hypothetical case resembles that which many encounter as expert witnesses. The paper concludes with some observations and advice.
vertices of a DAG. of K? We assume there are no unmeasured common causes of the N variables, that the system is free of feedback, and that the independence relations true of …
The statistical community has brought logical rigor and mathematical precision to the problem of using data to make inferences about a model’s parameter values. The TETRAD project, and related work in computer science and statistics, aims to apply those standards to the problem of using data and background knowledge to make inferences about a model’s specification. We begin by drawing the analogy between parameter estimation and model specification search. We then describe how the specification of a structural equation model entails familiar constraints on the covariance matrix for all admissible values of its parameters; we survey results on the equivalence of structural equation models, and we discuss search strategies for model specification. We end by presenting several algorithms that are implemented in the TETRAD II program.
We describe anytime search procedures that (1) find disjoint subsets of recorded variables for which the members of each subset are d-separated by a single common unrecorded cause, if such exists; (2) return information about the causal relations among the latent factors so identified. We prove the procedure is point-wise consistent assuming (a) the causal relations can be represented by a directed acyclic graph (DAG) satisfying the Markov Assumption and the Faithfulness Assumption; (b) unrecorded variables are not caused by recorded variables; and (c) dependencies are linear. We compare the procedure with standard approaches over a variety of simulated structures and sample sizes, and illustrate its practical value with brief studies of social science data sets. Finally, we consider generalizations for non-linear systems. Keywords: latent variable models, causality, graphical models.
nature of modern data collection and storage techniques, and the increases in the speed and storage capacities of computers. Statistics books from 30 years ago often presented examples with fewer than 10 variables, in domains where some background knowledge was plausible. In contrast, in new domains, such as climate research where satellite data now provide daily quantities of data unthinkable a few decades ago, fMRI brain imaging, and microarray measurements of gene expression, the number of variables can range into the tens of thousands, and there is often limited background knowledge to reduce the space of alternative causal hypotheses. In such domains, non-automated causal discovery techniques appear to be hopeless, while the availability of faster computers with larger memories and disc space allows for the practical implementation of computationally intensive automated search algorithms over large search spaces. Contemporary science is not your grandfather’s science, or Karl Popper’s. Causal inference without experimental controls has long seemed as if it must somehow be capable of being cast as a kind of statistical inference involving estimators with some kind of convergence and accuracy properties under some kind of assumptions. Until recently, the statistical literature said not. While parameter estimation and experimental design for the effective use of data developed throughout the 20th century, as recently as 20 years ago the methodology of causal inference without experimental controls remained relatively primitive. Besides a cessation of hostilities from the majority of the statistical and philosophical communities (which has still only partially happened), several things were needed for theories of causal estimation to appear and to flower: well-defined mathematical objects to represent causal relations; well-defined connections between aspects of these objects and sample data; and a way to compute those connections.
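To give a sense of how quickly the space of alternative causal hypotheses grows, this sketch counts the labeled directed acyclic graphs on n variables using Robinson's recurrence, a standard combinatorial result rather than anything specific to the work described here. Even this count, which ignores latent variables and feedback, already exceeds 10^18 at n = 10, which is why exhaustive comparison of hypotheses gives way to search algorithms.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled DAGs on n nodes, via Robinson's recurrence:
    a(n) = sum_{k=1..n} (-1)^(k+1) * C(n, k) * 2^(k(n-k)) * a(n-k), with a(0) = 1."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
               for k in range(1, n + 1))

for n in range(1, 11):
    print(n, count_dags(n))
```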
A sequence of studies beginning with Dempster’s work on the factorization of probability distributions [Dempster 1972] and culminating with Kiiveri and Speed’s [Kiiveri & Speed 1982] study of linear structural equation models provided the first, in the form of directed acyclic graphs, and the second, in the form of the “local” Markov condition …
We argue that current discussions of criteria for actual causation are ill-posed in several respects. (1) The methodology of current discussions is by induction from intuitions about an infinitesimal fraction of the possible examples and counterexamples; (2) cases with larger numbers of causes generate novel puzzles; (3) “neuron” and causal Bayes net diagrams are, as deployed in discussions of actual causation, almost always ambiguous; (4) actual causation is (intuitively) relative to an initial system state since state changes are relevant, but most current accounts ignore state changes through time; (5) more generally, there is no reason to think that philosophical judgements about these sorts of cases are normative; but (6) there is a dearth of relevant psychological research that bears on whether various philosophical accounts are descriptive. Our skepticism is not directed towards the possibility of a correct account of actual causation; rather, we argue that standard methods will not lead to such an account. A different approach is required. Once upon a time a hungry wanderer came into a village. He filled an iron cauldron with water, built a fire under it, and dropped a stone into the water. “I do like a tasty stone soup” he announced. Soon a villager added a cabbage to the pot, another added some salt and others added potatoes, onions, carrots, mushrooms, and so on, until there was a meal for all.
Many philosophers have worried about what philosophy is. Often they have looked for answers by considering what it is that philosophers do. Given the diversity of topics and methods found in philosophy, however, we propose a different approach. In this article we consider the philosophical temperament, asking an alternative question: what are philosophers like? Our answer is that one important aspect of the philosophical temperament is that philosophers are especially reflective: they are less likely than their peers to embrace what seems obvious without questioning it. This claim is supported by a study of more than 4,000 philosophers and non-philosophers, the results of which indicate that even when we control for overall education level, philosophers tend to be significantly more reflective than their peers. We then illustrate this tendency by considering what we know about the philosophizing of a few prominent philosophers. Recognizing this aspect of the philosophical temperament, it is natural to wonder how philosophers came to be this way: does philosophical training teach reflectivity or do more reflective people tend to gravitate to philosophy? We consider the limitations of our data with respect to this question and suggest that a longitudinal study be conducted.
Coherentism maintains that coherent beliefs are more likely to be true than incoherent beliefs, and that coherent evidence provides more confirmation of a hypothesis when the evidence is made coherent by the explanation provided by that hypothesis. Although probabilistic models of credence ought to be well-suited to justifying such claims, negative results from Bayesian epistemology have suggested otherwise. In this essay we argue that the connection between coherence and confirmation should be understood as a relation mediated by the causal relationships among the evidence and a hypothesis, and we offer a framework for doing so by fitting together probabilistic models of coherence, confirmation, and causation. We show that the causal structure among the evidence and hypothesis is sometimes enough to determine whether the coherence of the evidence boosts confirmation of the hypothesis, makes no difference to it, or even reduces it. We also show that, ceteris paribus, it is not the coherence of the evidence that boosts confirmation, but rather the ratio of the coherence of the evidence to the coherence of the evidence conditional on a hypothesis.
Many philosophers of science have argued that a set of evidence that is "coherent" confirms a hypothesis which explains such coherence. In this paper, we examine the relationships between probabilistic models of all three of these concepts: coherence, confirmation, and explanation. For coherence, we consider Shogenji's measure of association (deviation from independence). For confirmation, we consider several measures in the literature, and for explanation, we turn to Causal Bayes Nets and resort to causal structure and its constraint on probability. All else equal, we show that focused correlation, which is the ratio of the coherence of the evidence to the coherence of the evidence conditional on a hypothesis, tracks confirmation. We then show that the causal structure of the evidence and hypothesis can put strong constraints on how coherence in the evidence does or does not translate into confirmation of the hypothesis.
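A minimal numerical sketch of the two quantities, assuming the simplest causal structure in which a binary hypothesis H is a common cause of two binary evidence variables E1 and E2 that are independent given H; the probabilities are made up for illustration and the function names are our own. Shogenji's measure is P(E1 ∧ E2) / (P(E1) P(E2)), and focused correlation divides it by the same measure computed conditional on H. In this structure the conditional coherence is 1, so focused correlation equals the unconditional coherence; other causal structures can make the two come apart.

```python
from itertools import product

def joint(p_h, p_e_given_h, p_e_given_not_h):
    """Joint distribution over worlds (H, E1, E2) for E1 <- H -> E2,
    with E1 and E2 independent given H."""
    dist = {}
    for h, e1, e2 in product((True, False), repeat=3):
        ph = p_h if h else 1 - p_h
        pe = p_e_given_h if h else p_e_given_not_h
        dist[(h, e1, e2)] = ph * (pe if e1 else 1 - pe) * (pe if e2 else 1 - pe)
    return dist

def prob(dist, pred):
    """Probability of the set of worlds satisfying pred."""
    return sum(p for w, p in dist.items() if pred(w))

def shogenji(dist, given=lambda w: True):
    """Shogenji coherence of E1 and E2, optionally conditional on an event."""
    z = prob(dist, given)
    p1 = prob(dist, lambda w: given(w) and w[1]) / z
    p2 = prob(dist, lambda w: given(w) and w[2]) / z
    p12 = prob(dist, lambda w: given(w) and w[1] and w[2]) / z
    return p12 / (p1 * p2)

def focused_correlation(dist):
    """Coherence of the evidence divided by its coherence conditional on H."""
    return shogenji(dist) / shogenji(dist, given=lambda w: w[0])

d = joint(0.5, 0.8, 0.2)
print(shogenji(d), focused_correlation(d))  # both approximately 1.36
```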
Many philosophers have worried about what philosophy is. Often they have looked for answers by considering what it is that philosophers do. Given the diversity of topics and methods found in philosophy, however, we propose a different approach. In this article we consider the philosophical temperament, asking an alternative question: What are philosophers like? Our answer is that one important aspect of the philosophical temperament is that philosophers are especially reflective. This claim is supported by a study of more than 5,000 philosophers and non-philosophers, the results of which indicate that even when we control for overall education level, philosophers tend to be significantly more reflective than their peers. We then illustrate this tendency by considering what we know about the philosophizing of a few prominent philosophers. Recognizing this aspect of the philosophical temperament, it is natural to wonder how philosophers came to be this way: Does philosophical training teach reflectivity or do more reflective people tend to gravitate to philosophy? We consider the limitations of our data with respect to this question and suggest that a longitudinal study be conducted.
The literature on causal discovery has focused on interventions that involve randomly assigning values to a single variable. But such a randomized intervention is not the only possibility, nor is it always optimal. In some cases it is impossible or it would be unethical to perform such an intervention. We provide an account of ‘hard’ and ‘soft’ interventions and discuss what they can contribute to causal discovery. We also describe how the choice of the optimal intervention(s) depends heavily on the particular experimental setup and the assumptions that can be made. ‡The first author is funded by the Causal Learning Collaborative Initiative supported by the James S. McDonnell Foundation. Many aspects of this paper were inspired by discussions with members of the collaborative. †To contact the authors, please write to: Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA 15213; e-mail: fde@cmu.edu and scheines@cmu.edu.
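A small simulation of how the two kinds of intervention differ, assuming a two-variable linear Gaussian model Z → X with our own illustrative names and parameters. The hard intervention replaces X's equation with an independent randomizer, cutting the edge from Z. The soft intervention is modelled here as adding an independently randomized term to X's existing equation; this is one simple reading of the term, not necessarily the paper's definition. The sample covariance of Z and X is near zero only under the hard intervention, so only the soft intervention leaves the dependence of X on Z available in the experimental data.

```python
import random

def simulate(n, b, intervention=None, seed=0):
    """Draw (Z, X) pairs from Z -> X with X = b*Z + noise, under an optional intervention on X."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z = rng.gauss(0, 1)
        if intervention == "hard":
            x = rng.gauss(0, 1)                            # X set by a randomizer; Z's influence removed
        elif intervention == "soft":
            x = b * z + rng.gauss(0, 1) + rng.gauss(0, 1)  # extra randomized input; Z's influence kept
        else:
            x = b * z + rng.gauss(0, 1)                    # passive observation
        pairs.append((z, x))
    return pairs

def sample_cov(pairs):
    """Unbiased sample covariance of the two coordinates."""
    n = len(pairs)
    mz = sum(z for z, _ in pairs) / n
    mx = sum(x for _, x in pairs) / n
    return sum((z - mz) * (x - mx) for z, x in pairs) / (n - 1)

for kind in (None, "hard", "soft"):
    print(kind, round(sample_cov(simulate(20000, 0.8, kind)), 2))
```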
Over the last two decades, a fundamental outline of a theory of causal inference has emerged. However, this theory does not consider the following problem. Sometimes two or more measured variables are deterministic functions of one another, not deliberately, but because of redundant measurements. In these cases, manipulation of an observed defined variable may actually be an ambiguous description of a manipulation of some underlying variables, although the manipulator does not know that this is the case. In this article we revisit the question of precisely characterizing conditions and assumptions under which reliable inference about the effects of manipulations is possible, even when the possibility of “ambiguous manipulations” is allowed.
In 1982, when computers were just becoming widely available, I was a graduate student beginning my work with Clark Glymour on a PhD thesis entitled: “Causality in the Social Sciences.” Dazed and confused by the vast philosophical literature on causation, I found relative solace in the clarity of Structural Equation Models (SEMs), a form of statistical model used commonly by practicing sociologists, political scientists, etc., to model causal hypotheses with which associations among measured variables might be explained. The statistical literature around SEMs was vast as well, but Clark had extracted from it a particular kind of evidential constraint first studied by Charles Spearman at the beginning of the 20th century, the “vanishing tetrad difference.” As it turned out, certain kinds of causal structures entailed these constraints, and others did not. Spearman used this lever to argue for the existence of a single, general intelligence factor, the infamous g (Spearman, 1904).
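Spearman's constraint is easy to check numerically. In the sketch below (function names and loadings are illustrative), indicators are linear in one or more latent factors with independent unit-variance errors. Under a single common factor, every covariance is a product of two loadings, so all three tetrad differences among four indicators vanish; with two correlated factors, each measured by two of the indicators, only one of them does.

```python
def factor_model_cov(loadings, factor_of, factor_cov):
    """Covariance of indicators X_i = loadings[i] * F[factor_of[i]] + e_i,
    with independent unit-variance errors e_i."""
    n = len(loadings)
    return [[loadings[i] * loadings[j] * factor_cov[factor_of[i]][factor_of[j]]
             + (1.0 if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def tetrad_differences(s, i, j, k, l):
    """The three tetrad differences among variables i, j, k, l."""
    return (s[i][j] * s[k][l] - s[i][k] * s[j][l],
            s[i][j] * s[k][l] - s[i][l] * s[j][k],
            s[i][k] * s[j][l] - s[i][l] * s[j][k])

loadings = [0.9, 0.7, 0.6, 0.8]
one = factor_model_cov(loadings, [0, 0, 0, 0], [[1.0]])
two = factor_model_cov(loadings, [0, 0, 1, 1], [[1.0, 0.5], [0.5, 1.0]])
print(tetrad_differences(one, 0, 1, 2, 3))  # all approximately zero
print(tetrad_differences(two, 0, 1, 2, 3))  # only the last is approximately zero
```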
Drawing substantive conclusions from linear causal models that perform acceptably on statistical tests is unreasonable if it is not known how alternatives fare on these same tests. We describe a computer program, TETRAD, that helps to search rapidly for plausible alternatives to a given causal structure. The program is based on principles from statistics, graph theory, philosophy of science, and artificial intelligence. We describe these principles, discuss how TETRAD employs them, and argue that these principles make TETRAD an effective tool. Finally, we illustrate TETRAD's effectiveness by applying it to a multiple indicator model of Political and Industrial development. A pilot version of the TETRAD program is described in this paper. The current version is described in our forthcoming Discovering Causal Structure: Artificial Intelligence for Statistical Modeling.