We argue that current discussions of criteria for actual causation are ill-posed in several respects. (1) The methodology of current discussions is by induction from intuitions about an infinitesimal fraction of the possible examples and counterexamples; (2) cases with larger numbers of causes generate novel puzzles; (3) "neuron" and causal Bayes net diagrams are, as deployed in discussions of actual causation, almost always ambiguous; (4) actual causation is (intuitively) relative to an initial system state since state changes are relevant, but most current accounts ignore state changes through time; (5) more generally, there is no reason to think that philosophical judgements about these sorts of cases are normative; but (6) there is a dearth of relevant psychological research that bears on whether various philosophical accounts are descriptive. Our skepticism is not directed towards the possibility of a correct account of actual causation; rather, we argue that standard methods will not lead to such an account. A different approach is required.
Haack, S. Is truth flat or bumpy?--Chihara, C. S. Ramsey's theory of types.--Loar, B. Ramsey's theory of belief and truth.--Skorupski, J. Ramsey on Belief.--Hookway, C. Inference, partial belief, and psychological laws.--Skyrms, B. Higher order degrees of belief.--Mellor, D. H. Consciousness and degrees of belief.--Blackburn, S. Opinions and chances.--Grandy, R. E. Ramsey, reliability, and knowledge.--Cohen, L. J. The problem of natural laws.--Giedymin, J. Hamilton's method in geometrical optics and Ramsey's view of theories.
The literature on causal discovery has focused on interventions that involve randomly assigning values to a single variable. But such a randomized intervention is not the only possibility, nor is it always optimal. In some cases it is impossible or it would be unethical to perform such an intervention. We provide an account of ‘hard’ and ‘soft’ interventions and discuss what they can contribute to causal discovery. We also describe how the choice of the optimal intervention(s) depends heavily on the particular experimental setup and the assumptions that can be made. ‡The first author is funded by the Causal Learning Collaborative Initiative supported by the James S. McDonnell Foundation. Many aspects of this paper were inspired by discussions with members of the collaborative. †To contact the authors, please write to: Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA 15213; e-mail: email@example.com and firstname.lastname@example.org.
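The difference between the two kinds of intervention can be seen in a small simulation. The sketch below assumes a hypothetical two-variable linear model X → Y with Y = b·X + noise; the regime names and parameter values are illustrative assumptions, not taken from the paper. A hard intervention on Y sets its value independently of X, cutting the X → Y edge, so the X–Y covariance disappears; a soft intervention only adds an exogenous term to Y, so Y keeps its dependence on X.

```python
import random

def sample_pairs(n, b, regime, seed=0):
    """Draw n (X, Y) pairs from the hypothetical model X -> Y, Y = b*X + noise."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        if regime == "hard":
            # Hard intervention: Y is set by the intervention alone; the X -> Y edge is cut.
            y = rng.gauss(0.0, 1.0)
        elif regime == "soft":
            # Soft intervention: an extra exogenous term shifts Y, but Y still depends on X.
            y = b * x + rng.gauss(0.0, 1.0) + rng.gauss(2.0, 1.0)
        else:
            # Passive observation of the unmanipulated system.
            y = b * x + rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    return xs, ys

def sample_cov(xs, ys):
    """Unbiased sample covariance of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

for regime in ("observe", "hard", "soft"):
    print(regime, round(sample_cov(*sample_pairs(20_000, 1.5, regime)), 2))
```

In this toy model the hard regime yields a covariance near 0 and the other two regimes a covariance near b = 1.5, which is one way of seeing that the two kinds of intervention leave different statistical traces for a discovery procedure to use.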
Over the last two decades, philosophers, statisticians, and computer scientists have converged on the fundamental outline of a theory of causal representation and causal inference (Spirtes, Glymour, and Scheines, 2000; Pearl, 2000). Some conditions and assumptions under which reliable inference about the effects of manipulations is possible have been precisely characterized; other conditions and assumptions under which reliable inference about the effects of manipulation is impossible have also been characterized. However, the theory of inference about the effects of manipulations that has been developed does not consider the problem of “defined variables”. In causal modeling, sometimes variables are deliberately introduced as defined functions of other variables. More interestingly, sometimes two or more measured variables are deterministic functions of one another, not deliberately, but because of redundant measurements. In these cases, manipulation of an observed defined variable may actually be an ambiguous description of a manipulation of some underlying variables, although the manipulator does not know that this is the case. In this article we revisit the question of precisely characterizing conditions and assumptions under which reliable inference about the effects of manipulations is possible, even when the possibility of “ambiguous manipulations” is allowed.
In Causation, Prediction, and Search (Spirtes, Glymour, and Scheines 1993), we undertook a three-part project. (Henceforth we will refer to Causation, Prediction, and Search as CPS.) First, we characterized when causal models are indistinguishable by population conditional independence relations under several different assumptions relating causality to probability. Second, we proposed a number of algorithms that take sample data and optional background knowledge as input, and output a class of causal models compatible with the data and the background knowledge; the algorithms (with the exception of the heuristic algorithm described in Chapter 11) were accompanied by proofs of their correctness given assumptions that were clearly stated in CPS, and that we will restate below. Finally, we offered a theory of how to predict the effects of interventions in causal structures, given only partial knowledge of causal structure. Freedman's objections are all directed against the causal inference algorithms we proposed. We do not have room here to discuss all of his criticisms, but we have answered his major points. With regard to the points we do not have room to discuss, the reader should be warned that Freedman is an unreliable interpreter of what we have written. For convenience, we have divided Freedman's objections into the following categories.
1.) Freedman questions some of the assumptions on which our correctness theorems are based. Some of his criticisms are based on covariance matrices that he constructed. None of the examples he constructed in sections 11.2, 11.3, or 12.3 are counterexamples to any theorem that we stated, nor are they even germane to the question of how probable the assumptions we make are. His examples only illustrate points discussed in detail in our book (particularly in the chapter on indistinguishability), in which we give similar examples.
2.) The most serious charge that Freedman makes is that the algorithms do not compute what we say they do.
By bootstrapping the output of the PC algorithm (Spirtes et al., 2000; Meek 1995), using larger conditioning sets informed by the current graph state, it is possible to define a novel algorithm, JPC, that improves the accuracy of search for i.i.d. data drawn from linear, Gaussian, sparse to moderately dense models. The motivation for constructing sepsets using information in the current graph state is to highlight the differences between d-separation information in the graph and conditional independence information extracted from the sample. The same idea can be pursued for any algorithm for which conditioning sets informed by the current graph state can be constructed and for which an orientation procedure capable of orienting undirected graphs can be extracted. Another plausible candidate for such retrofitting is the CPC algorithm (Ramsey et al., 2006), yielding an algorithm, JCPC, which, when the true graph is sparse, is somewhat more accurate than JPC. The method is not feasible for discovery for models of categorical variables, i.e., traditional Bayes nets; with alternative tests for conditional independence it may extend to non-linear or non-Gaussian models, or both.
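JPC starts from the output of PC, whose adjacency phase removes an edge X–Y as soon as X and Y are judged independent conditional on some subset of the current neighbours of X, recording that subset as the sepset. A minimal Python sketch of that phase only, assuming a hypothetical conditional-independence oracle `indep(x, y, s)` in place of statistical tests; PC's orientation phase and the JPC refinement itself are not shown.

```python
from itertools import combinations

def pc_skeleton(variables, indep):
    """Adjacency (skeleton) phase of PC, given an oracle indep(x, y, s) for x _||_ y | s."""
    adj = {v: set(variables) - {v} for v in variables}  # start from the complete graph
    sepset = {}
    depth = 0  # size of the conditioning sets tried at this pass
    while any(len(adj[x]) - 1 >= depth for x in variables):
        for x in variables:
            for y in sorted(adj[x]):
                if y not in adj[x]:
                    continue
                # Conditioning sets are drawn from the current neighbours of x, minus y.
                others = adj[x] - {y}
                for s in combinations(sorted(others), depth):
                    if indep(x, y, frozenset(s)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = frozenset(s)
                        break
        depth += 1
    return adj, sepset

def chain_oracle(x, y, s):
    """Hypothetical oracle for the chain A -> B -> C: the only independence is A _||_ C | {B}."""
    return {x, y} == {"A", "C"} and s == frozenset({"B"})

adj, sepset = pc_skeleton(["A", "B", "C"], chain_oracle)
print(adj)     # A and C end up non-adjacent
print(sepset)  # {B} is recorded as the sepset of A and C
```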
Many philosophers have worried about what philosophy is. Often they have looked for answers by considering what it is that philosophers do. Given the diversity of topics and methods found in philosophy, however, we propose a different approach. In this article we consider the philosophical temperament, asking an alternative question: What are philosophers like? Our answer is that one important aspect of the philosophical temperament is that philosophers are especially reflective. This claim is supported by a study of more than 5,000 philosophers and non-philosophers, the results of which indicate that even when we control for overall education level, philosophers tend to be significantly more reflective than their peers. We then illustrate this tendency by considering what we know about the philosophizing of a few prominent philosophers. Recognizing this aspect of the philosophical temperament, it is natural to wonder how philosophers came to be this way: Does philosophical training teach reflectivity or do more reflective people tend to gravitate to philosophy? We consider the limitations of our data with respect to this question and suggest that a longitudinal study be conducted.
In Causation, Prediction, and Search (CPS hereafter), Peter Spirtes, Clark Glymour and I developed a theory of statistical causal inference. In his presentation at the Notre Dame conference (and in his paper, this volume), Glymour discussed the assumptions on which this theory is built, traced some of the mathematical consequences of the assumptions, and pointed to situations in which the assumptions might fail. Nevertheless, many at Notre Dame found the theory difficult to understand and/or assess. As a result I was asked to write this paper to provide a more intuitive introduction to the theory. In what follows I shun almost all formality and avoid the numerous and complicated qualifiers that typically accompany definitions of important philosophical concepts. They can all be found in Glymour's paper or in CPS, which are clear although sometimes dense. Here I attempt to fix intuitions by highlighting a few of the essential ideas and by providing extremely simple examples throughout.
Drawing substantive conclusions from linear causal models that perform acceptably on statistical tests is unreasonable if it is not known how alternatives fare on these same tests. We describe a computer program, TETRAD, that helps to search rapidly for plausible alternatives to a given causal structure. The program is based on principles from statistics, graph theory, philosophy of science, and artificial intelligence. We describe these principles, discuss how TETRAD employs them, and argue that these principles make TETRAD an effective tool. Finally, we illustrate TETRAD's effectiveness by applying it to a multiple indicator model of Political and Industrial development. A pilot version of the TETRAD program is described in this paper. The current version is described in our forthcoming Discovering Causal Structure: Artificial Intelligence for Statistical Modeling.
Practically, causation matters. Juries must decide, for example, whether a pregnant mother’s refusal to give birth by caesarean section was the cause of the death of one of her twins. Policy makers must decide whether violence on TV causes violence in life. Neither question can be coherently debated without some theory of causation. Fortunately (or not, depending on where one sits), a virtual plethora of theories of causation have been championed in the third of a century between 1970 and 2004.
Many philosophers of science have argued that a set of evidence that is "coherent" confirms a hypothesis which explains such coherence. In this paper, we examine the relationships between probabilistic models of all three of these concepts: coherence, confirmation, and explanation. For coherence, we consider Shogenji's measure of association (deviation from independence). For confirmation, we consider several measures in the literature, and for explanation, we turn to Causal Bayes Nets and resort to causal structure and its constraint on probability. All else equal, we show that focused correlation, which is the ratio of the coherence of evidence and the coherence of the evidence conditional on a hypothesis, tracks confirmation. We then show that the causal structure of the evidence and hypothesis can put strong constraints on how coherence in the evidence does or does not translate into confirmation of the hypothesis.
Coherentism maintains that coherent beliefs are more likely to be true than incoherent beliefs, and that coherent evidence provides more confirmation of a hypothesis when the evidence is made coherent by the explanation provided by that hypothesis. Although probabilistic models of credence ought to be well-suited to justifying such claims, negative results from Bayesian epistemology have suggested otherwise. In this essay we argue that the connection between coherence and confirmation should be understood as a relation mediated by the causal relationships among the evidence and a hypothesis, and we offer a framework for doing so by fitting together probabilistic models of coherence, confirmation, and causation. We show that the causal structure among the evidence and hypothesis is sometimes enough to determine whether the coherence of the evidence boosts confirmation of the hypothesis, makes no difference to it, or even reduces it. We also show that, ceteris paribus, it is not the coherence of the evidence that boosts confirmation, but rather the ratio of the coherence of the evidence to the coherence of the evidence conditional on a hypothesis.
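Shogenji's measure of the coherence of two pieces of evidence is P(E1 ∧ E2) / (P(E1) P(E2)), which equals 1 exactly when they are probabilistically independent. The two quantities compared in the focused-correlation ratio are this measure computed unconditionally and computed conditional on the hypothesis H. A minimal sketch, assuming a hypothetical common-cause structure E1 ← H → E2 with illustrative probabilities; here the evidence coheres (measure above 1) only because of H, and conditional on H its coherence drops to 1.

```python
from itertools import product

def common_cause_joint(p_h, p_e_given_h, p_e_given_not_h):
    """Joint over worlds (h, e1, e2): H is a common cause; E1, E2 independent given H."""
    joint = {}
    for h, e1, e2 in product([True, False], repeat=3):
        p_e = p_e_given_h if h else p_e_given_not_h
        joint[(h, e1, e2)] = ((p_h if h else 1 - p_h)
                              * (p_e if e1 else 1 - p_e)
                              * (p_e if e2 else 1 - p_e))
    return joint

def prob(joint, event):
    """Total probability of the worlds satisfying `event`."""
    return sum(p for world, p in joint.items() if event(world))

def shogenji(joint, given=lambda w: True):
    """Shogenji coherence of E1, E2, optionally conditional on the event `given`."""
    pg = prob(joint, given)
    p12 = prob(joint, lambda w: given(w) and w[1] and w[2]) / pg
    p1 = prob(joint, lambda w: given(w) and w[1]) / pg
    p2 = prob(joint, lambda w: given(w) and w[2]) / pg
    return p12 / (p1 * p2)

joint = common_cause_joint(0.5, 0.8, 0.2)
print(shogenji(joint))                         # 0.34 / 0.25, about 1.36
print(shogenji(joint, given=lambda w: w[0]))   # about 1.0: no coherence left given H
```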
Deciding matters of legal liability, in torts and other civil actions, requires deciding causation. The injury suffered by a plaintiff must be caused by an event or condition due to the defendant. The courts distinguish between cause-in-fact and proximate causation, where cause-in-fact is determined by the “but-for” test: the effect would not have happened, “but for” the cause. Proximate causation is a set of legal limitations on cause-in-fact.
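The but-for test can be read as a counterfactual check on a structural model: change only the candidate cause, hold everything else at its actual value, and see whether the effect changes. A minimal sketch; the `but_for` helper and both toy models are hypothetical illustrations, not a legal standard. The second model is the familiar overdetermination case, in which either cause alone would have sufficed, so neither passes the test.

```python
def but_for(model, actual, cause, effect):
    """Hypothetical helper: is `cause` a but-for cause of `effect` in `model`?"""
    factual = model(actual)[effect]
    # Flip only the candidate cause, holding every other input at its actual value.
    counterfactual = dict(actual, **{cause: not actual[cause]})
    return model(counterfactual)[effect] != factual

def match_fire(v):
    # Fire needs both a lit match and oxygen.
    return {"fire": v["match"] and v["oxygen"]}

def two_shooters(v):
    # Either shot alone suffices for death (overdetermination).
    return {"death": v["shot_a"] or v["shot_b"]}

print(but_for(match_fire, {"match": True, "oxygen": True}, "match", "fire"))          # True
print(but_for(two_shooters, {"shot_a": True, "shot_b": True}, "shot_a", "death"))     # False
```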
Linear structural equation models (SEMs) are widely used in sociology, econometrics, biology, and other sciences. A SEM (without free parameters) has two parts: a probability distribution (in the Normal case specified by a set of linear structural equations and a covariance matrix among the “error” or “disturbance” terms), and an associated path diagram corresponding to the functional composition of variables specified by the structural equations and the correlations among the error terms. It is often thought that the path diagram is nothing more than a heuristic device for illustrating the assumptions of the model. However, in this paper, we will show how path diagrams can be used to solve a number of important problems in structural equation modelling.
The statistical community has brought logical rigor and mathematical precision to the problem of using data to make inferences about a model’s parameter values. The TETRAD project, and related work in computer science and statistics, aims to apply those standards to the problem of using data and background knowledge to make inferences about a model’s specification. We begin by drawing the analogy between parameter estimation and model specification search. We then describe how the specification of a structural equation model entails familiar constraints on the covariance matrix for all admissible values of its parameters; we survey results on the equivalence of structural equation models, and we discuss search strategies for model specification. We end by presenting several algorithms that are implemented in the TETRAD II program.
nature of modern data collection and storage techniques, and the increases in the speed and storage capacities of computers. Statistics books from 30 years ago often presented examples with fewer than 10 variables, in domains where some background knowledge was plausible. In contrast, in new domains, such as climate research where satellite data now provide daily quantities of data unthinkable a few decades ago, fMRI brain imaging, and microarray measurements of gene expression, the number of variables can range into the tens of thousands, and there is often limited background knowledge to reduce the space of alternative causal hypotheses. In such domains, non-automated causal discovery techniques appear to be hopeless, while the availability of faster computers with larger memories and disc space allows for the practical implementation of computationally intensive automated search algorithms over large search spaces. Contemporary science is not your grandfather’s science, or Karl Popper’s. Causal inference without experimental controls has long seemed as if it must somehow be capable of being cast as a kind of statistical inference involving estimators with some kind of convergence and accuracy properties under some kind of assumptions. Until recently, the statistical literature said not. While parameter estimation and experimental design for the effective use of data developed throughout the 20th century, as recently as 20 years ago the methodology of causal inference without experimental controls remained relatively primitive. Besides a cessation of hostilities from the majority of the statistical and philosophical communities (which has still only partially happened), several things were needed for theories of causal estimation to appear and to flower: well-defined mathematical objects to represent causal relations; well-defined connections between aspects of these objects and sample data; and a way to compute those connections.
A sequence of studies beginning with Dempster’s work on the factorization of probability distributions [Dempster 1972] and culminating with Kiiveri and Speed’s [Kiiveri & Speed 1982] study of linear structural equation models provided the first, in the form of directed acyclic graphs, and the second, in the form of the “local” Markov condition.
Data analysis that merely fits an empirical covariance matrix or that finds the best least squares linear estimator of a variable is not of itself a reliable guide to judgements about policy, which inevitably involve causal conclusions. The policy implications of empirical data can be completely reversed by alternative hypotheses about the causal relations of variables, and the estimates of a particular causal influence can be radically altered by changes in the assumptions made about other dependencies. For these reasons, one of the common aims of empirical research in the…
We describe anytime search procedures that (1) find disjoint subsets of recorded variables for which the members of each subset are d-separated by a single common unrecorded cause, if such exists; and (2) return information about the causal relations among the latent factors so identified. We prove the procedure is point-wise consistent assuming (a) the causal relations can be represented by a directed acyclic graph (DAG) satisfying the Markov Assumption and the Faithfulness Assumption; (b) unrecorded variables are not caused by recorded variables; and (c) dependencies are linear. We compare the procedure with standard approaches over a variety of simulated structures and sample sizes, and illustrate its practical value with brief studies of social science data sets. Finally, we consider generalizations for non-linear systems. Keywords: latent variable models, causality, graphical models.
The Gibbs sampler can be used to obtain samples of arbitrary size from the posterior distribution over the parameters of a structural equation model (SEM) given covariance data and a prior distribution over the parameters. Point estimates, standard deviations and interval estimates for the parameters can be computed from these samples. If the prior distribution over the parameters is uninformative, the posterior is proportional to the likelihood, and asymptotically the inferences based on the Gibbs sample are the same as those based on the maximum likelihood solution, e.g., output from LISREL or EQS. In small samples, however, the likelihood surface is not Gaussian and in some cases contains local maxima. Nevertheless, the Gibbs sample comes from the correct posterior distribution over the parameters regardless of the sample size and the shape of the likelihood surface. With an informative prior distribution over the parameters, the posterior can be used to make inferences about the parameters of underidentified models, as we illustrate on a simple errors-in-variables model.
In 1982, when computers were just becoming widely available, I was a graduate student beginning my work with Clark Glymour on a PhD thesis entitled: “Causality in the Social Sciences.” Dazed and confused by the vast philosophical literature on causation, I found relative solace in the clarity of Structural Equation Models (SEMs), a form of statistical model used commonly by practicing sociologists, political scientists, etc., to model causal hypotheses with which associations among measured variables might be explained. The statistical literature around SEMs was vast as well, but Clark had extracted from it a particular kind of evidential constraint first studied by Charles Spearman at the beginning of the 20th century, the “vanishing tetrad difference.” As it turned out, certain kinds of causal structures entailed these constraints, and others did not. Spearman used this lever to argue for the existence of a single, general intelligence factor, the infamous g (Spearman, 1904).
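Spearman's constraint can be checked directly. If four indicators each depend on a single common factor F, as in X_i = λ_i F + e_i with independent errors, every covariance factors as σ_ij = λ_i λ_j Var(F), so all three tetrad differences, such as σ12σ34 − σ13σ24, vanish. A minimal Python sketch with hypothetical loadings; the two-factor comparison (two indicators per factor, correlated factors) is an assumed illustration of a structure that entails only one of the three constraints.

```python
from itertools import combinations

def one_factor_cov(loadings, var_f=1.0):
    """Off-diagonal covariances implied by X_i = l_i * F + e_i (independent errors)."""
    return {(i, j): loadings[i] * loadings[j] * var_f
            for i, j in combinations(range(len(loadings)), 2)}

def two_factor_cov(loadings, phi):
    """X0, X1 load on F1; X2, X3 load on F2; unit factor variances, corr(F1, F2) = phi."""
    factor = [0, 0, 1, 1]
    return {(i, j): loadings[i] * loadings[j] * (1.0 if factor[i] == factor[j] else phi)
            for i, j in combinations(range(4), 2)}

def tetrad_differences(s):
    """The three tetrad differences among four variables indexed 0..3."""
    return (s[(0, 1)] * s[(2, 3)] - s[(0, 2)] * s[(1, 3)],
            s[(0, 2)] * s[(1, 3)] - s[(0, 3)] * s[(1, 2)],
            s[(0, 1)] * s[(2, 3)] - s[(0, 3)] * s[(1, 2)])

lam = [0.9, 0.8, 0.7, 0.6]  # hypothetical loadings
print(tetrad_differences(one_factor_cov(lam)))       # all three are zero up to rounding
print(tetrad_differences(two_factor_cov(lam, 0.5)))  # only the middle one vanishes
```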
The statistical evidence for the detrimental effect of exposure to low levels of lead on the cognitive capacities of children has been debated for several decades. In this paper I describe how two techniques from artificial intelligence and statistics help make the statistical evidence for the accepted epidemiological conclusion seem decisive. The first is a variable-selection routine in TETRAD III for finding causes, and the second a Bayesian estimation of the parameter reflecting the causal influence of Actual Lead Exposure, a latent variable, on the measured IQ score of middle class suburban children.
It has been shown in Spirtes (1995) that X and Y are d-separated given Z in a directed graph associated with a recursive or non-recursive linear model without correlated errors if and only if the model entails that ρXY.Z = 0. This result cannot be directly applied to a linear model with correlated errors, however, because the standard graphical representation of a linear model with correlated errors is not a directed graph. The main result of this paper is to show how to associate a directed graph with a linear model L with correlated errors, and then use d-separation in the associated directed graph to determine whether L entails that a particular partial correlation is zero.
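For the case without correlated errors, the equivalence can be checked numerically on two hypothetical three-variable linear models with unit error variances and illustrative coefficients. In the chain X → Z → Y, X and Y are d-separated given Z and the implied ρXY.Z is zero for any coefficients; in the collider X → Z ← Y they are not d-separated given Z, and conditioning on Z produces a nonzero partial correlation. The sketch computes the first-order partial correlation from each model's implied covariance matrix.

```python
import math

def corr(cov, i, j):
    """Correlation of variables i and j from a covariance matrix (list of lists)."""
    return cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])

def partial_corr(cov, x, y, z):
    """First-order partial correlation rho_{xy.z}."""
    rxy, rxz, ryz = corr(cov, x, y), corr(cov, x, z), corr(cov, y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def chain_cov(a, b):
    """Implied covariance of [X, Y, Z] for X -> Z -> Y: Z = a*X + e, Y = b*Z + e'."""
    vz = a * a + 1
    return [[1, a * b, a],
            [a * b, b * b * vz + 1, b * vz],
            [a, b * vz, vz]]

def collider_cov(a, b):
    """Implied covariance of [X, Y, Z] for X -> Z <- Y: Z = a*X + b*Y + e."""
    vz = a * a + b * b + 1
    return [[1, 0, a],
            [0, 1, b],
            [a, b, vz]]

print(partial_corr(chain_cov(0.7, -1.2), 0, 1, 2))     # zero up to rounding
print(partial_corr(collider_cov(0.7, -1.2), 0, 1, 2))  # clearly nonzero
```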
We show that if any number of variables are allowed to be simultaneously and independently randomized in any one experiment, log₂(N) + 1 experiments are sufficient and in the worst case necessary to determine the causal relations among N ≥ 2 variables when no latent variables, no sample selection bias and no feedback cycles are present. For all K, 0 < K <.
There is now substantial agreement about the representational component of a normative theory of causal reasoning: Causal Bayes Nets. There is less agreement about a normative theory of causal discovery from data, either computationally or cognitively, and almost no work investigating how teaching the Causal Bayes Nets representational apparatus might help individuals faced with a causal learning task. Psychologists working to describe how naïve participants represent and learn causal structure from data have focused primarily on learning from single trials under a variety of conditions. In contrast, one component of the normative theory focuses on learning from a sample drawn from a population under some experimental or observational study regime. Through a virtual Causality Lab that embodies the normative theory of causal reasoning and which allows us to record student behavior, we have begun to systematically explore how best to teach the normative theory. In this paper we explain the overall project and report on pilot studies which suggest that students can quickly be taught to (appear to) be quite rational.
Over the last two decades, a fundamental outline of a theory of causal inference has emerged. However, this theory does not consider the following problem. Sometimes two or more measured variables are deterministic functions of one another, not deliberately, but because of redundant measurements. In these cases, manipulation of an observed defined variable may actually be an ambiguous description of a manipulation of some underlying variables, although the manipulator does not know that this is the case. In this article we revisit the question of precisely characterizing conditions and assumptions under which reliable inference about the effects of manipulations is possible, even when the possibility of “ambiguous manipulations” is allowed.
More and more, judges and juries are being asked to handle torts and other cases in which establishing liability involves understanding large bodies of complex scientific evidence. When establishing causation is involved, the evidence can be diverse, can involve complicated statistical models, and can seem impenetrable to non-experts. Since the decision in Daubert v. Merrell Dow Pharms., Inc. in 1993, judges cannot simply admit expert testimony and other technical evidence and let jurors decide the verdict. Judges now must rule on which experts are admissible and which are inadmissible, and they must base their ruling at least partly on the status of the scientific evidence about which the expert will testify. This article is intended to provide judges with an accessible methodological overview of causal science.
The past two decades have seen a dramatic growth in the use of statisticians and economists for the presentation of expert testimony in legal proceedings. In this paper, we describe a hypothetical case modeled on real ones and involving statistical testimony regarding the causal effect of lead on lowering the IQs of children who ingest lead paint chips. The data we use come from a well-known pioneering study on the topic and the analyses we describe as the expert testimony are similar to ones that can be found in major scientific journals. The battle of the experts in this hypothetical case resembles that which many encounter as expert witnesses. The paper concludes with some observations and advice.
By combining experimental interventions with search procedures for graphical causal models we show that under familiar assumptions, with perfect data, N - 1 experiments suffice to determine the causal relations among N > 2 variables when each experiment randomizes at most one variable. We show the same bound holds for adaptive learners, but does not hold for N > 4 when each experiment can simultaneously randomize more than one variable. This bound provides a type of ideal for the measure of success of heuristic approaches in active learning methods of causal discovery, which currently use less informative measures.
Researchers routinely face the problem of inferring causal relationships from large amounts of data, sometimes involving hundreds of variables. Often, it is the causal relationships between "latent" (unmeasured) variables that are of primary interest. The problem is how causal relationships between unmeasured variables can be inferred from measured data. For example, naval manpower researchers have been asked to infer the causal relations among psychological traits such as job satisfaction and job challenge from a data base in which neither trait is measured directly, but in which answers to interview questions are plausibly associated with each trait. By combining background knowledge with an algorithm that searches for causal structure among the unobserved variables, we have created a tool that can reliably extract useful causal information about latent variables from large data bases. In what follows we describe the class of causal models to which our…
Memory sometimes yields knowledge and sometimes does not. It is, however, natural to suppose that if a man remembers that p, then he knows that p and formerly knew that p. Remembering something is plausibly construed as a form of knowing something which one has not forgotten and which one knew previously. We argue, to the contrary, that this thesis is false. We present four counterexamples to the thesis that support a different analysis of remembering. We propose that a person remembers that p (at t) if and only if the thought or conviction that p comes from memory (at t) when, in fact, it is true that p.
Reflectance spectroscopy is a standard tool for studying the mineral composition of rock and soil samples and for remote sensing of terrestrial and extraterrestrial surfaces. We describe research on automated methods of mineral identification from reflectance spectra and give evidence that a simple algorithm, adapted from a well-known search procedure for Bayes nets, identifies the most frequently occurring classes of carbonates with reliability equal to or greater than that of human experts. We compare the reliability of the procedure to the reliability of several other automated methods adapted to the same purpose. Evidence is given that the procedure can be applied to some other mineral classes as well. Since the procedure is fast with low memory requirements, it is suitable for on-board scientific analysis by orbiters or surface rovers.