Much of the recent work on the epistemology of causation has centered on two assumptions, known as the Causal Markov Condition and the Causal Faithfulness Condition. Philosophical discussions of the latter condition have exhibited situations in which it is likely to fail. This paper studies the Causal Faithfulness Condition as a conjunction of weaker conditions. We show that some of the weaker conjuncts can be empirically tested, and hence do not have to be assumed a priori. Our results lead to two methodologically significant observations: (1) some common types of counterexamples to the Faithfulness condition constitute objections only to the empirically testable part of the condition; and (2) some common defenses of the Faithfulness condition do not provide justification or evidence for the testable parts of the condition. It is thus worthwhile to study the possibility of reliable causal inference under weaker Faithfulness conditions. As it turns out, the modification needed to make standard procedures work under a weaker version of the Faithfulness condition also has the practical effect of making them more robust when the standard Faithfulness condition actually holds. This, we argue, is related to the possibility of controlling error probabilities with a finite sample size (“uniform consistency”) in causal inference.
The framework of causal Bayes nets, currently influential in several scientific disciplines, provides a rich formalism to study the connection between causality and probability from an epistemological perspective. This article compares three assumptions in the literature that seem to constrain the connection between causality and probability in the style of Occam's razor. The trio includes two minimality assumptions (one formulated by Spirtes, Glymour, and Scheines (SGS) and the other due to Pearl) and the better-known faithfulness or stability assumption. In terms of logical strength, it is fairly obvious that the three form a sequence of increasingly strong assumptions. The focus of this article, however, is to investigate the nature of their relative strength. The comparative analysis reveals an important sense in which Pearl's minimality assumption is as strong as the faithfulness assumption and identifies a useful condition under which it is as safe as SGS's relatively secure minimality assumption. Both findings have notable implications for the theory and practice of causal inference. Contents: 1 Introduction; 2 Background: Inference of Causal Structure in Markovian Causal Models; 3 Three Assumptions of Simplicity; 4 A Comparison of P-minimality and Faithfulness; 5 A Comparison of P-minimality and SGS-minimality; 6 Methodological Formulations and Prior Knowledge of Causal Order; 7 Conclusion.
In the causal inference framework of Spirtes, Glymour, and Scheines (SGS), inferences about causal relationships are made from samples from probability distributions and a number of assumptions relating causal relations to probability distributions. The most controversial of these assumptions is the Causal Faithfulness Assumption, which roughly states that if a conditional independence statement is true of a probability distribution generated by a causal structure, it is entailed by the causal structure and not just for particular parameter values. In this paper we show that the addition of the Causal Faithfulness Assumption plays three quite different roles in the SGS framework: it reduces the degree of underdetermination of causal structure by probability distribution; computationally, it justifies reliable causal inference algorithms that would otherwise have to be slower in order to be reliable; and statistically, it implies that those algorithms reliably obtain the correct answer at smaller sample sizes than would otherwise be the case. We also consider a number of variations on the Causal Faithfulness Assumption, and show how they affect each of these three roles.
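The phrase "not just for particular parameter values" can be illustrated with a small linear example. What follows is a minimal sketch and is not taken from the paper: it assumes a three-variable linear model X → Y → Z with an extra direct edge X → Z and independent unit-variance noise terms, and the function name `implied_covariance` and the coefficient values are hypothetical. When the direct effect exactly cancels the indirect one, X and Z are uncorrelated although the graph does not entail any independence between them.

```python
def implied_covariance(order, coef):
    """Covariance implied by a linear acyclic model with independent,
    unit-variance noise terms (an assumption of this sketch).

    order: variable names listed in a causal (topological) order.
    coef:  dict mapping (parent, child) to the linear coefficient.
    """
    # total[v][u] is the total effect of the noise term of u on variable v.
    total = {}
    for v in order:
        total[v] = {u: 0.0 for u in order}
        total[v][v] = 1.0
        for (parent, child), weight in coef.items():
            if child == v:
                for u in order:
                    total[v][u] += weight * total[parent][u]
    return {
        (v, w): sum(total[v][u] * total[w][u] for u in order)
        for v in order
        for w in order
    }


order = ["X", "Y", "Z"]

# Generic parameters: X and Z are correlated, as the graph suggests.
generic = implied_covariance(
    order, {("X", "Y"): 2.0, ("Y", "Z"): 3.0, ("X", "Z"): 1.0}
)

# Special parameters: the direct effect (-6) cancels the indirect one (2 * 3).
cancelling = implied_covariance(
    order, {("X", "Y"): 2.0, ("Y", "Z"): 3.0, ("X", "Z"): -6.0}
)

print(generic[("X", "Z")])     # 7.0
print(cancelling[("X", "Z")])  # 0.0
```

The second distribution is unfaithful to the graph in the sense described above: the vanishing covariance holds only for this particular choice of coefficients.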
In the artificial intelligence literature a promising approach to counterfactual reasoning is to interpret counterfactual conditionals based on causal models. Different logics of such causal counterfactuals have been developed with respect to different classes of causal models. In this paper I characterize the class of causal models that are Lewisian in the sense that they validate the principles in Lewis’s well-known logic of counterfactuals. I then develop a system sound and complete with respect to this class. The resulting logic is the weakest logic of causal counterfactuals that respects Lewis’s principles, sits in between the logic developed by Galles and Pearl and the logic developed by Halpern, and stands to Galles and Pearl’s logic in the same fashion as Lewis’s stands to Stalnaker’s.
Most causal discovery algorithms in the literature exploit an assumption usually referred to as the Causal Faithfulness or Stability Condition. In this paper, we highlight two components of the condition used in constraint-based algorithms, which we call “Adjacency-Faithfulness” and “Orientation-Faithfulness.” We point out that, assuming Adjacency-Faithfulness is true, it is possible to test the validity of Orientation-Faithfulness. Motivated by this observation, we explore the consequence of making only the Adjacency-Faithfulness assumption. We show that the familiar PC algorithm has to be modified to be correct under the weaker, Adjacency-Faithfulness assumption. The modified algorithm, called Conservative PC (CPC), checks whether Orientation-Faithfulness holds in the orientation phase, and if not, avoids drawing certain causal conclusions the PC algorithm would draw. However, if the stronger, standard causal Faith...
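The orientation-phase check described above can be sketched as follows. This is an illustrative reading under stated assumptions, not the paper's implementation: the function names, the string labels, the adjacency dictionary, and the `independent(x, z, s)` oracle are all hypothetical stand-ins (in practice the oracle would be a statistical test). For an unshielded triple X - Y - Z, the sketch looks at every subset of the neighbours of X and of Z that separates X from Z, and only commits to a collider or noncollider when those sets agree about Y.

```python
from itertools import chain, combinations


def all_subsets(items):
    """Return a generator over every subset of `items`, each as a frozenset."""
    items = sorted(items)
    tuples = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return (frozenset(t) for t in tuples)


def classify_unshielded_triple(x, y, z, adjacent, independent):
    """Classify the unshielded triple x - y - z (x and z non-adjacent).

    adjacent:    dict mapping each node to the set of its neighbours.
    independent: hypothetical oracle; independent(x, z, s) is True when
                 x and z are judged independent given the set s.
    """
    candidates = set(all_subsets(adjacent[x])) | set(all_subsets(adjacent[z]))
    separating = [s for s in candidates if independent(x, z, s)]
    if not separating:
        return "no separating set found"
    contains_y = [y in s for s in separating]
    if all(contains_y):
        return "noncollider"
    if not any(contains_y):
        return "collider"
    # Separating sets disagree about y: leave the triple unoriented.
    return "unfaithful"


adjacent = {"X": {"Y"}, "Y": {"X", "Z"}, "Z": {"Y"}}

# Oracles for three hypothetical situations.
collider_oracle = lambda x, z, s: "Y" not in s   # X -> Y <- Z
chain_oracle = lambda x, z, s: "Y" in s          # X -> Y -> Z
odd_oracle = lambda x, z, s: True                # independent either way

print(classify_unshielded_triple("X", "Y", "Z", adjacent, collider_oracle))
print(classify_unshielded_triple("X", "Y", "Z", adjacent, chain_oracle))
print(classify_unshielded_triple("X", "Y", "Z", adjacent, odd_oracle))
```

In the third case a procedure that looked at only one separating set would orient the triple one way or the other; the conservative check instead flags it and draws no conclusion.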
Spirtes, Glymour and Scheines [Causation, Prediction, and Search Springer] described a pointwise consistent estimator of the Markov equivalence class of any causal structure that can be represented by a directed acyclic graph for any parametric family with a uniformly consistent test of conditional independence, under the Causal Markov and Causal Faithfulness assumptions. Robins et al. [Biometrika 90 491–515], however, proved that there are no uniformly consistent estimators of Markov equivalence classes of causal structures under those assumptions. Subsequently, Kalisch and Bühlmann [J. Mach. Learn. Res. 8 613–636] described a uniformly consistent estimator of the Markov equivalence class of a linear Gaussian causal structure under the Causal Markov and Strong Causal Faithfulness assumptions. However, the Strong Faithfulness assumption may be false with high probability in many domains. We describe a uniformly consistent estimator of both the Markov equivalence class of a linear Gaussian causal structure and the identifiable structural coefficients in the Markov equivalence class under the Causal Markov assumption and the considerably weaker k-Triangle-Faithfulness assumption.
A fundamental question in causal inference is whether it is possible to reliably infer manipulation effects from observational data. There are a variety of senses of asymptotic reliability in the statistical literature, among which the most commonly discussed frequentist notions are pointwise consistency and uniform consistency (see, e.g., Bickel and Doksum [2001]). Uniform consistency is in general preferred to pointwise consistency because the former allows us to control the worst-case error bounds with a finite sample size. In the sense of pointwise consistency, several reliable causal inference algorithms have been established under the Markov and Faithfulness assumptions [Pearl 2000, Spirtes et al. 2001]. In the sense of uniform consistency, however, reliable causal inference is impossible under the two assumptions when time order is unknown and/or latent confounders are present [Robins et al. 2000]. In this paper we present two natural generalizations of the Faithfulness assumption in the context of structural equation models, under which we show that the typical algorithms in the literature are uniformly consistent with or without modifications even when the time order is unknown. We also discuss the situation where latent confounders may be present and the sense in which the Faithfulness assumption is a limiting case of the stronger assumptions.
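The contrast between the two notions can be written out explicitly. The formulation below is a standard textbook one rather than the paper's own notation; the symbols are assumptions introduced here: $\mathcal{P}$ is the set of distributions allowed by the background assumptions, $\theta(P)$ the target quantity, $\hat{\theta}_n$ the estimate from a sample of size $n$, and $d$ a distance on the parameter space.

```latex
% Pointwise consistency: the error probability vanishes for each P separately.
\forall P \in \mathcal{P},\ \forall \varepsilon > 0:\quad
  \lim_{n \to \infty} P^{n}\!\left( d\big(\hat{\theta}_n, \theta(P)\big) > \varepsilon \right) = 0

% Uniform consistency: the worst-case error probability over P vanishes.
\forall \varepsilon > 0:\quad
  \lim_{n \to \infty} \sup_{P \in \mathcal{P}}
  P^{n}\!\left( d\big(\hat{\theta}_n, \theta(P)\big) > \varepsilon \right) = 0
```

Only the second statement guarantees, for given $\varepsilon$ and $\delta$, a single sample size $n$ at which the error probability is below $\delta$ for every $P \in \mathcal{P}$; under the first, the required $n$ may depend on the unknown $P$.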
Causal discovery becomes especially challenging when the possibility of latent confounding and/or selection bias is not assumed away. For this task, ancestral graph models are particularly useful in that they can represent the presence of latent confounding and selection effect, without explicitly invoking unobserved variables. Based on the machinery of ancestral graphs, there is a provably sound causal discovery algorithm, known as the FCI algorithm, that allows the possibility of latent confounders and selection bias. However, the orientation rules used in the algorithm are not complete. In this paper, we provide additional orientation rules, augmented by which the FCI algorithm is shown to be complete, in the sense that it can, under standard assumptions, discover all aspects of the causal structure that are uniquely determined by facts of probabilistic dependence and independence. The result is useful for developing any causal discovery and reasoning system based on ancestral graph models.
Compared to constraint-based causal discovery, causal discovery based on functional causal models is able to identify the whole causal model under appropriate assumptions [Shimizu et al. 2006; Hoyer et al. 2009; Zhang and Hyvärinen 2009b]. Functional causal models represent the effect as a function of the direct causes together with an independent noise term. Examples include the linear non-Gaussian acyclic model, the nonlinear additive noise model, and the post-nonlinear (PNL) model. Currently, there are two ways to estimate the parameters in the models: dependence minimization and maximum likelihood. In this article, we show that for any acyclic functional causal model, minimizing the mutual information between the hypothetical cause and the noise term is equivalent to maximizing the data likelihood with a flexible model for the distribution of the noise term. We then focus on estimation of the PNL causal model and propose to estimate it with the warped Gaussian process with the noise modeled by a mixture of Gaussians. As a Bayesian nonparametric approach, it outperforms the previous one based on mutual information minimization with nonlinear functions represented by multilayer perceptrons; we also show that, unlike in ordinary regression, estimation results of the PNL causal model are sensitive to the assumption on the noise distribution. Experimental results on both synthetic and real data support our theoretical claims.
We examine a formal semantics for counterfactual conditionals due to Judea Pearl, which formalizes the interventionist interpretation of counterfactuals central to the interventionist accounts of causation and explanation. We show that a characteristic principle validated by Pearl’s semantics, known as the principle of reversibility, states a kind of irreversibility: counterfactual dependence (in David Lewis’s sense) between two distinct events is irreversible. Moreover, we show that Pearl’s semantics rules out only mutual counterfactual dependence, not cyclic dependence in general. This, we argue, suggests that Pearl’s logic is either too weak or too strong.
We consider Geanakoplos and Polemarchakis’s generalization of Aumann’s famous result on “agreeing to disagree”, in the context of imprecise probability. The main purpose is to reveal a connection between the possibility of agreeing to disagree and the interesting and anomalous phenomenon known as dilation. We show that for two agents who share the same set of priors and update by conditioning on every prior, it is impossible to agree to disagree on the lower or upper probability of a hypothesis unless a certain dilation occurs. With some common topological assumptions, the result entails that it is impossible to agree not to have the same set of posterior probabilities unless dilation is present. This result may be used to generate sufficient conditions for guaranteed full agreement in the generalized Aumann setting for some important models of imprecise priors, and we illustrate the potential with an agreement result involving the density ratio classes. We also provide a formulation of our results in terms of “dilation-averse” agents who ignore information about the value of a dilating partition but otherwise update by full Bayesian conditioning.
Experimental results in Ultimatum, Trust and Social Dilemma games have been interpreted as showing that individuals are, by and large, not driven by selfish motives. But we do not need experiments to know that. In our view, what the experiments show is that the typical economic auxiliary hypothesis of non-tuism should not be generalized to other contexts. Indeed, we know that when the experimental situation is framed as a market interaction, participants will be more inclined to keep more money, share less, and disregard other participants’ welfare [Hoffman et al., 1994]. When the same game is framed as a fair division one, participants overall show a much greater concern for the other parties’ interests. The data thus indicate that the context of an interaction is of paramount importance in eliciting different motives. The challenge then is to model utility functions that are general enough to subsume a variety of motives and specific enough to allow for meaningful, interesting predictions to be made. For the sake of simplicity (and brevity), in what follows we will concentrate upon the results of experiments that show what appears to be individuals’ disposition to behave in a fair manner in a variety of circumstances [Camerer, 2003], though what we are saying can be easily applied to other research areas. Such experimental results have been variously interpreted, each interpretation being accompanied by a specific utility function. We shall consider three such functions and the underlying interpretations that support them, and assess each one on the basis of what they claim to be able to explain and predict.
It is commonplace to encounter nonstationary or heterogeneous data, whose underlying generating process changes over time or across data sets. Such distribution shift presents both challenges and opportunities for causal discovery. In this paper we develop a principled framework for causal discovery from such data, called Constraint-based causal Discovery from Nonstationary/heterogeneous Data, which addresses two important questions. First, we propose an enhanced constraint-based procedure to detect variables whose local mechanisms change and recover the skeleton of the causal structure over observed variables. Second, we present a way to determine causal orientations by making use of independence changes in the data distribution implied by the underlying causal model, benefiting from information carried by changing distributions. Experimental results on various synthetic and real-world data sets are presented to demonstrate the efficacy of our methods.
We examine the performance of some standard causal discovery algorithms, both constraint-based and score-based, from the perspective of how robust they are against failures of the Causal Faithfulness Assumption. For this purpose, we make only the so-called Triangle-Faithfulness assumption, which is a fairly weak consequence of the Faithfulness assumption, and otherwise allow unfaithful distributions. In particular, we allow violations of Adjacency-Faithfulness and Orientation-Faithfulness. We show that the PC algorithm, a representative constraint-based method, can be made more robust against unfaithfulness by incorporating elements of the GES algorithm, a representative score-based method; similarly, the GES algorithm can be made less error-prone by incorporating elements of the conservative PC algorithm. As our simulations demonstrate, the increased robustness seems to matter even when faithfulness is not exactly violated, for with only a finite sample, distributions that are not exactly unfaithful may be sufficiently close to being unfaithful to make trouble.
Causal reasoning is primarily concerned with what would happen to a system under external interventions. In particular, we are often interested in predicting the probability distribution of some random variables that would result if some other variables were forced to take certain values. One prominent approach to tackling this problem is based on causal Bayesian networks, using directed acyclic graphs as causal diagrams to relate post-intervention probabilities to pre-intervention probabilities that are estimable from observational data. However, such causal diagrams are seldom fully testable given observational data. In consequence, many causal discovery algorithms based on data mining can only output an equivalence class of causal diagrams. This paper is concerned with causal reasoning given an equivalence class of causal diagrams, represented by an ancestral graph. We present two main results. The first result extends Pearl's celebrated do-calculus to the context of ancestral graphs. In the second result, we focus on a key component of Pearl's calculus, the property of invariance under interventions, and give stronger graphical conditions for this property than those implied by the first result. The second result also improves the earlier, similar results due to Spirtes et al.
Forster presented some interesting examples having to do with distinguishing the direction of causal influence between two variables, which he argued are counterexamples to the likelihood theory of evidence (LTE). In this paper, we refute Forster's arguments by carefully examining one of the alleged counterexamples. We argue that the example is not convincing as it relies on dubious intuitions that likelihoodists have forcefully criticized. More importantly, we show that contrary to Forster's contention, the consilience-based methodology he favored can be accounted for within the framework of the LTE.
In “Flagpoles anyone? Causal and explanatory asymmetries”, James Woodward supplements his celebrated interventionist account of causation and explanation with a set of new ideas about causal and explanatory asymmetries, which he extracts from some cutting-edge methods for causal discovery from observational data. Among other things, Woodward draws interesting connections between observational causal discovery and interventionist themes that are inspired in the first place by experimental causal discovery, alluding to a sort of unity between observational and experimental causal discovery. In this paper, I make explicit what I take to be the implicated unity. Like experimental causal discovery, observational causal discovery also relies on interventions, albeit interventions that are not carried out by investigators and hence need to be detected as part of the inference. The observational patterns appealed to in observational causal discovery are not only surrogates for would-be interventions, as Woodward sometimes puts it; they also serve to mark relevant interventions that actually happen in the data generating process.
A main message from the causal modelling literature in the last several decades is that under some plausible assumptions, there can be statistically consistent procedures for inferring (features of) the causal structure of a set of random variables from observational data. But whether we can control the error probabilities with a finite sample size depends on the kind of consistency the procedures can achieve. It has been shown that in general, under the standard causal Markov and Faithfulness assumptions, the procedures can only be pointwise but not uniformly consistent without substantial background knowledge. This implies the impossibility of choosing a finite sample size to control the worst-case error probabilities. In this paper, I consider the simpler task of inferring causal directions when the skeleton of the causal structure is known, and establish a similarly negative result concerning the possibility of controlling error probabilities. Although the result is negative in form, it has an interesting positive implication for causal discovery methods.
In this paper, we take another look at the reasons for which the causal criterion of event identity has been abandoned. We argue that the reasons are not strong. First of all, there is a criterion in the neighborhood of the causal criterion (the counterfactual criterion) that is not vulnerable to any of the putative counterexamples brought up in the literature. Secondly, neither the causal criterion nor the counterfactual criterion suffers from any form of vicious circularity. Nonetheless, we do not recommend adopting either the causal or the counterfactual criterion because, given a sufficiently lax principle of event composition, neither criterion can be applied to complex events. This we regard as a (prima facie) undesirable restriction on their applicability.
This paper has two main parts. In the first part, we motivate a kind of indeterminate, suppositional credence by discussing the prospect for a subjective interpretation of a causal Bayesian network (CBN), an important tool for causal reasoning in artificial intelligence. A CBN consists of a causal graph and a collection of interventional probabilities. The subjective interpretation in question would take the causal graph in a CBN to represent the causal structure that is believed by an agent, and interventional probabilities in a CBN to represent suppositional credences. We review a difficulty noted in the literature with such an interpretation, and suggest that a natural way to address the challenge is to go for a generalization of CBN that allows indeterminate credences. In the second part, we develop a decision-theoretic foundation for such indeterminate suppositional credences, by generalizing a theory of coherent choice functions to accommodate some form of act-state dependence. The upshot is a decision-theoretic framework that is not only rich enough to, so to speak, ground the probabilities in a subjectively interpreted causal network, but also interesting in its own right, in that it accommodates both act-state dependence and imprecise probabilities.
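One conception of underdetermination is that it corresponds to the impossibility of reliable inquiry. In other words, underdetermination is defined to be the situation where, given a set of background assumptions and a space of hypotheses, it is logically impossible for any hypothesis selection method to meet a given reliability standard. From this perspective, underdetermination in a given subject of inquiry is a matter of interplay between background assumptions and reliability or success criteria. In this paper I discuss underdetermination in causal inference along these lines. In particular I will analyze several success criteria that can be applied to causal inference from statistical regularities. The criteria center on the notions of consistency in mathematical statistics. For each criterion I present its epistemic implication in terms of simple conditions under which the criterion cannot possibly be met. I then investigate which of the familiar principles and their variants in the literature, if adopted as background assumptions, are sufficient to overcome different levels of underdetermination induced by the success criteria. Chinese abstract (translated): “Underdetermination” is an important concept in epistemology and the philosophy of science. One interpretation of the concept takes it to correspond to the impossibility of reliable inquiry. That is, in an (empirical) problem, given some background assumptions and some candidate theories or hypotheses, if it is logically impossible to find a method of theory choice that meets a certain standard of reliability or success, then underdetermination exists relative to that standard. On this view, underdetermination is always relative to a problem setting, in particular to the background assumptions and the success criterion. This paper surveys recent research on statistical causal inference from this angle. First, based on the notion of consistency in mathematical statistics, I discuss and analyze a series of success criteria applicable to causal inference. For each criterion, I use a relatively simple condition to characterize the corresponding underdetermination. I then review some of the important results in the literature, to clarify which background assumptions can eliminate which kinds of underdetermination.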
I refute Bailey's claim that his argument for incompatibilism is immune to Campbell's No Past Objection. In my refutation I stress a simple point, that nomological necessitation by future world states does not undermine one's freedom with respect to the present world state. My analysis reveals that the No Past Objection challenges van Inwagen's second consequence argument about as much as it does the others, and suggests that the (uncompromising) incompatibilist must pursue some of the options that Bailey regarded as costly in order to overcome the No Past Objection.
Using the flexibility of recently developed methods for causal discovery based on Boolean satisfiability (SAT) solvers, we encode a variety of assumptions that weaken the Faithfulness assumption. The encoding results in a number of SAT-based algorithms whose asymptotic correctness relies on weaker conditions than are standardly assumed. This implementation of a whole set of assumptions in the same platform enables us to systematically explore the effect of weakening the Faithfulness assumption on causal discovery. An important effect, suggested by simulation results, is that adopting weaker assumptions greatly alleviates the problem of conflicting constraints and substantially shortens solving time. As a result, SAT-based causal discovery is potentially more scalable under weaker assumptions.
The relationship between children and their maternal uncles in contemporary Mosuo culture reveals a unique parenting mode in a matrilineal society. This study compared the responses of Mosuo and Han participants to questionnaires on the parent–child and maternal uncle–child relationship. More specifically, Study 1 used the Inventory of Parent and Peer Attachment to assess the reactions of the two groups to the relationship between children and their mothers, fathers, and maternal uncles. The results show that while Han people display a higher level of attachment toward their fathers than their maternal uncles, Mosuo people do not exhibit a significant difference in this respect. Study 2 used a scenario-based method to compare how adults and teenagers perceive the rights and responsibilities of fathers/maternal uncles toward their children/nephews or nieces. The results show that Han adults attribute more rights and responsibilities to their own children than nephews/nieces, while their Mosuo counterparts have the reverse pattern and assign stronger responsibilities to their nephews/nieces than their own children. Both groups perceive the fathers to be the bearer of rights and responsibilities, although this perception was weaker among Mosuo. This paper concludes that in Mosuo society, fathers have a relatively weak social role as a result of the society's unique matrilineal structure.
The conditional independence relations present in a data set usually admit multiple causal explanations, typically represented by directed graphs, which are Markov equivalent in that they entail the same conditional independence relations among the observed variables. Markov equivalence between directed acyclic graphs (DAGs) has been characterized in various ways, each of which has been found useful for certain purposes. In particular, Chickering’s transformational characterization is useful in deriving properties shared by Markov equivalent DAGs, and, with a certain generalization, is needed to justify a search procedure over Markov equivalence classes, known as the GES algorithm. Markov equivalence between DAGs with latent variables has also been characterized, in the spirit of Verma and Pearl (1990), via maximal ancestral graphs (MAGs). The latter can represent the observable conditional independence relations as well as some causal features of DAG models with latent variables. However, no characterization of Markov equivalent MAGs is yet available that is analogous to the transformational characterization for Markov equivalent DAGs. The main contribution of the current paper is to establish such a characterization for directed MAGs, which we expect will have uses similar to those of Chickering’s characterization for DAGs.
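One of the characterizations of Markov equivalence for DAGs without latent variables is the well-known graphical criterion that two DAGs over the same variables are Markov equivalent exactly when they share the same skeleton and the same unshielded colliders. The sketch below illustrates that criterion only; it does not cover MAGs or the transformational characterization, and the function names and the edge-set encoding are hypothetical choices made for the example.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected adjacencies of a DAG given as a set of (parent, child) pairs."""
    return {frozenset(edge) for edge in edges}


def unshielded_colliders(edges):
    """Triples (a, child, b) with a -> child <- b and a, b non-adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    result = set()
    for child, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                result.add((a, child, b))
    return result


def markov_equivalent(edges1, edges2):
    """Skeleton-and-collider test; assumes both DAGs have the same vertex set."""
    return (
        skeleton(edges1) == skeleton(edges2)
        and unshielded_colliders(edges1) == unshielded_colliders(edges2)
    )


chain = {("X", "Y"), ("Y", "Z")}           # X -> Y -> Z
reversed_chain = {("Z", "Y"), ("Y", "X")}  # X <- Y <- Z
collider = {("X", "Y"), ("Z", "Y")}        # X -> Y <- Z

print(markov_equivalent(chain, reversed_chain))  # True
print(markov_equivalent(chain, collider))        # False
```

The two chains entail the same single independence (X and Z given Y), while the collider entails a different one (X and Z marginally), which is why observational independence facts alone cannot separate the first pair.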
We study the identifiability and estimation of functional causal models under selection bias, with a focus on the situation where the selection depends solely on the effect variable, which is known as outcome-dependent selection. We address two questions of identifiability: the identifiability of the causal direction between two variables in the presence of selection bias, and, given the causal direction, the identifiability of the model with outcome-dependent selection. Regarding the first, we show that in the framework of post-nonlinear causal models, once outcome-dependent selection is properly modeled, the causal direction between two variables is generically identifiable; regarding the second, we identify some mild conditions under which an additive noise causal model with outcome-dependent selection is to a large extent identifiable. We also propose two methods for estimating an additive noise model from data that are generated with outcome-dependent selection.
A primary object of causal reasoning concerns what would happen to a system under certain interventions. Specifically, we are often interested in estimating the probability distribution of some random variables that would result from forcing some other variables to take certain values. The renowned do-calculus gives a set of rules that govern the identification of such post-intervention probabilities in terms of pre-intervention probabilities, assuming that a directed acyclic graph (DAG) representing the underlying causal structure is available. However, a DAG causal structure is seldom fully testable given pre-intervention, observational data, since many competing DAG structures are equally compatible with the data. In this paper we extend the do-calculus to cover cases where the available causal information is summarized in a so-called partial ancestral graph (PAG) that represents an equivalence class of DAG structures. The causal assumptions encoded by a PAG are significantly weaker than those encoded by a full-blown DAG causal structure, and are in principle fully testable by observed conditional independence relations.
Learning equivalence classes of acyclic models with latent and selection variables from multiple datasets with overlapping variables is discussed. The problem of inferring the presence of latent variables, their relation to the observables, and the relation among themselves, is considered. A different approach for identifying causal structures, one that results in much simpler equivalence classes, is provided. It is found that the computational cost is much higher than the procedure implemented, but if datasets are individually of modest dimensionality, it might be doable in practice. From the point of view of search algorithms for optimizing structure, much of the machinery of combinatorial optimization could optimize the penalized composite likelihood score by enforcing constraints such that the independence models over different subsets of variables agree on the overlapping sets.
Different directed acyclic graphs (DAGs) may be Markov equivalent in the sense that they entail the same conditional independence relations among the observed variables. Meek characterizes Markov equivalence classes for DAGs by presenting a set of orientation rules that can correctly identify all arrow orientations shared by all DAGs in a Markov equivalence class, given a member of that class. For DAG models with latent variables, maximal ancestral graphs (MAGs) provide a neat representation that facilitates model search. Earlier work has identified a set of orientation rules sufficient to construct all arrowheads common to a Markov equivalence class of MAGs. In this paper, we provide extra rules sufficient to construct all common tails as well. We end up with a set of orientation rules sound and complete for identifying commonalities across a Markov equivalence class of MAGs, which is particularly useful for causal inference.
It is well known that there may be many causal explanations that are consistent with a given set of data. Recent work has been done to combine the common aspects of these explanations into one representation. In this paper, we address what is less well known: how do the relationships common to every causal explanation among the observed variables of some DAG process change in the presence of latent variables? Ancestral graphs provide a class of graphs that can encode conditional independence relations that arise in DAG models with latent and selection variables. In this paper we present a set of orientation rules that construct the Markov equivalence class representative for ancestral graphs, given a member of the equivalence class. These rules are sound and complete. We also show that when the equivalence class includes a DAG, the equivalence class representative is the essential graph for the said DAG.
Different directed acyclic graphs (DAGs) may be Markov equivalent in the sense that they entail the same conditional independence relations among the observed variables. Chickering provided a transformational characterization of Markov equivalence for DAGs, which is useful in deriving properties shared by Markov equivalent DAGs, and, with a certain generalization, is needed to prove the asymptotic correctness of a search procedure over Markov equivalence classes, known as the GES algorithm. For DAG models with latent variables, maximal ancestral graphs (MAGs) provide a neat representation that facilitates model search. However, no transformational characterization of Markov equivalent MAGs, analogous to Chickering's, is yet available. This paper establishes such a characterization for directed MAGs, which we expect will have uses similar to those of Chickering's characterization for DAGs.
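For readers unfamiliar with the DAG case, the basic move in Chickering's transformational characterization is the reversal of a covered edge: an edge x → y is covered when the parents of y are exactly the parents of x together with x itself, and reversing such an edge yields a Markov equivalent DAG. The sketch below shows only this DAG-level operation, not the MAG generalization the paper establishes; the function names and the edge-set encoding are hypothetical.

```python
def parents(edges, node):
    """Parents of `node` in a DAG given as a set of (parent, child) pairs."""
    return {p for p, c in edges if c == node}


def is_covered(edges, x, y):
    """True when x -> y is an edge and Pa(y) equals Pa(x) together with x."""
    return (x, y) in edges and parents(edges, y) == parents(edges, x) | {x}


def reverse_covered_edge(edges, x, y):
    """Return a new edge set with the covered edge x -> y replaced by y -> x."""
    if not is_covered(edges, x, y):
        raise ValueError(f"{x} -> {y} is not a covered edge")
    return (set(edges) - {(x, y)}) | {(y, x)}


chain = {("X", "Y"), ("Y", "Z")}  # X -> Y -> Z

print(is_covered(chain, "X", "Y"))  # True: Pa(Y) = {X} = Pa(X) plus X
print(is_covered(chain, "Y", "Z"))  # False: Pa(Z) = {Y}, but Pa(Y) plus Y = {X, Y}

fork = reverse_covered_edge(chain, "X", "Y")  # X <- Y -> Z
print(sorted(fork))  # [('Y', 'X'), ('Y', 'Z')]
```

Reversing Y → Z in the chain directly would create the collider X → Y ← Z and change the entailed independencies, which is why the operation is restricted to covered edges; after the first reversal, Y → Z becomes covered in the fork and can then be reversed in turn.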