Detection of Unfaithfulness and Robust Causal Inference

Minds and Machines

Abstract

Much of the recent work on the epistemology of causation has centered on two assumptions, known as the Causal Markov Condition and the Causal Faithfulness Condition. Philosophical discussions of the latter condition have exhibited situations in which it is likely to fail. This paper studies the Causal Faithfulness Condition as a conjunction of weaker conditions. We show that some of the weaker conjuncts can be empirically tested, and hence do not have to be assumed a priori. Our results lead to two methodologically significant observations: (1) some common types of counterexamples to the Faithfulness condition constitute objections only to the empirically testable part of the condition; and (2) some common defenses of the Faithfulness condition do not provide justification or evidence for the testable parts of the condition. It is thus worthwhile to study the possibility of reliable causal inference under weaker Faithfulness conditions. As it turns out, the modification needed to make standard procedures work under a weaker version of the Faithfulness condition also has the practical effect of making them more robust when the standard Faithfulness condition actually holds. This, we argue, is related to the possibility of controlling error probabilities with finite sample size (“uniform consistency”) in causal inference.

Notes

  1. As will be explained in section “Causal Graph and Causal Inference” below, the set of variables needs to be causally sufficient.

  2. Individual random variables are in capitalized italics; sets of random variables are in capitalized boldface; individual values for random variables are either numbers or constants represented by lowercase italics; sets of values of random variables are either sets of numbers or sets of constants represented by lowercase boldface.

  3. For the present purpose it suffices to consider simple interventions that set variables to fixed values, upon which more complicated interventions, such as randomized experiments, are built.

  4. As is common in the relevant literature (e.g., Spirtes et al. 1993; Pearl 2000; Woodward 2003), the interventions under consideration are assumed to be local in the sense that they do not directly affect variables other than their targets. For example, we assume that the drug that alleviates chest pain does not also directly affect thrombosis. So aspirin, which directly affects both chest pain and thrombosis, would not be one of the drugs considered here. In contrast, ibuprofen, which affects chest pain but not thrombosis, would be a good candidate.

  5. It may be more appropriate to say “having a direct causal influence”, to avoid confusion with the chance-raising conception of probabilistic cause.

  6. X is independent of Y conditional on Z in distribution P, written as IP(X, Y|Z), if and only if P(X|Y, Z) = P(X|Z), when P(Y, Z) > 0. By definition, IP(X, ∅|Z) is trivially true. If it is clear which distribution is being referred to we will simply write I(X, Y|Z).

  7. There is a fast method for deciding whether a conditional independence relation is entailed by the Markov Condition, using a graph-theoretical concept called d-separation. For interested readers, Appendix A gives the definition of d-separation.

  8. Of course, statistical tests of conditional independence are fallible on finite sample sizes, but there is a sense in which they become increasingly reliable as the sample sizes grow larger. For the most part, we will simply assume that the conditional independence relations can be reliably inferred from the data. We will return to the finite-sample issue in section “More Robust Causal Inference with a Check of Unfaithfulness”.

  9. The Causal Minimality Condition is usually taken as a kind of principle of simplicity. But it has deeper connections to the CMC and the interventionist conception of causation. We will argue in a separate paper that if one accepts the CMC and the interventionist conception of causation, one has very good reason to accept the Minimality condition if there are no deterministic relationships among the variables. Our general result in section “A Further Characterization of Undetectable Failure of Faithfulness” needs to assume the causal Markov and Minimality conditions.

  10. A technical note: in this paper we confine ourselves to causal inference from patterns of conditional independence and dependence. That is, we have in mind those procedures that only exploit statistical information about conditional independence and dependence. Under certain parametric assumptions, there may be statistical information other than conditional independence exploitable for causal inference. For example, Shimizu et al. (2006) showed that in linear causal models with non-Gaussian error terms, the true causal DAG over a causally sufficient set of variables is uniquely determined by the joint probability distribution of these variables, and they developed an algorithm based on what is called Independent Component Analysis (ICA) to infer the causal DAG from data. Their procedure employs statistical information other than conditional independence and dependence. It is also known that in causally insufficient situations, in which we need to consider latent variables, a causal DAG (with latent variables) may entail constraints on the marginal distribution of the observed variables that do not take the form of conditional independence. But it is not yet known how to use such non-independence constraints in causal inference. By contrast, if we assume causal sufficiency, there is no more exploitable information than conditional independence and dependence in linear Gaussian models, or multinomial models for discrete variables.

  11. It would be obvious if we formulate CMC and CFC in terms of d-separation as defined in Appendix A.

  12. In practice the oracle is of course implemented with statistical tests, which are reliable only when the sample size is sufficiently large (and the distributional assumptions are satisfied for parametric tests). We will return to the sample size issue in section “More Robust Causal Inference with a Check of Unfaithfulness”.

  13. By a “smooth” distribution it is meant here a distribution absolutely continuous with Lebesgue measure.

  14. McDermott’s story involves a deterministic relationship between variables, but the reason it violates the CFC has nothing to do with determinism. Indeed, it is easy to modify the story into a probabilistic version. For example, we can imagine that the terrorist is not so resolute as to admit no positive probability of not pressing the button, and that there are some other factors that render a positive probability of explosion even in the absence of the terrorist’s action. As long as whether the dog bites or not does not affect the (non-zero) probability of the terrorist abstaining, and which hand the terrorist uses does not affect the probability of explosion, we have our case. See Cooper (1999) for a fully specified case of this sort with strictly positive probability. We mention this because we will not deal with the distinct problem determinism poses for the CFC in this paper. Our formal results do not explicitly depend on the assumption of no deterministic relationships, but the general result presented in section “A Further Characterization of Undetectable Failure of Faithfulness” below relies on the Causal Minimality Condition, which, as we shall argue in another paper, is a very reasonable assumption when there are no deterministic relations among the variables of interest, but is problematic when there are. For a recent interesting attempt to deal with determinism in statistical causal inference, see Glymour (2007).

  15. Such cases are very peculiar failures of causal transitivity. It is of course old news that counterfactual dependence can fail to be transitive, which motivated David Lewis’s earliest attempt to define causation in terms of the ancestral (i.e., the transitive closure) of counterfactual dependence. And no one expects the relation of direct cause to be transitive either. What is peculiar about this case is that it is a failure of transitivity along a single path, and thus a case of intransitivity of what is called contributing cause (Pearl 2000; Hitchcock 2001b). Most counterexamples to causal transitivity in the literature are either cases of intransitivity of what is called total cause or cases of intransitivity of probability-increasing, which involve multiple causal pathways (Hitchcock 2001a).

  16. Strictly speaking, we should also index the variable ‘thermostat’ by time, but we assume the value of the variable remains constant during the time interval.

  17. The specific requirements about the membership of Y in the conditioning set S in the definition of the Triangle-Faithfulness condition are to ensure that the path <X, Y, Z> is active (see the definition in Appendix A) conditional on S, so that this path, as well as the path consisting only of the edge between X and Z, contributes to the probabilistic association between X and Z conditional on S.

  18. We have designed an asymptotically correct algorithm based on Theorem 2, but we need to improve its computational and statistical efficiency.

  19. By the way, this is another virtue of the Conservative PC procedure. At least in theory, it is appropriately conservative in that it only suspends judgment when the input distribution is truly compatible with multiple alternatives, where PC would make a definite choice.

  20. Zhang and Spirtes (2003) defined stronger versions of the faithfulness condition to exclude close-to-unfaithful parameterizations in linear Gaussian models. One defect of their definitions is that they are uniform across all sample sizes rather than adaptive to the sample size.

  21. Strictly speaking, ϕ denotes a sequence of functions (ϕ1, ϕ2, …, ϕn, …), one for each sample size.

  22. Zhang (2006a) considers a stronger and more reasonable alternative. Our definitions and lemmas here are drawn from Zhang (2006a), except that what we call pointwise consistency and uniform consistency here are referred to as weak pointwise consistency and weak uniform consistency in Zhang (2006a).

  23. As described in Appendix B, the relevant subroutine of the PC algorithm, step [S3], relies on information obtained from the step of inferring adjacencies, i.e., information about the screen-off set found in that step (recorded as Sepset in our description). In the simple case under consideration, since we assume the inferred adjacencies are correct, it follows that the PC algorithm found a screen-off set for X and Z, which is either ∅ or {Y}. If the returned screen-off set is ∅ (i.e., the hypothesis of I(X, Z) is accepted), the triple is inferred to be a collider (i.e., H0 is accepted); if it is {Y}, the triple is inferred to be a non-collider (i.e., H0 is rejected). Also note that to decide whether X and Z are adjacent, the PC algorithm first tests whether I(X, Z|∅) holds, and will not test I(X, Z|Y) unless I(X, Z|∅) is rejected. In other words, the returned screen-off set is {Y} only if the hypothesis of I(X, Z|∅) is rejected. So it is fair to say the PC procedure simply tests whether I(X, Z|∅) holds, and rejects (or accepts) H0 if and only if I(X, Z|∅) is rejected (or accepted).

  24. This is why the so-called Type II error cannot be controlled, and why ‘acceptance’ is regarded as problematic in the Neyman–Pearson framework.

  25. We can make a more general argument to the effect that any procedure that never returns “don’t know” cannot be uniformly consistent in inferring edge orientations given the right adjacencies. The argument is based on a fact proved in Zhang (2006a): there is no uniformly consistent test of H0 versus H1 that never returns the third answer, “don’t know”, if the corresponding sets of distributions P0 and P1 are inseparable, in the sense that for every ε > 0 there are P0 ∈ P0 and P1 ∈ P1 such that the total variation distance between P0 and P1 is less than ε, i.e., sup_E |P0(E) − P1(E)| < ε, with E ranging over all events in the algebra. For example, in the simple case of deciding whether an unshielded triple is a collider or a non-collider, it is easy to check that P0 and P1 are disjoint given the CMC and CFC assumptions, which implies that a pointwise consistent procedure need not use the answer “don’t know” at all. Indeed, the PC procedure is such a pointwise consistent test that always returns a definite answer. But exactly because it is always definitive, PC is not uniformly consistent: P0 and P1, though disjoint, are still inseparable in this case. As we demonstrated in section “A Decomposition of CFC”, there are distributions that violate the Orientation-Faithfulness condition in that both I(X, Z|∅) and I(X, Z|Y) hold. The CFC assumption rules out such distributions as impossible, so they are in neither P0 nor P1. However, it can be shown that both P0 and P1 contain distributions arbitrarily close in total variation distance to such unfaithful distributions, and that is why P0 and P1 are inseparable. The fact stated above implies that in situations where P0 and P1 are inseparable, a uniformly consistent procedure, if there is one, has to be cautious at finite sample sizes and be prepared to return “don’t know”. A test that always decides the matter cannot be uniformly consistent, even though it might be pointwise consistent.
The PC algorithm, like many other algorithms in the literature, including Bayesian and likelihood-based algorithms (e.g., the GES algorithm presented in Chickering 2002), cannot be uniformly consistent because it always makes a definite choice as to whether an unshielded triple is a collider or not. The Conservative PC algorithm, by contrast, is not disqualified by the above fact from being uniformly consistent. See Zhang (2008) for more details.
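The inseparability claim in footnote 25 can be illustrated numerically. The following sketch is our own, not from the paper; the parameter names a, b, c are ours. In a linear Gaussian triangle Y = a·X + eY, Z = b·Y + c·X + eZ, with X, eY, eZ standard normal, setting c = −ab makes X and Z exactly independent (an unfaithful cancellation of the two paths), while c = −ab + δ yields a faithful distribution whose marginal covariance cov(X, Z) = δ is arbitrarily small even though cov(X, Z | Y) stays bounded away from zero:

```python
def triangle_covs(a, b, c):
    """Population (co)variances for the linear Gaussian triangle
    Y = a*X + eY, Z = b*Y + c*X + eZ, with X, eY, eZ ~ N(0, 1)."""
    vX, cXY = 1.0, a
    vY = a * a + 1.0
    cXZ = a * b + c                      # X -> Y -> Z path plus the X -> Z edge
    cYZ = b * vY + c * a
    vZ = (a * b + c) ** 2 + b * b + 1.0
    return vX, vY, vZ, cXY, cXZ, cYZ

a, b = 0.5, 0.5
for delta in (0.1, 0.01, 0.001):
    c = -a * b + delta                   # near-cancellation of the two paths
    _, vY, _, cXY, cXZ, cYZ = triangle_covs(a, b, c)
    cond = cXZ - cXY * cYZ / vY          # cov(X, Z | Y)
    print(f"delta={delta}: cov(X,Z)={cXZ:.4f}, cov(X,Z|Y)={cond:.4f}")
```

At small δ, a test of I(X, Z|∅) at any fixed sample size will tend to accept independence, so a procedure that always renders a definite collider/non-collider verdict cannot bound its error probability uniformly over such parameterizations.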

References

  • Arntzenius, F. (1992). The common cause principle. In D. Hull & K. Okruhlik (Eds.), PSA Proceedings (Vol. 2, pp. 227–237). East Lansing, MI: PSA.

  • Cartwright, N. (1989). Nature’s capacities and their measurement. Oxford: Clarendon Press.

  • Cartwright, N. (1999). The dappled world. Cambridge: Cambridge University Press.

  • Cartwright, N. (2001). What is wrong with Bayes nets? The Monist, 84, 242–264.

  • Chickering, D. M. (2002). Optimal structure identification with greedy search. Journal of Machine Learning Research, 3, 507–554.

  • Cooper, G. (1999). An overview of the representation and discovery of causal relationships using Bayesian networks. In C. Glymour & G. F. Cooper (Eds.), Computation, causation, and discovery (pp. 3–62). Cambridge, MA: MIT Press.

  • Dawid, P. (2002). Influence diagrams for causal modelling and inference. International Statistical Review, 70, 161–189.

  • Glymour, C. (1980). Theory and evidence. Princeton: Princeton University Press.

  • Glymour, C. (2007). Learning the structure of deterministic systems. In A. Gopnik, L. Schulz (Eds.), Causal learning: psychology, philosophy and computation (Chap. 14). Oxford University Press.

  • Hausman, D. M., & Woodward, J. (1999). Independence, invariance and the causal Markov condition. British Journal for the Philosophy of Science, 50, 521–583.

  • Hausman, D. M., & Woodward J. (2004). Manipulation and causal Markov condition. Philosophy of Science, 71, 846–856.

  • Heckerman, D., Meek, C., & Cooper, G. (1999). A Bayesian approach to causal discovery. In C. Glymour & G. F. Cooper (Eds.), Computation, causation, and discovery (Chap. 4). Cambridge, MA: MIT Press.

  • Hesslow, G. (1976). Two notes on the probabilistic approach to causality. Philosophy of Science, 43, 290–292.

  • Hitchcock, C. (2001a). The intransitivity of causation revealed in equations and graphs. Journal of Philosophy, 98, 273–299.

  • Hitchcock, C. (2001b). A tale of two effects. Philosophical Review, 110, 361–396.

  • Hoover, K. D. (2001). Causality in macroeconomics. Cambridge: Cambridge University Press.

  • Mayo, D. (1996). Error and the growth of experimental knowledge. Chicago: University Of Chicago Press.

  • Mayo, D., & Spanos A. (2004). Methodology in practice: Statistical misspecification testing. Philosophy of Science, 71, 1007–1025.

  • McDermott, M. (1995). Redundant causation. British Journal for the Philosophy of Science, 46, 523–544.

  • Meek, C. (1995a). Causal inference and causal explanation with background knowledge. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (pp. 403–411). Morgan Kaufmann.

  • Meek, C. (1995b). Strong completeness and faithfulness in Bayesian networks. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (pp. 411–418). San Francisco: Morgan Kaufmann.

  • Pearl, J. (1988). Probabilistic reasoning in intelligent systems. San Mateo, CA: Morgan Kaufmann.

  • Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge, UK: Cambridge University Press.

  • Ramsey, J., Spirtes, P., & Zhang J. (2006). Adjacency-faithfulness and conservative causal inference. In Proceedings of 22nd Conference on Uncertainty in Artificial Intelligence (pp. 401–408). Oregon: AUAI Press.

  • Richardson, T., & Spirtes, P. (2002). Ancestral graph Markov models. Annals of Statistics, 30(4), 962–1030.

  • Robins, J. M., Scheines, R., Spirtes, P., & Wasserman, L. (2003). Uniform consistency in causal inference. Biometrika, 90(3), 491–515.

  • Shimizu, S., Hoyer, P., Hyvärinen, A., & Kerminen, A. (2006). A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7, 2003–2030.

  • Sober, E. (1987). The principle of the common cause. In J. Fetzer (Ed.), Probability and causation: Essays in honor of Wesley Salmon (pp. 211–228). Dordrecht: Reidel.

  • Spanos, A. (2006). Revisiting the omitted variables argument: Substantive vs. statistical adequacy. Journal of Economic Methodology (forthcoming).

  • Spirtes, P., Glymour, C., & Scheines, R. (1991). An algorithm for fast recovery of sparse causal graphs. Social Science Computer Review, 9, 62–72.

  • Spirtes, P., Glymour, C., & Scheines R. (1993). Causation, prediction and search. New York: Springer-Verlag. (2000, 2nd ed.) Cambridge, MA: MIT Press.

  • Spohn, W. (2000). Bayesian nets are all there is to causal dependence. In M. C. Galavotti et al. (Eds.), Stochastic dependence and causality (pp. 157–172). CSLI Publications.

  • Steel, D. (2006). Homogeneity, selection, and the faithfulness condition. Minds and Machines, 16, 303–317.

  • Verma, T., & Pearl J. (1990). Equivalence and synthesis of causal models. In Proceedings of 6th Conference on Uncertainty in Artificial Intelligence (pp. 220–227).

  • Woodward, J. (1998). Causal independence and faithfulness. Multivariate Behavioral Research, 33, 129–148.

  • Woodward, J. (2003). Making things happen: A theory of causal explanation. Oxford and New York: Oxford University Press.

  • Zhang, J. (2006a). Underdetermination of causal hypotheses by statistical data. Technical report, Department of Philosophy, Carnegie Mellon University.

  • Zhang, J. (2006b). Causal inference and reasoning in causally insufficient systems. PhD dissertation, Department of Philosophy, Carnegie Mellon University. Available at http://www.hss.caltech.edu/~jiji/dissertation.pdf.

  • Zhang, J. (2008). Error probabilities for inference of causal directions. Synthese (forthcoming)

  • Zhang, J., & Spirtes P. (2003). Strong faithfulness and uniform consistency in causal inference. In Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (pp. 632–639). Morgan Kaufmann.

Acknowledgements

We thank Clark Glymour, Kevin Kelly, Thomas Richardson, Richard Scheines, Oliver Schulte, and James Woodward for very helpful comments. An earlier draft of this paper was presented at the Confirmation, Induction and Science conference held at the London School of Economics and Political Science in March 2007, and we are grateful to the participants for their useful feedback. Special thanks are due to Joseph Ramsey for providing empirical results on the performance of causal discovery algorithms discussed in the paper.

Author information

Correspondence to Jiji Zhang.

Appendices

Appendix A: Basic Graph–Theoretical Notions

In this Appendix, we provide definitions of the graph-theoretical notions we use, in particular the definitions of an active (or d-connecting) path and of d-separation, which are implicitly used whenever we describe which conditional independencies are or are not entailed by the Markov condition.

A directed graph is a pair <V, E>, where V is a set of vertices and E is a set of arrows. An arrow is an ordered pair of vertices, <X, Y>, represented by X → Y. Given a graph G = <V, E>, if <X, Y> ∈ E, then X and Y are said to be adjacent, X is called a parent of Y, and Y a child of X. We usually denote the set of X’s parents in G by PA_G(X). A path in G is a sequence of distinct vertices <V1, …, Vn> such that for 1 ≤ i ≤ n − 1, Vi and Vi+1 are adjacent in G. A directed path in G from X to Y is a sequence of distinct vertices <V1, …, Vn> such that V1 = X, Vn = Y, and for 1 ≤ i ≤ n − 1, Vi is a parent of Vi+1 in G, i.e., all arrows on the path point in the same direction. X is called an ancestor of Y, and Y a descendant of X, if X = Y or there is a directed path from X to Y. Directed acyclic graphs (DAGs) are directed graphs that contain no directed cycles; in other words, no two distinct vertices in the graph are ancestors of each other.

Given two directed graphs G and H over the same set of variables V, G is called a (proper) subgraph of H, and H a (proper) supergraph of G if the set of arrows of G is a (proper) subset of the set of arrows of H.

Given a path p in a DAG, a non-endpoint vertex V on p is called a collider if the two edges incident to V on p are both into V (i.e., → V ←), otherwise V is called a non-collider. Here are the key definitions and proposition:

Active Path: In a directed graph, a path p between vertices A and B is active (or d-connecting) relative to a (possibly empty) set of vertices Z (A, B ∉ Z) if

  (i) every non-collider on p is not a member of Z; and

  (ii) every collider on p is an ancestor of some member of Z.

D-separation: A and B are said to be d-separated by Z if there is no active path between A and B relative to Z. Two disjoint sets of variables A and B are d-separated by Z if every vertex in A and every vertex in B are d-separated by Z.

Proposition

(Pearl 1988) In a DAG G, X and Y are entailed to be independent conditional on Z if and only if X is d-separated from Y by Z.
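As a concrete rendering of these definitions, here is a small Python sketch (our illustration, not part of the paper) that checks d-separation by enumerating simple paths between two vertices and testing the two activeness clauses; a DAG is represented as a map from each vertex to its set of parents:

```python
def ancestors(dag, v):
    """Return v together with all of its ancestors (dag: vertex -> parent set)."""
    seen, stack = set(), [v]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(dag[u])
    return seen

def d_separated(dag, a, b, Z):
    """True iff a and b are d-separated by Z in dag (a, b must not be in Z)."""
    Z = set(Z)
    anc_Z = set().union(*(ancestors(dag, z) for z in Z)) if Z else set()
    children = {v: set() for v in dag}
    for v, parents in dag.items():
        for p in parents:
            children[p].add(v)
    nbrs = {v: dag[v] | children[v] for v in dag}

    def active(path):
        # active iff every collider is an ancestor of some member of Z
        # and every non-collider is outside Z
        for i in range(1, len(path) - 1):
            v = path[i]
            is_collider = path[i - 1] in dag[v] and path[i + 1] in dag[v]
            if is_collider and v not in anc_Z:
                return False
            if not is_collider and v in Z:
                return False
        return True

    def some_active_path(path):
        last = path[-1]
        if last == b:
            return active(path)
        return any(some_active_path(path + [n])
                   for n in nbrs[last] if n not in path)

    return not some_active_path([a])

# the chain X -> Y -> Z: conditioning on Y blocks the only path
chain = {'X': set(), 'Y': {'X'}, 'Z': {'Y'}}
assert d_separated(chain, 'X', 'Z', {'Y'})
assert not d_separated(chain, 'X', 'Z', set())
```

Path enumeration is exponential in the worst case, so this is only suitable for small graphs; it is meant to mirror the definition above, not to be efficient.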

Appendix B: PC and Conservative PC

The PC algorithm (Spirtes et al. 1993) is probably the best known representative of what is called constraint-based causal discovery algorithms. It is reproduced here, in which ADJ(G, X) denotes the set of nodes adjacent to X in a graph G:

PC Algorithm

  • [S1] Form the complete undirected graph U on the set of variables V;

  • [S2] n = 0

    • repeat

      • For each pair of variables X and Y that are adjacent in (the current) U such that ADJ(U, X)\{Y} or ADJ(U, Y)\{X} has at least n elements, check through the subsets of ADJ(U, X)\{Y} and the subsets of ADJ(U, Y)\{X} that have exactly n variables. If a subset S is found conditional on which X and Y are independent, remove the edge between X and Y in U, and record S as Sepset(X, Y);

      • n = n + 1;

    • until for each ordered pair of adjacent variables X and Y in U, ADJ(U, X)\{Y} has fewer than n elements.

  • [S3] Let P be the graph resulting from step [S2]. For each unshielded triple <A, B, C> in P, orient it as A → B ← C if and only if B is not in Sepset(A, C).

  • [S4] Execute the following orientation rules until none of them applies:

    (a) If A → B, B − C, and A and C are not adjacent, orient B − C as B → C.

    (b) If A → B → C and A − C, orient A − C as A → C.

    (c) If A → B ← C, A − D, C − D, B − D, and A and C are not adjacent, orient B − D as B ← D.

In the PC algorithm, [S2] constitutes the adjacency stage; [S3] and [S4] constitute the orientation stage. In [S2], the PC algorithm essentially searches for a conditioning set for each pair of variables that renders them independent. What distinguishes the PC algorithm from other constraint-based algorithms is the way it performs this search. As we can see, two tricks are employed: (1) it starts with conditioning sets of size 0 (i.e., the empty set) and gradually increases the size of the conditioning set; and (2) it confines the search for a screen-off conditioning set for two variables to the potential parents—i.e., the currently adjacent nodes—of the two variables, and thus systematically narrows down the space of possible screen-off sets as the search goes on. These two tricks increase both computational and statistical efficiency in most real cases.
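To make the adjacency stage concrete, here is a minimal Python sketch of [S2] (our illustration, not the authors' implementation); the conditional-independence oracle is supplied as a function `indep(x, y, S)`:

```python
from itertools import combinations

def pc_adjacencies(variables, indep):
    """Step [S2] of PC: prune the complete undirected graph over `variables`.
    `indep(x, y, S)` is the conditional-independence oracle."""
    adj = {v: set(variables) - {v} for v in variables}
    sepset = {}
    n = 0
    # grow the conditioning-set size n until no adjacent pair has
    # enough potential parents left to test
    while any(len(adj[x] - {y}) >= n for x in variables for y in adj[x]):
        for x in variables:
            for y in list(adj[x]):
                # search screen-off sets only among current neighbours of x
                for S in combinations(sorted(adj[x] - {y}), n):
                    if indep(x, y, set(S)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(S)
                        break
        n += 1
    return adj, sepset
```

With a d-separation oracle for the true DAG this recovers the skeleton's adjacencies; with statistical tests in place of the oracle it inherits their finite-sample errors, which is the issue footnotes 12 and 23 discuss.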

In [S3], the PC algorithm uses a very simple criterion to identify unshielded colliders and non-colliders. [S4] consists of orientation-propagation rules based on the information about non-colliders obtained in [S3] and the assumption of acyclicity. These rules are shown to be both sound and complete in Meek (1995a).

The Conservative PC (CPC) algorithm replaces [S3] in PC with the following [S3′] and otherwise remains the same.

CPC Algorithm

  • [S1′]: Same as [S1] in PC.

  • [S2′]: Same as [S2] in PC.

  • [S3′] Let P be the graph resulting from step [S2′]. For each unshielded triple <A, B, C> in P, check all subsets of variables adjacent to A, and those adjacent to C.

    (a) If B is NOT in any such set conditional on which A and C are independent, orient the triple as a collider: A → B ← C;

    (b) If B is included in all such sets conditional on which A and C are independent, leave the triple as it is, i.e., a non-collider;

    (c) Otherwise, mark the triple as “ambiguous” (or “don’t know”) with an underline.

  • [S4′] Same as [S4] in PC. (Of course a triple marked “ambiguous” does not count as a non-collider in [S4](a) and [S4](c).)
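The triple classification in [S3′] can be sketched as follows (our illustration; `indep` is again a conditional-independence oracle and `adj` the adjacency map produced by [S2′]):

```python
from itertools import chain, combinations

def classify_triple(A, B, C, adj, indep):
    """Step [S3'] of CPC for an unshielded triple <A, B, C>.
    Returns 'collider', 'non-collider', or 'ambiguous'."""
    def subsets(s):
        s = sorted(s)
        return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))
    # collect every screening-off set among A's and C's neighbours
    seps = [set(S)
            for side in (adj[A] - {C}, adj[C] - {A})
            for S in subsets(side)
            if indep(A, C, set(S))]
    if seps and all(B not in S for S in seps):
        return 'collider'            # [S3'](a)
    if seps and all(B in S for S in seps):
        return 'non-collider'        # [S3'](b)
    return 'ambiguous'               # [S3'](c)
```

On a distribution that is unfaithful in the way discussed in the text, with both I(A, C|∅) and I(A, C|{B}) holding, the screening-off sets disagree about B and the function returns 'ambiguous', which is exactly the suspension of judgment that makes CPC conservative.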

Proposition

(Correctness of CPC) Under the CMC and Adjacency-Faithfulness assumptions, the CPC algorithm is asymptotically correct in the sense that given a perfect conditional independence oracle, the algorithm returns a graphical object such that (1) it has the same adjacencies as the true causal graph does; and (2) all arrowheads and unshielded non-colliders in it are also in the true graph.

Proof

Suppose the true causal graph is G, and all conditional independence judgments are correct. The CMC and Adjacency-Faithfulness assumptions imply that the undirected graph P resulting from step [S2′] has the same adjacencies as G does (Spirtes et al. 1993). Now consider [S3′]. If [S3′](a) obtains, then A → B ← C must be a subgraph of G, because otherwise by the CMC, either A’s parents or C’s parents d-separate A and C, which means that there is a subset S of either A’s potential parents or C’s potential parents containing B such that I(A, C|S), contradicting the antecedent in [S3′](a). If [S3′](b) obtains, then A → B ← C cannot be a subgraph of G (and hence the triple must be an unshielded non-collider), because otherwise by the CMC, there is a subset S of either A’s potential parents or C’s potential parents not containing B such that I(A, C|S), contradicting the antecedent in [S3′](b). So neither [S3′](a) nor [S3′](b) will introduce an orientation error. Trivially [S3′](c) does not produce an orientation error, and it has been proven (in e.g., Meek 1995a) that [S4′] will not produce any, which completes the proof.    □

Appendix C: Proof of Theorem 2

Theorem 2

Under the assumptions of CMC and Minimality, if the CFC fails and the failure is undetectable, then the Triangle-Faithfulness condition fails.

Proof

Let P be the population probability distribution of V, and G be the true causal DAG. By assumption, P is not faithful to G, but the unfaithfulness is undetectable, which by definition entails that P is faithful to some DAG H. But P is Markov to G, so G entails strictly fewer conditional independence relations than H does. It is well known that if a DAG entails strictly fewer conditional independence relations than another, then any two variables adjacent in the latter DAG are also adjacent in the former (see, e.g., Chickering 2002). It follows that the adjacencies in G form a proper superset of adjacencies in H. But H is not a proper subgraph of G, for otherwise the Minimality condition fails.

Let G′ be the subgraph of G with the same adjacencies as H. G′ and H are not Markov equivalent because otherwise minimality would be violated for G. So G′ has an unshielded collider X → Y ← Z where H has unshielded non-collider X – Y – Z, or vice-versa (due to the Verma–Pearl theorem on the Markov equivalence of DAGs, Verma and Pearl 1990). Suppose the former. Since the distribution is Markov and faithful to H, all independencies between X and Z are conditional on subsets containing Y, and there is an independence between X and Z conditional on some subset containing Y. If G does not contain an edge between X and Z, then G entails that X and Z are independent conditional on some set not containing Y—but there is no such conditional independence true in P, and hence P would not be Markov to G. So G contains an edge between X and Z, and the Triangle-Faithfulness condition is violated. The case where G′ contains an unshielded non-collider where H has an unshielded collider is similar.    □

Cite this article

Zhang, J., Spirtes, P. Detection of Unfaithfulness and Robust Causal Inference. Minds & Machines 18, 239–271 (2008). https://doi.org/10.1007/s11023-008-9096-4
