BY 4.0 license Open Access Published by De Gruyter July 25, 2020

A Combinatorial Solution to Causal Compatibility

  • Thomas C. Fraser

Abstract

Within the field of causal inference, it is desirable to learn the structure of causal relationships holding between a system of variables from the correlations that these variables exhibit; a sub-problem of which is to certify whether or not a given causal hypothesis is compatible with the observed correlations. A particularly challenging setting for assessing causal compatibility is in the presence of partial information, i.e. when some of the variables are hidden/latent. This paper introduces the possible worlds framework as a method for deciding causal compatibility in this difficult setting. We define a graphical object called a possible worlds diagram, which compactly depicts the set of all possible observations. From this construction, we demonstrate explicitly, using several examples, how to prove causal incompatibility. In fact, we use these constructions to prove causal incompatibility in cases where no other technique has been able to. Moreover, we prove that the possible worlds framework can be adapted to provide a complete solution to the possibilistic causal compatibility problem. Finally, we discuss how to exploit graphical symmetries and cross-world consistency constraints in order to implement a hierarchy of necessary compatibility tests that we prove converges to sufficiency.

1 Introduction

A theory of causation specifies the effects of actions with absolute necessity. On the other hand, a probabilistic theory encodes degrees of belief and makes predictions based on limited information. A common fallacy is to interpret correlation as causation; opening an umbrella has never caused it to rain, although the two are strongly correlated. Numerous paradoxical and catastrophic consequences are unavoidable when probabilistic theories and theories of causation are confused. Nonetheless, Reichenbach’s principle asserts that correlations must admit causal explanation; after all, the fear of getting wet causes one to open an umbrella.

In recent decades, a concerted effort has been put into developing a formal theory for probabilistic causation [43, 53]. Integral to this formalism is the concept of a causal structure. A causal structure is a directed acyclic graph, or DAG, which encodes hypotheses about the causal relationships among a set of random variables. A causal model is a causal structure equipped with an explicit description of the parameters which govern the causal relationships. Given a multivariate probability distribution for a set of variables and a proposed causal structure, the causal compatibility problem aims to determine the existence or non-existence of a causal model for the given causal structure which can explain the correlations exhibited by the variables. More generally, the objective of causal discovery is to enumerate all causal structures compatible with an observed distribution. Perhaps unsurprisingly, causal inference has applications in a variety of academic disciplines including economics, risk analysis, epidemiology, bioinformatics, and machine learning [29, 42, 43, 48, 62].

For physicists, a consideration of causal influence is commonplace; the theory of special/general relativity strictly prohibits causal influences between space-like separated regions of space-time [57]. Famously, in response to Einstein, Podolsky, and Rosen’s [19] critique of the completeness of quantum theory, Bell [7] derived an observational constraint, known as Bell’s inequality, which must be satisfied by all hidden variable models which respect the causal hypothesis of relativity. Moreover, Bell demonstrated the existence of quantum-realizable correlations which violate Bell’s inequality [7]. Recently, it has been appreciated that Bell’s theorem can be understood as an instance of causal inference [61]. Contemporary research in quantum foundations pursues two closely related causal inference research programs. The first is to develop a theory of quantum causal models in order to facilitate a causal description of quantum theory and to better understand the limitations of quantum resources [3, 6, 13, 17, 25, 30, 36, 38, 44, 47, 60]. The second is the continued study of classical causal inference with the purpose of distinguishing genuinely quantum behaviors from those which admit classical explanations [1, 2, 11, 23, 24, 25, 50, 58, 60]. In particular, the results of [30] suggest that causal structures which support quantum non-classicality are uncommon and typically large in size; therefore, systematically finding new examples of such causal structures will require the development of new algorithmic strategies. As a consequence, quantum foundations research has relied upon, and contributed to, the techniques and tools used within the field of causal inference [13, 30, 50, 60]. This paper is concerned exclusively with the latter research program of classical causal inference, although its results do not rule out the possibility of a generalization to quantum causal inference.

When all variables in a probabilistic system are observed, checking the compatibility status between a joint distribution and a causal structure is relatively easy; compatibility holds if and only if all conditional independence constraints implied by graphical d-separation relations hold [39, 43]. Unfortunately, in more realistic situations there are ethical, economic, or fundamental barriers preventing access to certain statistically relevant variables, and it becomes necessary to hypothesize the existence of latent/hidden variables in order to adequately explain the correlations expressed by the visible/observed variables [21, 43, 60]. In the presence of latent variables, and in the absence of interventional data, the causal compatibility problem, and by extension the subject of causal inference as a whole, becomes considerably more difficult.

In order to overcome these difficulties, various authors have invoked numerous simplifications to make partial progress. A particularly popular simplification strategy has been to consider alternative classes of graphical causal models which can act as surrogates for DAG causal models; e.g. MC-graphs [34], summary graphs [59], or maximal ancestral graphs (MAGs) [46, 63]. While these approaches are certainly attractive from a practical perspective (efficient algorithms such as FCI [53] or RFCI [16] exist for assessing causal compatibility with MAGs, for instance), they nevertheless fail to fully capture all constraints implied by DAG causal models with latent variables [22].[1] The forthcoming formalism is concerned with assessing the causal compatibility of DAG causal structures directly, therefore avoiding these shortcomings.

Nevertheless, when considering DAG causal structures directly (henceforth just causal structures), making assumptions about the nature of the latent variables and the parameters which govern them can simplify the problem [28, 54, 56]. For instance, when the latent variables are assumed to have a known and finite cardinality[2], it becomes possible to articulate the causal compatibility problem as a finite system of polynomial equalities and inequalities in finitely many unknowns, for which non-linear quantifier elimination methods, such as Cylindrical Algebraic Decomposition [31], can provide a complete solution. Unfortunately, these techniques are only computationally tractable in the simplest of situations. Other techniques from algebraic geometry have been used in simple scenarios to approach the causal compatibility problem as well [27, 28, 35]. When no assumptions about the nature of the latent variables are made, there is a plethora of methods for deriving novel equality [21, 45] and inequality [2, 4, 8, 11, 20, 23, 26, 30, 55, 58, 60] constraints that must be satisfied by any compatible distribution. The majority of these methods are unsatisfactory because the derived constraints are necessary, but not sufficient. A notable exception is the Inflation Technique [60], which produces a hierarchy of linear programs (solvable using efficient algorithms [9, 18, 32, 33, 51]) which are necessary and sufficient [37] for determining compatibility.

In contrast with the aforementioned algebraic techniques, the purpose of this paper is to present the possible worlds framework, which offers a combinatorial solution to the causal compatibility problem in the presence of latent variables. Importantly, this framework can only be applied when the cardinalities of the visible variables are known to be finite.[3] This framework is inspired by the twin networks of Pearl [43], the parallel worlds of Shpitser [52], and some original drafts of the Inflation Technique paper [60]. The possible worlds framework accomplishes three things. First, we demonstrate its conceptual advantages by showing that a number of disparate instances of causal incompatibility become unified under the same premise. Second, we provide a closed-form algorithm for completely solving the possibilistic causal compatibility problem. To demonstrate the utility of this method, we solve a problem originally reported as unsolved in [22]. Third, we show that the possible worlds framework provides a hierarchy of tests, much like the Inflation Technique, which completely solves the probabilistic causal compatibility problem.

Unfortunately, the computational complexity of the proposed probabilistic solution is prohibitively large in many practical situations; the contributions of this work are therefore primarily conceptual. Nevertheless, it is possible that these complexity issues are intrinsic to the problem being considered. Notably, the hierarchy of tests presented here has an asymptotic rate of convergence commensurate with that of the only other complete solution to the probabilistic compatibility problem, namely the hierarchy of tests provided in [37]. Moreover, unlike the Inflation Technique, if a distribution is compatible with a causal structure, then the hierarchy of tests provided here has the advantage of returning a causal model which generates that distribution.

This paper is organized as follows: Section 2 begins with a review of the mathematical formalism behind causal modeling, including a formal definition of the causal compatibility problem, and also introduces the notations to be used throughout the paper. Afterwards, Section 3 introduces the possible worlds framework and defines its central object of study: a possible worlds diagram. Section 4 applies the possible worlds framework to prove possibilistic incompatibility between several distributions and corresponding causal structures, culminating in an algorithm for exactly solving the possibilistic causal compatibility problem. Finally, Section 5 establishes a hierarchy of tests which completely solve the probabilistic causal compatibility problem. Moreover, Section 5.1 articulates how to utilize internal symmetries in order to alleviate the aforementioned computational complexity issues. Section 6 concludes.

Appendix A summarizes relevant results from [22] needed in Section 2. Appendix B generalizes the results of [50], placing new upper bounds on the maximum cardinality of the latent variables, required for Sections 2 and 5.

2 A Review of Causal Modeling

This review section is segmented into three portions. First, Section 2.1 defines directed graphs and their properties. Second, Section 2.2 introduces the notation and terminology regarding probability distributions to be used throughout the remainder of this article. Finally, Section 2.3 defines the notion of a causal model and formally introduces the causal compatibility problem.

2.1 Directed Graphs

Definition 1

A directed graph 𝓖 is an ordered pair 𝓖 = (𝓠, 𝓔) where 𝓠 is a finite set of vertices and 𝓔 is a set of edges, i.e. ordered pairs of vertices 𝓔 ⊆ 𝓠 × 𝓠. If (q, u) ∈ 𝓔 is an edge, denoted as q → u, then u is a child of q and q is a parent of u. A directed path of length k is a sequence of vertices q(1) → q(2) → ⋯ → q(k) connected by directed edges. For a given vertex q, pa𝓖(q) denotes its parents and ch𝓖(q) its children. If there is a directed path from q to u then q is an ancestor of u and u is a descendant of q; the set of all ancestors of q is denoted an𝓖(q) and the set of all descendants is denoted des𝓖(q). The definitions of parents, children, ancestors and descendants of a single vertex q extend to sets of vertices Q ⊆ 𝓠 by taking unions:

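$$\mathrm{ch}_{\mathcal{G}}(Q) = \bigcup_{q\in Q}\mathrm{ch}_{\mathcal{G}}(q), \qquad \mathrm{pa}_{\mathcal{G}}(Q) = \bigcup_{q\in Q}\mathrm{pa}_{\mathcal{G}}(q), \tag{1}$$
$$\mathrm{an}_{\mathcal{G}}(Q) = \bigcup_{q\in Q}\mathrm{an}_{\mathcal{G}}(q), \qquad \mathrm{des}_{\mathcal{G}}(Q) = \bigcup_{q\in Q}\mathrm{des}_{\mathcal{G}}(q). \tag{2}$$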

A directed graph is acyclic if there is no directed path of length k > 1 from q back to q for any q ∈ 𝓠 and cyclic otherwise. For example, Figure 1 depicts the difference between cyclic and acyclic directed graphs.
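As an illustration of Definition 1, the set-valued operators of Equations 1 and 2 and the acyclicity condition can be sketched in a few lines of Python. The class `DirectedGraph` and its method names are our own hypothetical choices, not part of the formalism; we also read the length convention of Definition 1 (a single vertex is a directed path of length 1) as making every vertex its own ancestor and descendant, so drop Q from the result if the strict convention is preferred.

```python
class DirectedGraph:
    """A finite directed graph G = (Q, E) following Definition 1 (a sketch)."""

    def __init__(self, vertices, edges):
        self.vertices = set(vertices)
        self.edges = set(edges)  # ordered pairs (q, u), read as q -> u

    def pa(self, Q):
        """pa_G(Q): union of the parents of every q in Q (Eq. 1)."""
        return {q for (q, u) in self.edges if u in Q}

    def ch(self, Q):
        """ch_G(Q): union of the children of every q in Q (Eq. 1)."""
        return {u for (q, u) in self.edges if q in Q}

    def _closure(self, Q, step):
        # Repeatedly apply `step` (pa or ch) until no new vertex appears.
        # Q itself is included: a single vertex is a path of length 1.
        reached, frontier = set(Q), list(Q)
        while frontier:
            new = step({frontier.pop()}) - reached
            reached |= new
            frontier.extend(new)
        return reached

    def an(self, Q):
        """an_G(Q) (Eq. 2), including Q itself under the length-1 convention."""
        return self._closure(Q, self.pa)

    def des(self, Q):
        """des_G(Q) (Eq. 2), including Q itself under the length-1 convention."""
        return self._closure(Q, self.ch)

    def is_acyclic(self):
        # q lies on a directed path of length k > 1 back to itself
        # exactly when q is a descendant of one of its own children.
        return all(q not in self.des(self.ch({q})) for q in self.vertices)


chain = DirectedGraph({1, 2, 3}, {(1, 2), (2, 3)})
cycle = DirectedGraph({1, 2, 3}, {(1, 2), (2, 3), (3, 1)})
```

The acyclicity test relies on the observation that a directed path of length k > 1 from q back to q exists exactly when q is a descendant of one of its own children.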

Figure 1 The difference between a directed cyclic graph and a directed acyclic graph.

Definition 2

The subgraph of 𝓖 = (𝓠, 𝓔) induced by 𝓦 ⊂ 𝓠, denoted sub𝓖(𝓦), is given by,

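$$\mathrm{sub}_{\mathcal{G}}(\mathcal{W}) = \big(\mathcal{W},\ \mathcal{E} \cap (\mathcal{W}\times\mathcal{W})\big), \tag{3}$$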

i.e. the graph obtained by taking all edges from 𝓔 which connect members of 𝓦.
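The induced subgraph of Equation 3 is a simple edge filter. The sketch below assumes edges are stored as ordered pairs; as a hypothetical example it uses the edges of 𝓖5 from Section 4.1 (μ → a, μ → b, ν → b, ν → c, read off from Equation 19). The function name `induced_subgraph` is our own.

```python
def induced_subgraph(edges, W):
    """sub_G(W) = (W, E ∩ (W × W)) from Eq. 3: keep edges with both ends in W."""
    W = set(W)
    return W, {(q, u) for (q, u) in edges if q in W and u in W}


# Edges of the causal structure G5 (Figure 5), read off from Equation 19.
G5_EDGES = {("mu", "a"), ("mu", "b"), ("nu", "b"), ("nu", "c")}
vertices, edges = induced_subgraph(G5_EDGES, {"mu", "a", "b"})
```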

2.2 Probability Theory

Definition 3

(Probability Theory). A probability space is a triple (Ω, Ξ, P) where the state space Ω is the set of all possible outcomes, Ξ ⊆ 2^Ω is the set of events forming a σ-algebra over Ω, and P is a σ-additive function from events to probabilities such that P(Ω) = 1.

Definition 4

(Probability Notation). For a collection of random variables X𝓘 = {X1, X2, …, Xk} indexed by i ∈ 𝓘 = {1, 2, …, k} where each Xi takes values from Ωi, a joint distribution P𝓘 = P12…k assigns probabilities to outcomes from Ω𝓘 = ∏_{i∈𝓘} Ωi. The event that each Xi takes value xi, referred to as a valuation of X𝓘[4], is denoted as,

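$$\mathtt{P}_{\mathcal{I}}(x_{\mathcal{I}}) = \mathtt{P}_{12\ldots k}(x_1x_2\ldots x_k) = \mathtt{P}(X_1=x_1, X_2=x_2, \ldots, X_k=x_k). \tag{4}$$

A point distribution P𝓘(y𝓘) = 1 for a particular event y𝓘 ∈ Ω𝓘 is expressed using square brackets,

$$\mathtt{P}_{\mathcal{I}}(y_{\mathcal{I}}) = 1 \implies \mathtt{P}_{\mathcal{I}}(x_{\mathcal{I}}) = [y_{\mathcal{I}}](x_{\mathcal{I}}) = \delta(y_{\mathcal{I}}, x_{\mathcal{I}}) = \prod_{i\in\mathcal{I}}\delta(y_i, x_i). \tag{5}$$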

The set of all probability distributions over Ω𝓘 is denoted as ℙ𝓘. Let ki denote the cardinality or size of Ωi. If Xi is discrete, then ki = |Ωi|, otherwise Xi is continuous and ki = ∞.
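In code, an event x𝓘 ∈ Ω𝓘 can be represented as a tuple and a point distribution [y𝓘] as the indicator of Equation 5. The following Python sketch assumes each Ωi is labelled {0, 1, …, ki − 1}; the helper name `point` is hypothetical.

```python
from itertools import product
from math import prod


def point(y):
    """The point distribution [y] of Eq. 5: [y](x) = prod_i delta(y_i, x_i)."""
    def dist(x):
        assert len(x) == len(y), "x and y must value the same variables"
        return prod(1 if yi == xi else 0 for yi, xi in zip(y, x))
    return dist


# Two variables with k_1 = 2 and k_2 = 3 outcomes, labelled 0..k_i - 1.
omega = list(product(range(2), range(3)))  # Omega_I = Omega_1 x Omega_2
p = point((1, 2))
```

Summing `p` over `omega` gives 1, as it must for an element of ℙ𝓘.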

2.3 Causal Models and Causal Compatibility

A causal model represents a complete description of the causal mechanisms underlying a probabilistic process. Formally, a causal model is a pair of objects (𝓖, 𝓟), which will be defined in turn. First, 𝓖, called the causal structure, is a directed acyclic graph (𝓠, 𝓔) whose vertices q ∈ 𝓠 represent random variables X𝓠 = {Xq | q ∈ 𝓠}. The purpose of a causal structure is to graphically encode the causal relationships between the variables. Explicitly, if q → u ∈ 𝓔 is an edge of the causal structure, Xq is said to have causal influence on Xu[5]. Consequently, the causal structure predicts that given complete knowledge of a valuation of the parental variables Xpa𝓖(u) = {Xq | q ∈ pa𝓖(u)}, the random variable Xu should become independent of its non-descendants[6] [43]. With this observation as motivation, the causal parameters 𝓟 of a causal model are a family of conditional probability distributions Pq|pa𝓖(q), one for each q ∈ 𝓠. In the case that q has no parents in 𝓖, the distribution is simply unconditioned. The purpose of the causal parameters is to predict a joint distribution P𝓠 over the configurations Ω𝓠 of a causal structure,

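$$\forall x_{\mathcal{Q}}\in\Omega_{\mathcal{Q}}:\quad \mathtt{P}_{\mathcal{Q}}(x_{\mathcal{Q}}) = \prod_{q\in\mathcal{Q}}\mathtt{P}_{q|\mathrm{pa}_{\mathcal{G}}(q)}\big(x_q \,\big|\, x_{\mathrm{pa}_{\mathcal{G}}(q)}\big). \tag{6}$$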
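For finite variables, Equation 6 can be evaluated by enumerating Ω𝓠 and multiplying one conditional probability per vertex. The sketch below is a minimal illustration assuming that outcomes are labelled 0, …, kq − 1 and that each causal parameter is supplied as a Python function; the function name and the two-vertex structure u → v with its numbers are hypothetical.

```python
from itertools import product
from math import isclose, prod


def joint_from_parameters(parents, card, cond):
    """Eq. 6: P_Q(x_Q) = prod_q P_{q|pa(q)}(x_q | x_pa(q)).

    parents: dict q -> tuple of parents of q
    card:    dict q -> number of outcomes k_q (outcomes are 0..k_q - 1)
    cond:    dict q -> function (x_q, tuple of parent values) -> probability
    """
    names = list(parents)
    table = {}
    for xs in product(*(range(card[q]) for q in names)):
        x = dict(zip(names, xs))
        table[xs] = prod(cond[q](x[q], tuple(x[p] for p in parents[q]))
                         for q in names)
    return table


# Hypothetical structure u -> v: P(u = 1) = 0.3, and v copies u with probability 0.9.
P = joint_from_parameters(
    parents={"u": (), "v": ("u",)},
    card={"u": 2, "v": 2},
    cond={"u": lambda xu, pa: 0.3 if xu == 1 else 0.7,
          "v": lambda xv, pa: 0.9 if xv == pa[0] else 0.1},
)
assert isclose(sum(P.values()), 1.0)  # a valid joint distribution
```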

If the hypotheses encoded within a causal structure 𝓖 are correct, then the observed distribution over Ω𝓠 should factorize according to Equation (6). Unfortunately, as discussed in Section 1, there are often ethical, economic, or fundamental obstacles preventing access to all variables of a system. In such cases, it is customary to partition the vertices of a causal structure into two disjoint sets: the visible (observed) vertices 𝓥 and the latent (unobserved) vertices 𝓛 (for example, see Figure 2). Additionally, we denote the visible parents of any vertex q ∈ 𝓥 ∪ 𝓛 as vpa𝓖(q) = 𝓥 ∩ pa𝓖(q), and analogously the latent parents as lpa𝓖(q) = 𝓛 ∩ pa𝓖(q).

Figure 2 The causal structure 𝓖2 in this figure encodes a causal hypothesis about the causal relationships between the visible variables 𝓥 = {v1, v2, v3, v4, v5} and the latent variables 𝓛 = {ℓ1, ℓ2, ℓ3}; e.g. v2 experiences a direct causal influence from each of its parents, both visible vpa𝓖2(v2) = {v1, v4} and latent lpa𝓖2(v2) = {ℓ1, ℓ2}. Throughout this paper, visible variables and edges connecting them are colored blue whereas all latent variables and all other edges are colored red.

In the presence of latent variables, Equation 6 still makes a prediction about the joint distribution P𝓥∪𝓛(x𝓥, λ𝓛)[7] over the visible and latent variables, although an experimenter attempting to verify or discredit a causal hypothesis only has access to the marginal distribution P𝓥(x𝓥). If Ω𝓛 is continuous,

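$$\forall x_{\mathcal{V}}\in\Omega_{\mathcal{V}}:\quad \mathtt{P}_{\mathcal{V}}(x_{\mathcal{V}}) = \int_{\lambda_{\mathcal{L}}\in\Omega_{\mathcal{L}}} \mathrm{d}\mathtt{P}_{\mathcal{V}\cup\mathcal{L}}(x_{\mathcal{V}},\lambda_{\mathcal{L}}). \tag{7}$$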

If Ω𝓛 is discrete,

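$$\forall x_{\mathcal{V}}\in\Omega_{\mathcal{V}}:\quad \mathtt{P}_{\mathcal{V}}(x_{\mathcal{V}}) = \sum_{\lambda_{\mathcal{L}}\in\Omega_{\mathcal{L}}} \mathtt{P}_{\mathcal{V}\cup\mathcal{L}}(x_{\mathcal{V}},\lambda_{\mathcal{L}}). \tag{8}$$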
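For discrete latent variables, Equation 8 amounts to summing a joint table over the latent coordinates. The following sketch covers only this discrete case (the integral of Equation 7 is not addressed); the helper name `marginalize` and the joint table over a latent l and a visible v are hypothetical.

```python
from collections import defaultdict
from math import isclose


def marginalize(table, names, keep):
    """Eq. 8: sum a joint table over every variable not listed in `keep`.

    table: dict mapping joint outcomes (tuples ordered as `names`) to probabilities
    """
    idx = [names.index(n) for n in keep]
    marginal = defaultdict(float)
    for xs, p in table.items():
        marginal[tuple(xs[i] for i in idx)] += p
    return dict(marginal)


# Hypothetical joint distribution over a latent l and a visible v.
P_lv = {(0, 0): 0.63, (0, 1): 0.07, (1, 0): 0.03, (1, 1): 0.27}
P_v = marginalize(P_lv, names=("l", "v"), keep=("v",))
assert isclose(sum(P_v.values()), 1.0)
```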

A natural question arises: in the absence of information about the latent variables 𝓛, how can one determine whether or not a causal hypothesis is correct? The principal purpose of this paper is to provide the reader with methods for answering this question.

In general, other than being a directed acyclic graph, there are no restrictions placed on a causal structure with latent variables. Nonetheless, [22] demonstrates that every causal structure 𝓖 can be converted into a standard form that is observationally equivalent to 𝓖 where the latent variables are exogenous (have no parents) and whose children sets are isomorphic to the facets of a simplicial complex over 𝓥[8]. Appendix A summarizes the relevant results from [22] necessary for making this claim. Additionally, Appendix B demonstrates that any finite distribution P𝓥 which satisfies the causal hypotheses (i.e. Equation 7) can be generated using deterministic causal parameters for the visible variables and moreover, the cardinalities of the latent variables can be assumed finite[9]. Altogether, Appendices A and B suggest that without loss of generality, we can simplify the causal compatibility problem as follows:

Definition 5

(Functional Causal Model). A (finite) functional causal model for a causal structure 𝓖 = (𝓥 ∪ 𝓛, 𝓔) is a triple (𝓖, 𝓕𝓥, 𝓟𝓛) where

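$$\mathcal{F}_{\mathcal{V}} = \{\, f_v : \Omega_{\mathrm{pa}_{\mathcal{G}}(v)} \to \Omega_v \mid v\in\mathcal{V} \,\} \tag{9}$$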

are deterministic functions for the visible variables 𝓥 in 𝓖, and

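$$\mathcal{P}_{\mathcal{L}} = \{\, \mathtt{P}_\ell : \Omega_\ell \to [0,1] \mid \ell\in\mathcal{L} \,\} \tag{10}$$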

are finite probability distributions for the latent variables 𝓛 in 𝓖. A functional causal model defines a probability distribution P𝓥 : Ω𝓥 → [0, 1],

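$$\forall x_{\mathcal{V}}\in\Omega_{\mathcal{V}}:\quad \mathtt{P}_{\mathcal{V}}(x_{\mathcal{V}}) = \sum_{\lambda_{\mathcal{L}}\in\Omega_{\mathcal{L}}} \prod_{\ell\in\mathcal{L}}\mathtt{P}_\ell(\lambda_\ell) \prod_{v\in\mathcal{V}}\delta\big(x_v, f_v(x_{\mathrm{vpa}_{\mathcal{G}}(v)}, \lambda_{\mathrm{lpa}_{\mathcal{G}}(v)})\big). \tag{11}$$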
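Equation 11 can be transcribed almost literally: for each x𝓥, add up the latent weights of every λ𝓛 for which all the Kronecker deltas equal 1. The Python below is a sketch under stated assumptions (finite latent cardinalities, outcomes labelled from 0, parameters passed as functions); the function name and the parameter choices for 𝓖5 (Figure 5) are hypothetical.

```python
from itertools import product
from math import isclose, prod


def functional_model_distribution(vis_card, lat_dist, vpa, lpa, f):
    """Literal transcription of Eq. 11 for finite latent variables (a sketch).

    vis_card: dict v -> number of outcomes of visible v (labelled 0, 1, ...)
    lat_dist: dict latent -> list [P(0), P(1), ...]
    vpa, lpa: dict v -> tuple of visible / latent parents of v
    f:        dict v -> function(visible parent values, latent parent values)
    """
    V, L = list(vis_card), list(lat_dist)
    P = {}
    for xs in product(*(range(vis_card[v]) for v in V)):
        x = dict(zip(V, xs))
        total = 0.0
        for lams in product(*(range(len(lat_dist[lat])) for lat in L)):
            lam = dict(zip(L, lams))
            weight = prod(lat_dist[lat][lam[lat]] for lat in L)
            # The product of Kronecker deltas is 1 iff every x_v = f_v(...).
            if all(x[v] == f[v](tuple(x[p] for p in vpa[v]),
                                tuple(lam[p] for p in lpa[v]))
                   for v in V):
                total += weight
        P[xs] = total
    return P


# G5: a <- mu, b <- (mu, nu), c <- nu, with hypothetical parameters.
P = functional_model_distribution(
    vis_card={"a": 2, "b": 2, "c": 2},
    lat_dist={"mu": [0.5, 0.5], "nu": [0.2, 0.8]},
    vpa={"a": (), "b": (), "c": ()},
    lpa={"a": ("mu",), "b": ("mu", "nu"), "c": ("nu",)},
    f={"a": lambda vp, lp: lp[0],
       "b": lambda vp, lp: lp[0] ^ lp[1],
       "c": lambda vp, lp: lp[0]},
)
assert isclose(sum(P.values()), 1.0)
```

Enumerating all of Ω𝓥 × Ω𝓛 is deliberately naive; it mirrors the formula rather than aiming for efficiency.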

Definition 6

(The Causal Compatibility Problem). Given a causal structure 𝓖 = (𝓥 ∪ 𝓛, 𝓔) and a distribution P𝓥 over the visible variables 𝓥, the causal compatibility problem is to determine if there exists a functional causal model (𝓖, 𝓕𝓥, 𝓟𝓛) (defined in Definition 5) such that Equation 11 reproduces P𝓥. If such a functional causal model exists, then P𝓥 is said to be compatible with 𝓖; otherwise P𝓥 is incompatible with 𝓖. The set of all compatible distributions on 𝓥 for a causal structure 𝓖 is denoted 𝓜𝓥(𝓖).

3 The Possible Worlds Framework

Consider the causal structure in Figure 3a denoted 𝓖3a. For the sake of concreteness, suppose one is promised the latent variables are sampled from a binary sample space, i.e. kμ = kν = 2. Let zμ = Pμ(0μ) and zν = Pν(0ν). The causal hypothesis 𝓖3a predicts (via Equation 11) that observable events (xa, xb, xc) ∈ Ωa × Ωb × Ωc will be distributed according to,

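$$\begin{aligned}\mathtt{P}_{abc} = {}& z_\mu z_\nu\,[\mathrm{obs}_{abc}(0_\mu0_\nu)] + z_\mu(1-z_\nu)\,[\mathrm{obs}_{abc}(0_\mu1_\nu)] \\ &+ (1-z_\mu)z_\nu\,[\mathrm{obs}_{abc}(1_\mu0_\nu)] + (1-z_\mu)(1-z_\nu)\,[\mathrm{obs}_{abc}(1_\mu1_\nu)],\end{aligned} \tag{12}$$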
Figure 3 A causal structure 𝓖3a and the creation of the possible worlds diagram when kμ = kν = 2.

where obsabc(λμλν) ∈ Ωa × Ωb × Ωc is shorthand for the observed event generated by the autonomous functions fa, fb, fc for each (λμ, λν) ∈ Ωμ × Ων. In the case of 𝓖3a,

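$$\mathrm{obs}_{abc}(\lambda_\mu\lambda_\nu) = \big(f_a(\lambda_\mu),\ f_b(f_a(\lambda_\mu),\lambda_\nu),\ f_c(f_b(f_a(\lambda_\mu),\lambda_\nu),\lambda_\nu)\big). \tag{13}$$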

For each distinct realization (λμ, λν) ∈ Ωμ × Ων of the latent variables, one can consider a possible world wherein the values λμ, λν are not sampled according to the respective distributions Pμ, Pν, but instead take on definite values. From the perspective of counterfactual reasoning, each world models a distinct counterfactual assignment of the latent variables, but not of the visible variables.[10] In this particular example, there are kμ × kν = 2 × 2 = 4 distinct possible worlds. Figure 3b represents, and uniquely colors, these possible worlds. Note that the definite valuations of the latent variables in Figure 3b are depicted using squares[11]. Critically, regardless of the deterministic functional relationships fa, fb, fc, there are identifiable consistency constraints that must hold between these worlds. For example, a is determined by a function fa : Ωμ → Ωa, and thus the observed value for a in the yellow (0μ0ν)-world must be exactly the same as the observed value for a in the green (0μ1ν)-world. This cross-world consistency constraint is illustrated in Figure 3c by embedding each possible world into a larger diagram with overlapping λμ → a subgraphs. It is important to remark that not all cross-world consistency constraints are captured by this diagram; for example, the value of b in the yellow (0μ0ν)-world must match the value of b in the orange (1μ0ν)-world whenever the value of a is the same in both possible worlds.

Figure 4 A vertex of a possible worlds diagram dissected.

For comparison, in the original causal structure 𝓖3a the vertices represent random variables sampled from distributions associated with causal parameters, whereas in the possible worlds diagram of Figure 3c every valuation, including the latent valuations, is predetermined by the functional dependences fa, fb, fc. For example, Figure 3d populates Figure 3c with the observable events generated by the following functional dependences,

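$$\begin{aligned}
&f_a(0_\mu)=0_a, && f_a(1_\mu)=1_a,\\
&f_b(0_a0_\nu)=3_b, && f_b(0_a1_\nu)=1_b, && f_b(1_a0_\nu)=2_b, && f_b(1_a1_\nu)=0_b,\\
&f_c(3_b0_\mu0_\nu)=0_c, && f_c(1_b0_\mu1_\nu)=1_c, && f_c(2_b1_\mu0_\nu)=2_c, && f_c(0_b1_\mu1_\nu)=3_c.
\end{aligned} \tag{14}$$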

The utility of Figure 3d lies in simultaneously accounting for Equation 14, the causal structure 𝓖3a, and the cross-world consistency constraints that 𝓖3a induces. Nonetheless, Figure 3d does not specify the probabilities zμ, zν associated with the latent events. In Section 4, we use diagrams analogous to Figure 3d to tackle the causal compatibility problem. Before doing so, we formally define the possible worlds framework.

Definition 7

(The Possible Worlds Framework). Let 𝓖 = (𝓥 ∪ 𝓛, 𝓔) be a causal structure with visible variables 𝓥 and latent variables 𝓛. Let 𝓕𝓥 be a set of functional parameters for 𝓥 defined exactly as in Equation 9. The possible worlds diagram for the pair (𝓖, 𝓕𝓥) is a directed acyclic graph 𝓓 satisfying the following properties:

  1. (Valuation Vertices) Each vertex in 𝓓 consists of three pieces (consult Figure 4 for clarity):

    1. a subscript q ∈ 𝓥 ∪ 𝓛 corresponding to a vertex in 𝓖 (indicated inside a small circle in the bottom-right corner),

    2. an integer ω corresponding to a possible valuation/outcome ωq of q where ωq ∈ {0q, 1q, …} = Ωq (indicated inside the square of each vertex),

    3. and a decoration in the form of colored outlines[12] indicating which worlds (defined below) the vertex is a member of[13].

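  2. (Ancestral Isomorphism)[14] For every valuation vertex ωq in 𝓓, the ancestral subgraph of ωq in 𝓓 is isomorphic to the ancestral subgraph of q in 𝓖 under the map ωq ↦ q.

    $$\mathrm{sub}_{\mathcal{D}}\big(\mathrm{an}_{\mathcal{D}}(\omega_q)\big) \cong \mathrm{sub}_{\mathcal{G}}\big(\mathrm{an}_{\mathcal{G}}(q)\big) \tag{15}$$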
  3. (Consistency) Each valuation vertex xv of a visible variable v ∈ 𝓥 is consistent with the output of the functional parameter fv ∈ 𝓕𝓥 when applied to the valuation vertices pa𝓓(xv),

    $$x_v = f_v\big(\mathrm{pa}_{\mathcal{D}}(x_v)\big) \tag{16}$$
  4. (Uniqueness) For each latent variable ℓ ∈ 𝓛, and for every valuation λℓ ∈ Ωℓ, there exists a unique valuation vertex in 𝓓 corresponding to λℓ. Unlike latent valuation vertices, the valuations of visible variables xv ∈ Ωv may be repeated in (or absent from) 𝓓 depending on the form of 𝓕𝓥. In such cases, duplicated xv’s are always uniquely distinguished by world membership (colored outline).

  5. (Worlds) A world is a subgraph of 𝓓 that is isomorphic to 𝓖 under the map ωq ↦ q. Let wor(λ𝓛) ⊆ 𝓓 denote the world containing the valuation λ𝓛 ∈ Ω𝓛[15]. Furthermore, for any subset V ⊆ 𝓥 of visible variables, let obsV(λ𝓛) ∈ ΩV denote the observed event supported by wor(λ𝓛).

  6. (Completeness) For every valuation of the latent variables λ𝓛 ∈ Ω𝓛, there exists a subgraph corresponding to wor(λ𝓛).[16]

It is important to remark that although a possible worlds diagram 𝓓 can be constructed from the pair (𝓖, 𝓕𝓥), the two mathematical objects are not equivalent; the functional parameters 𝓕𝓥 can contain superfluous information that never appears in 𝓓. We return to this subtle but crucial observation in Section 5.1.

The essential purpose of the possible worlds construction is to serve as a diagrammatic tool for calculating the observational predictions of a functional causal model. Lemma 1 captures this essence.

Lemma 1

Given a functional causal model (𝓖 = (𝓥 ∪ 𝓛, 𝓔), 𝓕𝓥, 𝓟𝓛) (see Definition 5), let 𝓓 be the possible worlds diagram for (𝓖, 𝓕𝓥). The causal compatibility criterion (Equation 11) for 𝓖 is equivalent to a probabilistic sum over worlds in 𝓓:

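$$\mathtt{P}_{\mathcal{V}} = \sum_{\lambda_{\mathcal{L}}\in\Omega_{\mathcal{L}}} \prod_{\ell\in\mathcal{L}}\mathtt{P}_\ell(\lambda_\ell)\,\big[\mathrm{obs}_{\mathcal{V}}(\lambda_{\mathcal{L}})\big]. \tag{17}$$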
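Lemma 1 suggests a more direct computation than Equation 11: instead of testing every x𝓥, evaluate each world once and add its latent weight to the point distribution [obs𝓥(λ𝓛)]. The sketch below assumes the visible variables are listed in a topological order of 𝓖 so that each fv can be evaluated after its parents; the function names and the parameters chosen for the Instrumental structure 𝓖7 (Figure 7) are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import prod


def observe(world, parents, f):
    """obs_V(lambda_L): evaluate each f_v in turn within one possible world.

    `parents` must list the visible variables in a topological order of G.
    """
    val = dict(world)  # the definite latent valuations of this world
    for v, pa in parents.items():
        val[v] = f[v](*(val[p] for p in pa))
    return tuple(val[v] for v in parents)


def sum_over_worlds(lat_dist, parents, f):
    """Eq. 17: P_V = sum over lambda_L of prod_l P_l(lambda_l) [obs_V(lambda_L)]."""
    L = list(lat_dist)
    P = defaultdict(float)
    for lams in product(*(range(len(lat_dist[lat])) for lat in L)):
        world = dict(zip(L, lams))
        P[observe(world, parents, f)] += prod(lat_dist[lat][world[lat]] for lat in L)
    return dict(P)


# G7 with hypothetical parameters: a = mu, b = a, c = b XOR nu.
P = sum_over_worlds(
    lat_dist={"mu": [0.5, 0.5], "nu": [0.25, 0.75]},
    parents={"a": ("mu",), "b": ("a", "nu"), "c": ("b", "nu")},
    f={"a": lambda mu: mu, "b": lambda a, nu: a, "c": lambda b, nu: b ^ nu},
)
```

Each world contributes exactly one observed event, so the work scales with |Ω𝓛| rather than with |Ω𝓥| × |Ω𝓛|.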

The remainder of this paper explores the consequences of adopting the possible worlds framework as a method for tackling the causal compatibility problem.

4 A Complete Possibilistic Solution

Section 3 introduced the possible worlds framework as a technique for calculating the observable predictions of a functional causal model by means of Lemma 1. In this section, we use the possible worlds framework to develop a combinatorial algorithm for completely solving the possibilistic causal compatibility problem.

Definition 8

Given a probability distribution P𝓥 : Ω𝓥 → [0, 1], its support σ(P𝓥) is defined as the subset of events which are possible,

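$$\sigma(\mathtt{P}_{\mathcal{V}}) = \{\, x_{\mathcal{V}}\in\Omega_{\mathcal{V}} \mid \mathtt{P}_{\mathcal{V}}(x_{\mathcal{V}}) > 0 \,\}. \tag{18}$$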

An observed distribution P𝓥 is said to be possibilistically compatible with 𝓖 if there exists a functional causal model (𝓖, 𝓕𝓥, 𝓟𝓛) for which Equation 11 produces a distribution with the same support as P𝓥. The possibilistic variant of the causal compatibility problem is naturally related to the probabilistic causal compatibility problem defined in Definition 6; if a distribution is possibilistically incompatible with 𝓖, then it is also probabilistically incompatible. We now proceed to apply the possible worlds framework to prove possibilistic incompatibility between a number of distribution/causal structure pairs.
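Because possibilistic compatibility depends on P𝓥 only through σ(P𝓥), distributions with equal supports are interchangeable for this problem. A minimal sketch of Equation 18 follows; the helper names are hypothetical, and the choice xb = 0, yb = 1 for the family of Equation 20 (Section 4.1) is an illustrative assumption.

```python
def support(P):
    """Eq. 18: the events with strictly positive probability."""
    return frozenset(x for x, p in P.items() if p > 0)


def P20(z, xb=0, yb=1):
    """The family of Eq. 20 over events (a, b, c); xb, yb are illustrative choices."""
    return {(0, xb, 1): z, (1, yb, 0): 1 - z}
```

Two members of the family with different values of z share the same support, so a single possibilistic argument covers the whole family.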

4.1 A Simple Example Causal Structure

Consider the causal structure 𝓖5 depicted in Figure 5. For 𝓖5, the causal compatibility criterion (Equation 11) takes the form,

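$$\mathtt{P}_{abc}(x_ax_bx_c) = \sum_{\lambda_\mu\in\Omega_\mu}\sum_{\lambda_\nu\in\Omega_\nu}\mathtt{P}_\mu(\lambda_\mu)\,\mathtt{P}_\nu(\lambda_\nu)\,\delta(x_a,f_a(\lambda_\mu))\,\delta(x_b,f_b(\lambda_\mu,\lambda_\nu))\,\delta(x_c,f_c(\lambda_\nu)). \tag{19}$$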
Figure 5 A causal structure 𝓖5 with three visible vertices 𝓥 = {a, b, c} and two latent vertices 𝓛 = {μ, ν}.

The following family of distributions, for arbitrary xb, yb ∈ Ωb,

$$\mathtt{P}_{abc}^{(20)} = z\,[0_ax_b1_c] + (1-z)\,[1_ay_b0_c], \qquad 0<z<1, \tag{20}$$

are incompatible with 𝓖5. Traditionally, distributions like Pabc(20) are proven incompatible on the basis that they violate an independence constraint that is implied by 𝓖5 [43], namely,

$$\mathtt{P}_{abc}\in\mathcal{M}(\mathcal{G}_5) \implies \mathtt{P}_{ac}(x_ax_c) = \mathtt{P}_a(x_a)\,\mathtt{P}_c(x_c). \tag{21}$$

Intuitively, 𝓖5 provides no latent mechanism by which a and c can attempt to correlate (or anti-correlate). We now prove the possibilistic incompatibility of the support σ(Pabc(20)) with 𝓖5 using the possible worlds framework.

Proof

Proof by contradiction; assume that a functional causal model 𝓕𝓥 = {fa, fb, fc} for 𝓖5 exists such that Equation 19 produces Pabc(20). Since there are two distinct valuations of the joint variables abc in the support of Pabc(20), namely 0axb1c and 1ayb0c, consider each as being sampled from a distinct possible world. Without loss of generality[17], let 0μ0ν ∈ Ωμ × Ων denote any valuation of the latent variables such that obsabc(0μ0ν) = 0axb1c. Similarly, let 1μ1ν ∈ Ωμ × Ων denote any valuation of the latent variables such that obsabc(1μ1ν) = 1ayb0c. Using these observations, initialize a possible worlds diagram using wor(0μ0ν), colored green, and wor(1μ1ν), colored violet, as seen in Figure 6a. In order to complete Figure 6a, one simply needs to specify the behavior of b in two of the “off-diagonal” worlds, namely wor(0μ1ν), colored orange, and wor(1μ0ν), colored yellow (see Figure 6b). Since a depends only on μ and c depends only on ν, the orange world inherits a = fa(0μ) = 0a from the green world and c = fc(1ν) = 0c from the violet world. Hence, regardless of the choice for b, the observed event obsac(0μ1ν) = 0a0c in the orange world wor(0μ1ν) predicts Pac(0a0c) > 0[18], which contradicts Pabc(20). Therefore, because the proof technique did not rely on the value of 0 < z < 1, Pabc(20) is possibilistically incompatible with 𝓖5.□
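For small cardinalities, the possibilistic question can also be explored by exhaustive search: enumerate every deterministic model (fa, fb, fc) for 𝓖5, let every latent valuation occur with positive probability, and collect the observed events across all worlds, which by Lemma 1 is exactly σ(P𝓥). The sketch below is not the paper's algorithm; it fixes kμ = kν = 2 and binary visible variables (assumptions), and tests xb = 0, yb = 1 as well as xb = yb = 0 in Equation 20. A negative answer therefore only rules out this particular latent cardinality; the proof above is what covers the general case. All function names are hypothetical.

```python
from itertools import product


def all_functions(domain, codomain):
    """Every map domain -> codomain, returned as a lookup dict."""
    for outputs in product(codomain, repeat=len(domain)):
        yield dict(zip(domain, outputs))


def reachable_supports(parents, card, latents):
    """Yield sigma(P_V) for every deterministic model in which each latent
    value occurs with positive probability (the union of obs_V over worlds).

    parents: dict visible v -> tuple of its parents, in topological order
    card:    dict variable -> number of outcomes (labelled 0, 1, ...)
    latents: list of latent variable names
    """
    visible = list(parents)
    tables = [all_functions(list(product(*(range(card[p]) for p in parents[v]))),
                            range(card[v]))
              for v in visible]
    worlds = [dict(zip(latents, lam))
              for lam in product(*(range(card[lat]) for lat in latents))]
    for fs in product(*tables):  # one deterministic f_v per visible v
        observed = set()
        for world in worlds:
            val = dict(world)
            for v, f_v in zip(visible, fs):
                val[v] = f_v[tuple(val[p] for p in parents[v])]
            observed.add(tuple(val[v] for v in visible))
        yield frozenset(observed)


def possibilistically_compatible(target, parents, card, latents):
    target = frozenset(target)
    return any(s == target for s in reachable_supports(parents, card, latents))


# G5 (Figure 5); every variable is assumed binary.
G5 = {"a": ("mu",), "b": ("mu", "nu"), "c": ("nu",)}
CARD = {"a": 2, "b": 2, "c": 2, "mu": 2, "nu": 2}
LATENTS = ["mu", "nu"]
```

As a sanity check in the other direction, the support {000, 011, 110, 101} is reachable (a = μ, b = μ ⊕ ν, c = ν), so the search does find models when they exist at this cardinality.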

Figure 6 The possible worlds diagram for 𝓖5 (Figure 5) is incompatible with Pabc(20) (Equation 20).

4.2 The Instrumental Structure

The causal structure 𝓖7 depicted in Figure 7 is known as the Instrumental Scenario [8, 40, 41]. For 𝓖7, Equation 11 takes the form,

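$$\mathtt{P}_{abc}(x_ax_bx_c) = \sum_{\lambda_\mu\in\Omega_\mu}\sum_{\lambda_\nu\in\Omega_\nu}\mathtt{P}_\mu(\lambda_\mu)\,\mathtt{P}_\nu(\lambda_\nu)\,\delta(x_a,f_a(\lambda_\mu))\,\delta(x_b,f_b(x_a,\lambda_\nu))\,\delta(x_c,f_c(x_b,\lambda_\nu)). \tag{22}$$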
Figure 7 The Instrumental Scenario.

The following family of distributions,

$$\mathtt{P}_{abc}^{(23)} = z\,[0_a0_b0_c] + (1-z)\,[1_a0_b1_c], \qquad 0<z<1, \tag{23}$$

are possibilistically incompatible with 𝓖7. The Instrumental scenario 𝓖7 is different from 𝓖5 in that there are no observable conditional independence constraints which can prove the possibilistic incompatibility of Pabc(23). Instead, the possibilistic incompatibility of Pabc(23) is traditionally witnessed by an Instrumental inequality originally derived in [41],

$$\mathtt{P}_{abc}\in\mathcal{M}(\mathcal{G}_7) \implies \mathtt{P}_{bc|a}(0_b0_c|0_a) + \mathtt{P}_{bc|a}(0_b1_c|1_a) \le 1. \tag{24}$$
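The violation of Equation 24 by the family of Equation 23 is a short computation once the conditionals Pbc|a are formed. The sketch below computes the left-hand side for the arbitrary illustrative value z = 0.3; the helper names are hypothetical.

```python
def conditional_bc_given_a(P_abc):
    """P_{bc|a}(x_b x_c | x_a), keyed as (b, c, a), from a joint table over (a, b, c)."""
    P_a = {}
    for (a, b, c), p in P_abc.items():
        P_a[a] = P_a.get(a, 0.0) + p
    return {(b, c, a): p / P_a[a] for (a, b, c), p in P_abc.items() if P_a[a] > 0}


def instrumental_lhs(P_abc):
    """Left-hand side of Eq. 24: P(0_b 0_c | 0_a) + P(0_b 1_c | 1_a)."""
    cond = conditional_bc_given_a(P_abc)
    return cond.get((0, 0, 0), 0.0) + cond.get((0, 1, 1), 0.0)


z = 0.3  # any 0 < z < 1 gives the same value
P23 = {(0, 0, 0): z, (1, 0, 1): 1 - z}
lhs = instrumental_lhs(P23)
```

Both conditionals equal 1 for every 0 < z < 1, so the left-hand side is 2 and exceeds the bound of 1.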

Independently of Equation 24, we now prove possibilistic incompatibility of Pabc(23) with 𝓖7 using the possible worlds framework.

Proof

Proof by contradiction; assume that a functional model 𝓕𝓥 = {fa, fb, fc} for 𝓖7 exists such that Equation 22 produces Pabc(23) (Equation 23). Analogously to the proof in Section 4.1, there are only two distinct valuations of the joint variables abc in the support, namely 0a0b0c and 1a0b1c. Therefore, define two worlds, one where obsabc(0μ0ν) = 0a0b0c and another where obsabc(1μ1ν) = 1a0b1c. Using these two worlds, a possible worlds diagram can be initialized as in Figure 8a, where wor(0μ0ν) is colored yellow and wor(1μ1ν) is colored orange. In order to complete the possible worlds diagram of Figure 8a, one first needs to specify how b behaves in two possible worlds: wor(0μ1ν), colored green, and wor(1μ0ν), colored violet.

$$\underline{\mathrm{obs}_b(1_\mu0_\nu)} = f_b(1_a0_\nu) = {?_b}, \qquad \underline{\mathrm{obs}_b(0_\mu1_\nu)} = f_b(0_a1_\nu) = {?_b}. \tag{25}$$
Figure 8 A possible worlds diagram for 𝓖7 (Figure 7). The worlds are colored: wor(0μ0ν) yellow, wor(1μ1ν) orange, wor(1μ0ν) violet, wor(0μ1ν) green.

By appealing to Pabc(23), it must be that obsb(1μ0ν) = obsb(0μ1ν) = 0b, as no other valuations for b are in the support of Pabc(23). Finally, the remaining ‘unknown’ observations for c in the violet world, obsc(1μ0ν) = fc(0b0ν), and in the green world, obsc(0μ1ν) = fc(0b1ν), are determined respectively by the behavior of c in the yellow wor(0μ0ν) and orange wor(1μ1ν) worlds, as depicted in Figure 8b. Explicitly,

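$$\underline{\mathrm{obs}_c(1_\mu 0_\nu)} = f_c(0_b 0_\nu) = \underline{\mathrm{obs}_c(0_\mu 0_\nu)} = 0_c, \qquad \underline{\mathrm{obs}_c(0_\mu 1_\nu)} = f_c(0_b 1_\nu) = \underline{\mathrm{obs}_c(1_\mu 1_\nu)} = 1_c. \tag{26}$$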

Therefore the observed events in the green and violet worlds are fixed to be,

$$\underline{\mathrm{obs}_{abc}(1_\mu 0_\nu)} = 1_a 0_b 0_c, \qquad \underline{\mathrm{obs}_{abc}(0_\mu 1_\nu)} = 0_a 0_b 1_c. \tag{27}$$

Unfortunately, neither of these events is in the support of Pabc(23), which is a contradiction; therefore Pabc(23) is possibilistically incompatible with 𝓖7.□

Notice that unlike the proof from Section 4.1, here we needed to appeal to the cross-world consistency constraints (Equation 26) demanded by the possible worlds framework.
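The argument above can also be cross-checked by brute force. Any model that produces both events can be restricted to the two latent valuations that produce them. It therefore suffices to search deterministic models with binary μ and ν whose diagonal worlds reproduce Equation 23. The sketch below assumes the parent sets used in the proof: a ← μ, b ← (a, ν), c ← (b, ν). It is an illustrative exhaustive check, not the possible worlds construction itself.

```python
from itertools import product

SUPPORT = {(0, 0, 0), (1, 0, 1)}          # support of P_abc^(23), events (a, b, c)
KEYS = list(product((0, 1), repeat=2))   # (parent value, nu) inputs of fb and fc

def world_events(fa, fb, fc):
    """Event (a, b, c) in each world wor(mu nu), with a <- mu, b <- (a, nu), c <- (b, nu)."""
    events = {}
    for mu, nu in product((0, 1), repeat=2):
        a = fa[mu]
        b = fb[(a, nu)]
        events[(mu, nu)] = (a, b, fc[(b, nu)])
    return events

survivors = []
for fa in product((0, 1), repeat=2):                    # fa[mu]
    for fb_vals, fc_vals in product(product((0, 1), repeat=4), repeat=2):
        fb, fc = dict(zip(KEYS, fb_vals)), dict(zip(KEYS, fc_vals))
        ev = world_events(fa, fb, fc)
        # The diagonal worlds must reproduce the two events of Equation 23 ...
        if ev[(0, 0)] != (0, 0, 0) or ev[(1, 1)] != (1, 0, 1):
            continue
        # ... and every world, including wor(0mu 1nu) and wor(1mu 0nu), must stay in the support.
        if all(e in SUPPORT for e in ev.values()):
            survivors.append((fa, fb, fc))

print(len(survivors))  # 0
```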

4.3 The Bell Structure

Consider the causal structure 𝓖9 depicted in Figure 9 known as the Bell structure [7]. From the perspective of causal inference, Bell’s theorem [7] states that any distribution compatible with 𝓖9 must satisfy an inequality constraint known as a Bell inequality. For example, the inequality due to Clauser, Horne, Shimony and Holt, referred to as the CHSH inequality, constrains correlations held between a and b as x, y vary [15][19],

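$$P_{xaby} \in \mathcal{M}(\mathcal{G}_9) \implies |S| \le 2, \qquad S = \langle ab \mid 0_x 0_y \rangle + \langle ab \mid 0_x 1_y \rangle + \langle ab \mid 1_x 0_y \rangle - \langle ab \mid 1_x 1_y \rangle. \tag{28}$$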
Figure 9 The Bell causal structure has variables a, b ‘measuring’ hidden variable ρ with ‘measurement settings’ x, y determined independently of ρ.

Correlations predicted by quantum theory can violate this inequality up to S = 2√2 [14]. This violation is not maximal; a violation of S = 4 is achievable using Popescu-Rohrlich box correlations [49]. The following distribution is an example of a Popescu-Rohrlich box correlation,

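$$P^{(29)}_{xaby} = \frac{1}{8}\big([0_x 0_a 0_b 0_y] + [0_x 1_a 1_b 0_y] + [0_x 0_a 0_b 1_y] + [0_x 1_a 1_b 1_y] + [1_x 0_a 0_b 0_y] + [1_x 1_a 1_b 0_y] + [1_x 0_a 1_b 1_y] + [1_x 1_a 0_b 1_y]\big). \tag{29}$$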

Unlike 𝓖7, there are conditional independence constraints placed on correlations compatible with 𝓖9, namely the no-signaling constraints Pa|xy = Pa|x and Pb|xy = Pb|y. Because Pxaby(29) satisfies the no-signaling constraints, the incompatibility of Pxaby(29) with 𝓖9 is traditionally proven using Equation 28. We now proceed to prove its incompatibility using the possible worlds framework.
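The claims that Pxaby(29) reaches S = 4 and satisfies no-signaling can be verified exactly with rational arithmetic. The sketch assumes the usual convention, not spelled out in Equation 28, that the correlator ⟨ab | xy⟩ maps outcome 0 to +1 and outcome 1 to −1. The helper names are ours.

```python
from fractions import Fraction
from itertools import product

# P_xaby^(29): each listed event (x, a, b, y) has probability 1/8.
PR_EVENTS = [(0, 0, 0, 0), (0, 1, 1, 0), (0, 0, 0, 1), (0, 1, 1, 1),
             (1, 0, 0, 0), (1, 1, 1, 0), (1, 0, 1, 1), (1, 1, 0, 1)]
P = {e: Fraction(1, 8) for e in PR_EVENTS}
BITS = (0, 1)

def p_ab_given_xy(a, b, x, y):
    p_xy = sum(P.get((x, a2, b2, y), 0) for a2, b2 in product(BITS, repeat=2))
    return P.get((x, a, b, y), 0) / p_xy

def correlator(x, y):
    """<ab | xy>, mapping outcome 0 to +1 and outcome 1 to -1."""
    return sum((-1) ** (a + b) * p_ab_given_xy(a, b, x, y)
               for a, b in product(BITS, repeat=2))

S = correlator(0, 0) + correlator(0, 1) + correlator(1, 0) - correlator(1, 1)
print(S)  # 4, versus |S| <= 2 in Equation 28

def p_a(a, x, y):
    return sum(p_ab_given_xy(a, b, x, y) for b in BITS)

def p_b(b, x, y):
    return sum(p_ab_given_xy(a, b, x, y) for a in BITS)

# No-signaling: P(a|xy) ignores y and P(b|xy) ignores x.
no_signaling = all(p_a(o, s, 0) == p_a(o, s, 1) and p_b(o, 0, s) == p_b(o, 1, s)
                   for o, s in product(BITS, repeat=2))
print(no_signaling)  # True
```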

Proof

Proof by contradiction; assume that a functional causal model 𝓕𝓥 = {fa, fb, fx, fy} for 𝓖9 exists which supports Pxaby(29) and use the possible worlds framework. Unlike the previous proofs, we only need to consider a subset of the events in Pxaby(29) to initialize a possible worlds diagram. Consider the following pair of events and associated latent valuations which support them[20],

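$$\underline{\mathrm{obs}_{xaby}(0_\mu 0_\rho 0_\nu)} = 0_x 0_a 0_b 0_y, \qquad \underline{\mathrm{obs}_{xaby}(1_\mu 1_\rho 1_\nu)} = 1_x 0_a 1_b 1_y. \tag{30}$$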

Using Equation 30, initialize the possible worlds diagram in Figure 10 with worlds wor(0μ0ρ0ν) colored green and wor(1μ1ρ1ν) colored violet. An unavoidable contradiction arises when attempting to populate the values for fa(0x1ρ) in the yellow world wor(0μ1ρ1ν) and fb(0y1ρ) in the magenta world wor(1μ1ρ0ν). First, the observed event obsxaby(0μ1ρ1ν) = 0x?a1b1y in the yellow world wor(0μ1ρ1ν) must belong to the list of possible events prescribed by Pxaby(29); a quick inspection shows that the only possibility is obsa(0μ1ρ1ν) = fa(0x1ρ) = 1a. An analogous argument in the magenta world wor(1μ1ρ0ν) proves that obsb(1μ1ρ0ν) = fb(0y1ρ) = 0b. Therefore, the observed event in the orange world wor(0μ1ρ0ν) must be,

$$\underline{\mathrm{obs}_{xaby}(0_\mu 1_\rho 0_\nu)} = 0_x 1_a 0_b 0_y, \tag{31}$$
Figure 10 An incomplete possible worlds diagram for the Bell structure 𝓖9 (Figure 9) initialized by the observed events obsxaby(0μ0ρ0ν) = 0x0a0b0y and obsxaby(1μ1ρ1ν) = 1x0a1b1y. The worlds are colored: wor(0μ0ρ0ν) green, wor(1μ1ρ1ν) violet, wor(1μ1ρ0ν) magenta, wor(0μ1ρ1ν) yellow, and wor(0μ1ρ0ν) orange.

and therefore Pxaby(0x1a0b0y) > 0, which contradicts Pxaby(29). Therefore, Pxaby(29) is possibilistically[21] incompatible with 𝓖9.□
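As in the Instrumental case, the contradiction can be confirmed by exhaustive search. Restricting any supporting model to the latent values that produce the two events of Equation 30 yields a model with binary μ, ρ, ν. The sketch below enumerates those models, assuming the parent sets used in the proof: x ← μ, y ← ν, a ← (x, ρ), b ← (y, ρ). It counts the models whose eight worlds all stay inside the support of Pxaby(29).

```python
from itertools import product

PR_SUPPORT = {(0, 0, 0, 0), (0, 1, 1, 0), (0, 0, 0, 1), (0, 1, 1, 1),
              (1, 0, 0, 0), (1, 1, 1, 0), (1, 0, 1, 1), (1, 1, 0, 1)}  # (x, a, b, y)
BITS = (0, 1)
PAIRS = list(product(BITS, repeat=2))   # (setting, rho) inputs of fa and fb

def world(fx, fy, fa, fb, mu, rho, nu):
    """Observed event (x, a, b, y) in wor(mu rho nu) of the Bell structure."""
    x, y = fx[mu], fy[nu]
    return (x, fa[(x, rho)], fb[(y, rho)], y)

count = 0
for fx, fy in product(PAIRS, repeat=2):                 # fx[mu], fy[nu]
    for fa_vals, fb_vals in product(product(BITS, repeat=4), repeat=2):
        fa, fb = dict(zip(PAIRS, fa_vals)), dict(zip(PAIRS, fb_vals))
        ev = {w: world(fx, fy, fa, fb, *w) for w in product(BITS, repeat=3)}
        # Diagonal worlds reproduce Equation 30; every world must be a PR-box event.
        if (ev[(0, 0, 0)] == (0, 0, 0, 0) and ev[(1, 1, 1)] == (1, 0, 1, 1)
                and all(e in PR_SUPPORT for e in ev.values())):
            count += 1
print(count)  # 0
```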

4.4 The Triangle Structure

Consider the causal structure 𝓖11 depicted in Figure 11, known as the Triangle structure. The Triangle has been studied extensively in recent decades [10, 12, 23, 24, 30, 37, 55, 58, 60]. The following family of distributions is possibilistically incompatible with 𝓖11[22],

$$P^{(32)}_{abc} = p_1[1_a 0_b 0_c] + p_2[0_a 1_b 0_c] + p_3[0_a 0_b 1_c], \qquad \sum_{i=1}^{3} p_i = 1,\quad p_i > 0. \tag{32}$$
Figure 11 The Triangle structure 𝓖11 involving three visible variables 𝓥 = {a, b, c} each sharing a pair of latent variables from 𝓛 = {μ, ν, ρ}.

Proof

Proof by contradiction: assume that a functional causal model 𝓕𝓥 = {fa, fb, fc} for 𝓖11 exists supporting Pabc(32) and use the possible worlds framework. For each distinct event in Pabc(32), consider a world in which it happens definitely. Explicitly define,

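$$\underline{\mathrm{obs}_{abc}(0_\mu 0_\rho 0_\nu)} = 1_a 0_b 0_c, \tag{33}$$
$$\underline{\mathrm{obs}_{abc}(1_\mu 1_\rho 1_\nu)} = 0_a 0_b 1_c, \tag{34}$$
$$\underline{\mathrm{obs}_{abc}(2_\mu 2_\rho 2_\nu)} = 0_a 1_b 0_c, \tag{35}$$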

corresponding to the exterior worlds in Figure 12. Consider the magenta world wor(0μ1ρ1ν) with partially specified observation obsabc(0μ1ρ1ν) = ?a?b1c. Recalling Pabc(32), whenever c takes the value 1c, both a and b take the value 0; i.e. 0a0b. Therefore, the observed event in the magenta world wor(0μ1ρ1ν) must be obsabc(0μ1ρ1ν) = 0a0b1c. An analogous argument holds for the other worlds,

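$$\begin{aligned}
\underline{\mathrm{obs}_{abc}(0_\mu 1_\rho 1_\nu)} = {?}_a {?}_b 1_c &\implies \underline{\mathrm{obs}_{abc}(0_\mu 1_\rho 1_\nu)} = 0_a 0_b 1_c,\\
\underline{\mathrm{obs}_{abc}(2_\mu 2_\rho 1_\nu)} = {?}_a 1_b {?}_c &\implies \underline{\mathrm{obs}_{abc}(2_\mu 2_\rho 1_\nu)} = 0_a 1_b 0_c,\\
\underline{\mathrm{obs}_{abc}(0_\mu 2_\rho 0_\nu)} = 1_a {?}_b {?}_c &\implies \underline{\mathrm{obs}_{abc}(0_\mu 2_\rho 0_\nu)} = 1_a 0_b 0_c.
\end{aligned} \tag{36}$$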
Figure 12 An incomplete possible worlds diagram for the Triangle structure 𝓖11 (Figure 11) initialized by the triplet of observed events in Equations 33–35. The worlds are colored: wor(0μ0ρ0ν) brown, wor(1μ1ρ1ν) yellow, wor(2μ2ρ2ν) orange, wor(0μ1ρ1ν) magenta, wor(2μ2ρ1ν) blue, wor(0μ2ρ0ν) violet, and wor(0μ2ρ1ν) green.

However, the conclusions drawn by Equation 36 predict the observed event in the central, green world wor(0μ2ρ1ν) must be,

$$\underline{\mathrm{obs}_{abc}(0_\mu 2_\rho 1_\nu)} = 0_a 0_b 0_c, \tag{37}$$

and therefore Pabc(0a0b0c) > 0 which contradicts Pabc(32). Therefore, Pabc(32) is possibilistically incompatible with 𝓖11.□
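The propagation of Equation 36 can be mechanized. The exterior worlds leave six function-table entries open that the four interior worlds of Figure 12 query, and the sketch below enumerates every value of them. It assumes the parent sets implied by the proof's world labels: a ← (μ, ν), b ← (μ, ρ), c ← (ρ, ν). Each latent value is encoded as an integer.

```python
from itertools import product

SUPPORT = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}  # support of P_abc^(32)

# Entries fixed by the exterior worlds of Equations 33-35:
# fa(mu, nu), fb(mu, rho), fc(rho, nu).
FA = {(0, 0): 1, (1, 1): 0, (2, 2): 0}
FB = {(0, 0): 0, (1, 1): 0, (2, 2): 1}
FC = {(0, 0): 0, (1, 1): 1, (2, 2): 0}
# Entries left open by the exterior worlds but queried by the interior ones.
FREE = [("a", (0, 1)), ("a", (2, 1)), ("b", (0, 1)), ("b", (0, 2)),
        ("c", (2, 0)), ("c", (2, 1))]
INTERIOR = [(0, 1, 1), (2, 2, 1), (0, 2, 0), (0, 2, 1)]  # (mu, rho, nu)

def observe(t, mu, rho, nu):
    return (t["a"][(mu, nu)], t["b"][(mu, rho)], t["c"][(rho, nu)])

consistent = []
for bits in product((0, 1), repeat=len(FREE)):
    t = {"a": dict(FA), "b": dict(FB), "c": dict(FC)}
    for (var, key), bit in zip(FREE, bits):
        t[var][key] = bit
    if all(observe(t, *w) in SUPPORT for w in INTERIOR):
        consistent.append(bits)
print(len(consistent))  # 0: every completion leaves some interior world outside the support
```

Exactly one completion survives the magenta, blue and violet worlds, and that completion forces the green world to observe 0a0b0c.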

4.5 An Evans Causal Structure

Consider the causal structure in Figure 13, denoted 𝓖13. This causal structure, along with two others, was first mentioned by Evans [22] as one for which no existing techniques were able to prove whether or not it was saturated; that is, whether or not all distributions were compatible with it. Here it is shown that there are indeed distributions which are possibilistically incompatible with 𝓖13 using the possible worlds framework. As such, this framework currently stands as the most powerful method for deciding possibilistic compatibility.

Figure 13 The Evans Causal Structure 𝓖13.

Consider the family of distributions with three possible events:

$$P^{(38)}_{abcd} = p_1[0_a 0_b 0_c y_d] + p_2[1_a 0_b 1_c 0_d] + p_3[0_a 1_b 1_c 1_d], \qquad \sum_{i=1}^{3} p_i = 1,\quad p_i > 0. \tag{38}$$

Regardless of the values of p1, p2, p3 (and of yd ∈ Ωd, which is arbitrary), Pabcd(38) is incompatible with 𝓖13.

Proof

Proof by contradiction. First assume that a deterministic model 𝓕𝓥 = {fa, fb, fc, fd} for Pabcd(38) exists and adopt the possible worlds framework. Let wor(iμiνiρ) for i ∈ {0, 1, 2} index the possible worlds which support the events observed in Pabcd(38),

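$$\underline{\mathrm{obs}_{abcd}(0_\mu 0_\nu 0_\rho)} = 0_a 0_b 0_c y_d, \qquad \underline{\mathrm{obs}_{abcd}(1_\mu 1_\nu 1_\rho)} = 1_a 0_b 1_c 0_d, \qquad \underline{\mathrm{obs}_{abcd}(2_\mu 2_\nu 2_\rho)} = 0_a 1_b 1_c 1_d. \tag{39}$$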

Only two additional possible worlds are necessary for achieving a contradiction. Consulting Figure 14 for details, these possible worlds are wor(1μ0ν2ρ) colored violet and wor(1μ2ν2ρ) colored green. Notice that the determined value for a must be the same in both worlds as it is independent of λν:

$$x_a = f_a(1_\mu 2_\rho) = \underline{\mathrm{obs}_a(1_\mu 0_\nu 2_\rho)} = \underline{\mathrm{obs}_a(1_\mu 2_\nu 2_\rho)}. \tag{40}$$
Figure 14 A possible worlds diagram for 𝓖13 initialized by the distribution in Equation 38. The worlds are colored: wor(0μ0ν0ρ) magenta, wor(1μ1ν1ρ) orange, wor(2μ2ν2ρ) yellow, wor(1μ0ν2ρ) violet, and wor(1μ2ν2ρ) green.

There are only two possible values for xa in any world, namely xa = 0a or xa = 1a, as given by Pabcd(38). First suppose that xa = 0a. Then in the violet world wor(1μ0ν2ρ), consistency with the magenta world wor(0μ0ν0ρ) completely constrains the value of b to be obsb(1μ0ν2ρ) = fb(0a0ν) = 0b. Therefore, obsab(1μ0ν2ρ) = 0a0b. By analogous logic, the orange world wor(1μ1ν1ρ) constrains the value of c in the violet world to be obsc(1μ0ν2ρ) = fc(0b1μ) = 1c. Therefore, obsabc(1μ0ν2ρ) = 0a0b1c, which is a contradiction because no event of Pabcd(38) has the valuation 0a0b1c. Therefore, it must be that xa = 1a.

An unavoidable contradiction follows from attempting to populate the green world wor(1μ2ν2ρ) in Figure 14 with the established knowledge that obsa(1μ2ν2ρ) = 1a. The value obsb(1μ2ν2ρ) = fb(1a2ν) has yet to be specified by any possible world, but choosing fb(1a2ν) = 1b would yield an impossible event obsab(1μ2ν2ρ) = 1a1b. Therefore, it must be that fb(1a2ν) = 0b and obsab(1μ2ν2ρ) = 1a0b. Similarly, the orange world wor(1μ1ν1ρ) fixes fc(0b1μ) = 1c and therefore obsabc(1μ2ν2ρ) = 1a0b1c. Finally, the yellow world wor(2μ2ν2ρ) already determines obsd(1μ2ν2ρ) = fd(1c2ν2ρ) = 1d, and therefore one concludes that,

$$\underline{\mathrm{obs}_{abcd}(1_\mu 2_\nu 2_\rho)} = 1_a 0_b 1_c 1_d, \tag{41}$$

which is an impossible event in Pabcd(38). This contradiction implies that no functional model 𝓕𝓥 = {fa, fb, fc, fd} exists and therefore Pabcd(38) is possibilistically incompatible with 𝓖13.□
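The case analysis above can be replayed mechanically. Some entries may be queried by the violet and green worlds but are left unspecified by the diagonal worlds, and the sketch enumerates every value of those entries. The parent sets a ← (μ, ρ), b ← (a, ν), c ← (b, μ), d ← (c, ν, ρ) are read off from the function arguments in the proof, since Figure 13 is not reproduced here. Binary d is an assumption, and both choices of yd are tried.

```python
from itertools import product

def surviving_completions(y_d):
    support = {(0, 0, 0, y_d), (1, 0, 1, 0), (0, 1, 1, 1)}  # support of P_abcd^(38)
    # Entries fixed by the three diagonal worlds of Equation 39.
    fixed = {
        "a": {(0, 0): 0, (1, 1): 1, (2, 2): 0},              # fa(mu, rho)
        "b": {(0, 0): 0, (1, 1): 0, (0, 2): 1},              # fb(a, nu)
        "c": {(0, 0): 0, (0, 1): 1, (1, 2): 1},              # fc(b, mu)
        "d": {(0, 0, 0): y_d, (1, 1, 1): 0, (1, 2, 2): 1},   # fd(c, nu, rho)
    }
    # Entries the violet and green worlds may query that the diagonal leaves open.
    free = [("a", (1, 2)), ("b", (1, 0)), ("b", (1, 2)), ("c", (1, 1)),
            ("d", (0, 0, 2)), ("d", (1, 0, 2)), ("d", (0, 2, 2))]

    def observe(t, mu, nu, rho):
        a = t["a"][(mu, rho)]
        b = t["b"][(a, nu)]
        c = t["c"][(b, mu)]
        return (a, b, c, t["d"][(c, nu, rho)])

    survivors = []
    for bits in product((0, 1), repeat=len(free)):
        t = {v: dict(entries) for v, entries in fixed.items()}
        for (v, key), bit in zip(free, bits):
            t[v][key] = bit
        # Sanity check: the diagonal worlds still reproduce Equation 39.
        assert [observe(t, i, i, i) for i in range(3)] == [
            (0, 0, 0, y_d), (1, 0, 1, 0), (0, 1, 1, 1)]
        # Violet world wor(1mu 0nu 2rho) and green world wor(1mu 2nu 2rho).
        if all(observe(t, *w) in support for w in [(1, 0, 2), (1, 2, 2)]):
            survivors.append(bits)
    return survivors

for y_d in (0, 1):
    print(y_d, len(surviving_completions(y_d)))  # no survivors for either value of y_d
```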

To reiterate, there are currently no other methods known [22] which are capable of proving the incompatibility of any distribution with 𝓖13[23]. Therefore, the possible worlds framework can be seen as the state-of-the-art technique for deciding possibilistic causal compatibility.

4.6 Necessity and Sufficiency

Throughout this section, we explored a number of proofs of possibilistic incompatibility using the possible worlds framework. Moreover, the above examples suggest a systematic algorithm for deciding possibilistic compatibility. Given a distribution P𝓥 with support σ(P𝓥) ⊂ Ω𝓥, and a causal structure 𝓖 = (𝓥 ∪ 𝓛, 𝓔), the following algorithm sketch determines if P𝓥 is possibilistically compatible with 𝓖.

  1. Let W = |σ(P𝓥)| < |Ω𝓥| denote the number of possible events provided by P𝓥.

  2. For each 1 ≤ i ≤ W, create a possible world wor(λ𝓛⁽ⁱ⁾), where λ𝓛⁽ⁱ⁾ = i𝓛 assigns the value i to every latent variable, thus defining the latent sample space Ω𝓛.

  3. Attempt to complete the possible worlds diagram 𝓓 initialized by the worlds {wor(λ𝓛⁽ⁱ⁾)}, i = 1, …, W.

  4. If an impossible event x𝓥 ∉ σ(P𝓥) is produced by any “off-diagonal” world wor(… i … j …) where i ≠ j, or if a cross-world consistency constraint is broken, back-track.

Upon completing the search, there are two possibilities. The first possibility is that the algorithm returns a completed, consistent, possible worlds diagram 𝓓. Then by Lemma 1, P𝓥 is possibilistically compatible with 𝓖. The second possibility is that an unavoidable contradiction arises, and P𝓥 is not possibilistically compatible with 𝓖.[24]
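The algorithm sketch translates directly into a backtracking search. Every function-table entry is shared by all the worlds that query it, so cross-world consistency holds by construction. The Python below is our own minimal rendering under stated assumptions: function tables are dicts, worlds are populated in lexicographic order, and every latent variable gets cardinality W. It is not an implementation taken from the paper. Applied to the Instrumental and Triangle examples, it reproduces the conclusions of Sections 4.2 and 4.4.

```python
from itertools import product

def possibilistic_model(order, domains, latents, support):
    """Backtracking sketch of the Section 4.6 procedure.

    order:   list of (variable, parents) in topological order; parents are
             visible variables or latent names.
    domains: dict mapping each visible variable to its list of values.
    latents: list of latent names, each given cardinality W = |support|.
    support: list of events, each a tuple ordered like `order`.
    Returns {(variable, parent values): value} for a consistent, completed
    diagram, or None when every completion hits an impossible event.
    """
    W, allowed, table = len(support), set(support), {}
    # Steps 1-2: the diagonal world wor(i ... i) must produce the i-th event.
    for i, event in enumerate(support):
        vals = dict.fromkeys(latents, i)
        for (v, parents), x in zip(order, event):
            key = (v, tuple(vals[p] for p in parents))
            if table.setdefault(key, x) != x:      # two diagonal worlds disagree
                return None
            vals[v] = x
    worlds = list(product(range(W), repeat=len(latents)))

    def visit(w, k, vals):
        # Steps 3-4: populate world w one variable at a time; shared table
        # entries enforce cross-world consistency, impossible events backtrack.
        if w == len(worlds):
            return True
        if k == 0:
            vals = dict(zip(latents, worlds[w]))
        if k == len(order):
            event = tuple(vals[v] for v, _ in order)
            return event in allowed and visit(w + 1, 0, None)
        v, parents = order[k]
        key = (v, tuple(vals[p] for p in parents))
        if key in table:
            return visit(w, k + 1, {**vals, v: table[key]})
        for x in domains[v]:
            table[key] = x
            if visit(w, k + 1, {**vals, v: x}):
                return True
        del table[key]
        return False

    return dict(table) if visit(0, 0, None) else None


INSTRUMENTAL = [("a", ["mu"]), ("b", ["a", "nu"]), ("c", ["b", "nu"])]
TRIANGLE = [("a", ["mu", "nu"]), ("b", ["mu", "rho"]), ("c", ["rho", "nu"])]
BINARY = {"a": [0, 1], "b": [0, 1], "c": [0, 1]}

# True: b copies a and c copies b, so {000, 111} is possibilistically compatible.
print(possibilistic_model(INSTRUMENTAL, BINARY, ["mu", "nu"], [(0, 0, 0), (1, 1, 1)]) is not None)
# None: the support of P_abc^(23) (Section 4.2).
print(possibilistic_model(INSTRUMENTAL, BINARY, ["mu", "nu"], [(0, 0, 0), (1, 0, 1)]))
# None: the support of P_abc^(32) on the Triangle (Section 4.4).
print(possibilistic_model(TRIANGLE, BINARY, ["mu", "rho", "nu"], [(1, 0, 0), (0, 0, 1), (0, 1, 0)]))
```

Chronological backtracking keeps the sketch short. A practical implementation would add the symmetry reductions of Section 5.1 and a better world ordering.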

5 A Complete Probabilistic Solution

In Section 4, we demonstrated that the possible worlds framework was capable of providing a complete possibilistic solution to the causal compatibility problem. If, however, a given distribution P𝓥 happens to satisfy a causal hypothesis on a possibilistic level, can the possible worlds framework be used to determine if P𝓥 satisfies the causal hypothesis on a probabilistic level as well? In this section, we answer this question affirmatively. In particular, we provide a hierarchy of feasibility tests for probabilistic compatibility which converges exactly. In addition, we illustrate that a possible worlds diagram is the natural data structure for algorithmically implementing this converging hierarchy.

5.1 Symmetry and Superfluity

The aforementioned hierarchy of tests, to be explained in Section 5.3, relies on the enumeration of all probability distributions P𝓥 which admit uniform functional causal models (𝓖, 𝓕𝓥, 𝓟𝓛) for fixed cardinalities k𝓥∪𝓛 = {kq = |Ωq| | q ∈ 𝓥 ∪ 𝓛}. A functional causal model is uniform if the probability distributions Pℓ ∈ 𝓟𝓛 over the latent variables are uniform distributions, i.e. Pℓ(λℓ) = 1/kℓ for every λℓ ∈ Ωℓ. Section 5.2 discusses why uniform functional causal models are worth considering, whereas in this section, we discuss how to efficiently enumerate all probability distributions P𝓥 that are uniformly generated from fixed cardinalities k𝓥∪𝓛.

One method for generating all such distributions is to perform a brute force enumeration of all deterministic strategies 𝓕𝓥 for fixed cardinalities k𝓥∪𝓛. Depending on the details of the causal structure, the number of deterministic functions of this form is poly-exponential in the cardinalities k𝓥∪𝓛. This method is inefficient because it fails to consider that many distinct deterministic strategies produce the exact same distribution P𝓥. Two optimizations avoid regenerating the same distribution P𝓥 while enumerating all deterministic strategies 𝓕𝓥. These optimizations are best motivated by an example using the possible worlds framework.

Consider the causal structure 𝓖15a in Figure 15a with visible variables 𝓥 = {a, b, c} and latent variables 𝓛 = {μ, ν}. Furthermore, for concreteness, suppose that kμ = kν = ka = kb = 2 and kc = 4. Finally let 𝓕𝓥 = {fa, fb, fc} be such that,

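$$\begin{aligned}
&f_a(0_\mu) = 0_a, && f_a(1_\mu) = 1_a, && f_b(0_\mu) = 0_b, && f_b(1_\mu) = 1_b,\\
&f_c(0_a 0_b 0_\nu) = 2_c, && f_c(0_a 0_b 1_\nu) = 0_c, && f_c(1_a 1_b 0_\nu) = 3_c, && f_c(1_a 1_b 1_\nu) = 1_c,\\
&f_c(0_a 1_b 0_\nu) = 0_c, && f_c(0_a 1_b 1_\nu) = 1_c, && f_c(1_a 0_b 0_\nu) = 2_c, && f_c(1_a 0_b 1_\nu) = 3_c.
\end{aligned} \tag{42}$$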
Figure 15 Every permutation πℓ : Ωℓ → Ωℓ of valuations on the latent variables maps a possible worlds diagram to another possible worlds diagram with the same observed events. The worlds are colored: wor(0μ0ν) green, wor(0μ1ν) orange, wor(1μ0ν) yellow, and wor(1μ1ν) violet.

The possible worlds diagram 𝓓 for 𝓖15a generated by Equation 42 is depicted in Figure 15b. If the latent valuations are distributed uniformly, the probability distribution associated with Figure 15b (as given by Equation 17) is equal to,

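$$P_{abc} = \tfrac{1}{4}\left([\underline{\mathrm{wor}(0_\mu 0_\nu)}] + [\underline{\mathrm{wor}(0_\mu 1_\nu)}] + [\underline{\mathrm{wor}(1_\mu 0_\nu)}] + [\underline{\mathrm{wor}(1_\mu 1_\nu)}]\right) = \tfrac{1}{4}\left([0_a 0_b 2_c] + [0_a 0_b 0_c] + [1_a 1_b 3_c] + [1_a 1_b 1_c]\right). \tag{43}$$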
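With uniform latents each of the four worlds carries weight 1/4, so Equation 43 is just a tally of the worlds' observed events. The short sketch below assumes that visible and latent values are encoded as plain integers (so 2c becomes 2). The helper names are our own.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Functional model of Equation 42, with values encoded as integers (2c -> 2, etc.).
fa = {0: 0, 1: 1}                                   # fa(mu)
fb = {0: 0, 1: 1}                                   # fb(mu)
fc = {(0, 0, 0): 2, (0, 0, 1): 0, (1, 1, 0): 3, (1, 1, 1): 1,
      (0, 1, 0): 0, (0, 1, 1): 1, (1, 0, 0): 2, (1, 0, 1): 3}   # fc(a, b, nu)

def world_events(k_mu=2, k_nu=2):
    """Observed event (a, b, c) of each world wor(mu nu)."""
    events = {}
    for mu, nu in product(range(k_mu), range(k_nu)):
        a, b = fa[mu], fb[mu]
        events[(mu, nu)] = (a, b, fc[(a, b, nu)])
    return events

def uniform_distribution(events):
    """With uniform latents every world has weight 1 / (number of worlds)."""
    counts = Counter(events.values())
    return {e: Fraction(n, len(events)) for e, n in counts.items()}

P = uniform_distribution(world_events())
for event, p in sorted(P.items()):
    print(event, p)   # each of 000, 002, 111, 113 (as a, b, c) with probability 1/4
```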

The first optimization comes from noticing that Equation 42 specifies how c would respond if provided with the valuation 1a0b1ν of its parents, namely fc(1a0b1ν) = 3c. Nonetheless, this hypothetical scenario is excluded from Figure 15b (crossed out in the figure) because the functional model in Equation 42 never produces an opportunity for a to be different from b. Consequently, the functional dependences in Equation 42 contain superfluous information irrelevant to the observed probability distribution in Equation 43.

Therefore, a brute force enumeration of deterministic strategies would regenerate Equation 43 several times, once for each assignment of c’s behavior in these superfluous scenarios. It is possible to avoid these regenerations by using an unpopulated possible worlds diagram 𝓓̃ as a data structure and performing a brute force enumeration of all consistent valuations of 𝓓̃.
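The superfluous entries can be listed explicitly: they are the inputs of fc that no world ever queries. The sketch below uses the same integer encoding of Equation 42 and illustrative names. It finds these entries and checks that all 4⁴ = 256 ways of filling them regenerate Equation 43. This is exactly the redundancy that a search over consistent valuations of 𝓓̃ avoids.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

fa = {0: 0, 1: 1}                                   # fa(mu)
fb = {0: 0, 1: 1}                                   # fb(mu)
fc = {(0, 0, 0): 2, (0, 0, 1): 0, (1, 1, 0): 3, (1, 1, 1): 1,
      (0, 1, 0): 0, (0, 1, 1): 1, (1, 0, 0): 2, (1, 0, 1): 3}   # fc(a, b, nu)

def distribution(fc):
    """Observed distribution of Equation 43 with mu, nu uniform on {0, 1}."""
    counts = Counter()
    for mu, nu in product((0, 1), repeat=2):
        a, b = fa[mu], fb[mu]
        counts[(a, b, fc[(a, b, nu)])] += 1
    return {e: Fraction(n, 4) for e, n in counts.items()}

# Inputs of fc that some world actually queries; every other entry is superfluous.
queried = {(fa[mu], fb[mu], nu) for mu, nu in product((0, 1), repeat=2)}
superfluous = sorted(set(fc) - queried)
print(superfluous)  # the four inputs with a != b

# Every re-assignment of the superfluous entries (k_c = 4 values each) regenerates Equation 43.
reference = distribution(fc)
variants = 0
for values in product(range(4), repeat=len(superfluous)):
    assert distribution({**fc, **dict(zip(superfluous, values))}) == reference
    variants += 1
print(variants)  # 4**4 = 256 strategies, one observed distribution
```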

The second optimization comes from noticing that Equation 43 contains many symmetries. Notably, independently permuting the latent valuations, πμ : 0μ ↔ 1μ or πν : 0ν ↔ 1ν, leaves the observed distribution in Equation 43 invariant, but maps the functional dependences 𝓕𝓥 of Equation 42 to different functional dependences 𝓕𝓥^πμ and 𝓕𝓥^πν. These symmetries are reflected as permutations of the worlds, as depicted in Figures 15c and 15d.

Analogously, it is possible to avoid these regenerations by first pre-computing the induced action on 𝓓̃, and thus an induced action on 𝓕𝓥, under the permutation group S𝓛 = ∏ℓ∈𝓛 perm(Ωℓ). Then, using the permutation group S𝓛, one only needs to generate a representative from the equivalence classes of possible worlds diagrams 𝓓 under S𝓛.
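For the example of Equation 42, S𝓛 = perm(Ωμ) × perm(Ων) has four elements. The sketch below uses our own encoding, and `act` is a hypothetical helper. It applies each permutation to the latent arguments of fa, fb, fc and checks that the multiset of observed events is unchanged. It then counts the distinct functional models in the orbit. Only one representative of this orbit needs to be generated.

```python
from collections import Counter
from itertools import permutations, product

fa = (0, 1)   # fa[mu]
fb = (0, 1)   # fb[mu]
fc = {(0, 0, 0): 2, (0, 0, 1): 0, (1, 1, 0): 3, (1, 1, 1): 1,
      (0, 1, 0): 0, (0, 1, 1): 1, (1, 0, 0): 2, (1, 0, 1): 3}   # fc(a, b, nu)

def act(pi_mu, pi_nu, model):
    """Induced action of (pi_mu, pi_nu) in S_L: relabel the latent arguments,
    so world (mu, nu) of the image behaves like world (pi_mu[mu], pi_nu[nu])."""
    fa, fb, fc = model
    return (tuple(fa[pi_mu[m]] for m in range(2)),
            tuple(fb[pi_mu[m]] for m in range(2)),
            {(a, b, n): fc[(a, b, pi_nu[n])] for (a, b, n) in fc})

def observed(model):
    fa, fb, fc = model
    return Counter((fa[m], fb[m], fc[(fa[m], fb[m], n)])
                   for m, n in product(range(2), repeat=2))

model = (fa, fb, fc)
orbit = []
for pi_mu, pi_nu in product(permutations(range(2)), repeat=2):
    image = act(pi_mu, pi_nu, model)
    assert observed(image) == observed(model)   # same observed events
    if image not in orbit:
        orbit.append(image)
print(len(orbit))  # 4 distinct functional models reproduce Equation 43
```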

Importantly, the optimizations illuminated above, namely ignoring superfluous specifications and exploiting symmetries, are universal[25]; they can be applied for any causal structure. Additionally, the possible worlds framework intuitively excludes superfluous cases and directly embodies the observational symmetries, making a possible worlds diagram the ideal data structure for performing a search over observed distributions.

5.2 The Uniformity of Latent Distributions

The purpose of this section is to motivate why it is always possible to approximate any functional causal model (𝓖, 𝓕𝓥, 𝓟𝓛) with another functional causal model (𝓖, 𝓕̃𝓥, 𝓟̃𝓛) whose latent events λ𝓛 ∈ Ω̃𝓛 are uniformly distributed. Unsurprisingly, an accurate approximation of this form will require an increase in the cardinality |Ω̃𝓛| > |Ω𝓛| of the latent variables.

Definition 9

(Rational Distributions). A discrete probability distribution P over Ω is rational if every probability assigned to events in Ω by P is rational,

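$$\forall \lambda \in \Omega:\quad P(\lambda) = \frac{n_\lambda}{d_\lambda}, \quad \text{where } n_\lambda, d_\lambda \in \mathbb{Z}. \tag{44}$$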

Definition 10

(Distance Metric for Distributions). Given two probability distributions P, P̃ over the same sample space Ω, the distance Δ(P, P̃) between P and P̃ is defined as,

$$\Delta(P, \tilde P) = \sum_{x \in \Omega} |P(x) - \tilde P(x)|. \tag{45}$$

Theorem 2

Let P : Ω → [0, 1] be any discrete probability distribution on Ω. Then there exists a rational approximation P̃ : Ω → [0, 1],

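$$\forall \lambda \in \Omega:\quad \tilde P(\lambda) = \frac{1}{|\Omega_u|} \sum_{\omega_u \in \Omega_u} \delta(\lambda, g(\omega_u)), \tag{46}$$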

where g : Ωu → Ω is deterministic and Δ(P, P̃) ≤ (|Ω| − 1)/|Ωu|.

Proof

The proof is illustrated in Figure 16. In the special case that |Ω| = 1, the proof is trivial; g simply maps all values of ωu to the singleton λ ∈ Ω. In general, the proof follows from a construction of g using inverse uniform sampling. Given some ordering 1 < 2 < ⋯ of Ω and ordering 1u < 2u < ⋯ of Ωu, compute the cumulative distribution function P̄(λ) = Σλ′≤λ P(λ′). Then the function g : Ωu → Ω is defined as,

$$g(\omega_u) = \min\{\lambda \in \Omega \mid \bar P(\lambda)\,|\Omega_u| \ge \omega_u\}. \tag{47}$$
Figure 16 Theorem 2: Approximately sampling a non-uniform distribution using inverse sampling techniques.

Consequently, the number of values ωu ∈ Ωu which map to λ ∈ Ω differs from |Ωu|P(λ) by an error ε(λ),

$$\varepsilon(\lambda) = |\Omega_u|\,P(\lambda) - |g^{-1}(\lambda)|, \tag{48}$$

where |ε(λ)| ≤ 1 for all λ ∈ Ω, with the exception of the minimum (1) and maximum (|Ω|) values, where |ε(λ)| ≤ 1/2. Therefore, the proof follows from a direct computation of the distance Δ(P, P̃),

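$$\begin{aligned}
\Delta(P, \tilde P) &= \sum_{\lambda \in \Omega} |P(\lambda) - \tilde P(\lambda)| && (49)\\
&= \sum_{\lambda \in \Omega} \left| P(\lambda) - \frac{|g^{-1}(\lambda)|}{|\Omega_u|} \right| && (50)\\
&= \frac{1}{|\Omega_u|} \sum_{\lambda \in \Omega} |\varepsilon(\lambda)| && (51)\\
&\le \frac{1}{|\Omega_u|}\left(|\Omega| - 2 + 2 \cdot \tfrac{1}{2}\right) && (52)\\
&= \frac{|\Omega| - 1}{|\Omega_u|}. && (53)
\end{aligned}$$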
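The sketch below implements the inverse-sampling construction in Theorem 2. One assumption needs flagging: labels are assigned by rounding |Ωu|·P̄(λ) to the nearest integer. This is a midpoint variant of Equation 47, chosen so that the endpoint errors |ε(λ)| ≤ 1/2 used in Equation 52 hold by construction. The script checks the bound Δ(P, P̃) ≤ (|Ω| − 1)/|Ωu| on randomly drawn rational distributions. Function names are ours.

```python
import random
from collections import Counter
from fractions import Fraction

def uniform_approximation(P, n_u):
    """Inverse uniform sampling: map labels 1..n_u onto the outcomes of P (in the
    dict's order) so that roughly P(lam) * n_u labels land on each outcome lam.
    Label boundaries are round(n_u * cdf(lam)), a midpoint-rounding choice."""
    g, cdf, prev = {}, Fraction(0), 0
    for lam, p in P.items():
        cdf += p
        upto = round(n_u * cdf)          # labels prev+1 .. upto map to lam
        for omega_u in range(prev + 1, upto + 1):
            g[omega_u] = lam
        prev = upto
    counts = Counter(g.values())
    P_tilde = {lam: Fraction(counts[lam], n_u) for lam in P}
    return g, P_tilde

def distance(P, Q):
    """Equation 45 for two distributions on the same outcomes."""
    return sum(abs(P[x] - Q[x]) for x in P)

# Check Delta(P, P_tilde) <= (|Omega| - 1) / |Omega_u| on random rational P.
random.seed(0)
for _ in range(500):
    size, n_u = random.randint(1, 6), random.randint(1, 40)
    weights = [random.randint(1, 50) for _ in range(size)]
    P = {i: Fraction(w, sum(weights)) for i, w in enumerate(weights)}
    g, P_tilde = uniform_approximation(P, n_u)
    assert len(g) == n_u                          # every label is assigned
    assert distance(P, P_tilde) <= Fraction(size - 1, n_u)
```

For instance, the uniform distribution on three outcomes with |Ωu| = 4 is approximated by (1/4, 1/2, 1/4), at distance 1/3, within the bound 2/4.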

In terms of the causal compatibility problem, Theorem 2 suggests that if an observed distribution P𝓥 is compatible with 𝓖, and there exists a functional causal model (𝓖, 𝓕𝓥, 𝓟𝓛) which reproduces P𝓥 (via Equation 11), then it must be close to a rational distribution P̃𝓥 generated by a functional causal model (𝓖, 𝓕̃𝓥, 𝓟̃𝓛) wherein probability distributions for the latent variables 𝓟̃𝓛 are uniform. The following theorem proves this.

Theorem 3

Let (𝓖, 𝓕𝓥, 𝓟𝓛) be a functional causal model with cardinalities cℓ = |Ωℓ| for the latent variables, producing distribution P𝓥. Then there exists a functional causal model (𝓖, 𝓕̃𝓥, 𝓟̃𝓛) with cardinalities kℓ = |Ω̃ℓ| for the latent variables, producing P̃𝓥, where the distributions 𝓟̃𝓛 = {Uℓ : Ω̃ℓ → 1/kℓ | ℓ ∈ 𝓛} over the latent variables are uniform. In particular, the distance between P𝓥 and P̃𝓥 is bounded by,

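$$\Delta(P_{\mathcal V}, \tilde P_{\mathcal V}) \le \varepsilon = \sum_{n=1}^{L} \frac{1}{n!} \left(\frac{L(C-1)}{K}\right)^{n} \in \mathcal{O}\!\left(\frac{LC}{K}\right), \tag{54}$$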

where C = max{cℓ | ℓ ∈ 𝓛}, K = min{kℓ | ℓ ∈ 𝓛}, and L = |𝓛| is the number of latent variables.
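Equation 54 is easy to evaluate exactly. The sketch below checks that L = 1 recovers the single-variable bound of Theorem 2. It also prints ε·K/(L(C − 1)), which tends to 1 as K grows, illustrating the 𝓞(LC/K) rate. The function name is ours.

```python
from fractions import Fraction
from math import factorial

def epsilon(L, C, K):
    """Equation 54: sum_{n=1}^{L} (1/n!) (L (C - 1) / K)^n."""
    x = Fraction(L * (C - 1), K)
    return sum(x ** n / factorial(n) for n in range(1, L + 1))

# With a single latent variable (L = 1) this is Theorem 2's (|Omega| - 1)/|Omega_u|.
assert epsilon(1, 5, 100) == Fraction(4, 100)

# epsilon * K / (L (C - 1)) approaches 1 as K grows: epsilon is O(LC/K).
for K in (10, 100, 1000, 10000):
    print(K, float(epsilon(3, 4, K) * K / 9))   # L = 3, C = 4, so L (C - 1) = 9
```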

Proof

The proof relies on Theorem 2 and can be found in Appendix C.□

5.3 A Converging Hierarchy of Compatibility Tests

In Section 5.1, we discussed how to take advantage of the symmetries of a possible worlds diagram and the superfluities within a set of functional parameters 𝓕𝓥 in order to optimally search over functional models. In Section 5.2, we discussed how to approximate any functional causal model (𝓖, 𝓕𝓥, 𝓟𝓛) using one with uniform latent probability distributions. Here we combine these insights into a hierarchy of probabilistic compatibility tests for the causal compatibility problem.

Definition 11

Given a causal structure 𝓖, and given cardinalities[26] k𝓛 = {kℓ = |Ωℓ| | ℓ ∈ 𝓛} for the latent variables, define the uniformly induced distributions, denoted 𝓤𝓥^(k𝓛)(𝓖), as the set of all distributions P̃𝓥 ∈ 𝓜𝓥(𝓖) which admit of a uniform functional model (𝓖, 𝓕𝓥, 𝓟𝓛) with cardinalities k𝓛.

Recall that Section 5.1 demonstrates a method, using the possible worlds framework, for efficient generation of the entirety of 𝓤𝓥^(k𝓛)(𝓖).

Lemma 4

The uniformly induced distributions 𝓤𝓥^(k𝓛)(𝓖) form an ε-dense set in 𝓜𝓥(𝓖),

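$$\forall P_{\mathcal V} \in \mathcal{M}_{\mathcal V}(\mathcal G)\ \ \exists \tilde P_{\mathcal V} \in \mathcal{U}^{(k_{\mathcal L})}_{\mathcal V}(\mathcal G):\quad \Delta(P_{\mathcal V}, \tilde P_{\mathcal V}) \le \varepsilon \in \mathcal{O}\!\left(\frac{LC}{K}\right) \tag{55}$$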

where ε is a function of K = min{kℓ | ℓ ∈ 𝓛}, the number of latent variables L = |𝓛|, and C = max{cℓ | ℓ ∈ 𝓛}, where cℓ is the minimum upper bound placed on the cardinality of the latent variable ℓ by Theorem 9.

Proof

Since c𝓛 = {cℓ | ℓ ∈ 𝓛} are minimum upper bounds placed on the cardinalities of the latent variables by Theorem 9, any P𝓥 ∈ 𝓜𝓥(𝓖) must admit a functional causal model with cardinalities for the latent variables at most c𝓛. Then by Theorem 3, there exists a uniform causal model producing P̃𝓥 ∈ 𝓤𝓥^(k𝓛)(𝓖) within a distance ε given by Equation 54.□

Lemma 4 forms the basis of the following compatibility test,

Theorem 5

(The Causal Compatibility Test of Order K). For a probability distribution P𝓥 and a causal structure 𝓖, the causal compatibility test of order K = min{kℓ | ℓ ∈ 𝓛} is defined as the following question:

Does there exist a uniformly induced distribution P̃𝓥 ∈ 𝓤𝓥^(k𝓛)(𝓖) such that Δ(P𝓥, P̃𝓥) ≤ ε(K)?[27]

As K → ∞, the distance tends to zero, ε(K) → 0, and the sensitivity of the test increases. If P𝓥 ∉ 𝓜𝓥(𝓖), then P𝓥 will fail the test for some finite K. If P𝓥 ∈ 𝓜𝓥(𝓖), then P𝓥 will pass the test for all K. Moreover, for fixed K, the test can readily return the functional causal model behind the best approximation P̃𝓥.
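The sketch below is a toy version of the order-K test. It enumerates 𝓤𝓥^(k𝓛)(𝓖7) for the Instrumental structure with both latents of cardinality k, using plain brute force rather than the symmetry-reduced search of Section 5.1, and returns the closest uniformly induced distribution. The caller supplies the tolerance eps, because evaluating ε(K) requires the cardinality bounds of Theorem 9, which are not reproduced here. Binary visible variables and the parent sets a ← μ, b ← (a, ν), c ← (b, ν) are assumptions carried over from Section 4.2.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def uniformly_induced(k):
    """Every distribution of (a, b, c) generated by the Instrumental structure
    (a <- mu, b <- (a, nu), c <- (b, nu)) with mu, nu uniform on k values."""
    keys = list(product((0, 1), range(k)))               # (parent value, nu)
    found = set()
    for fa in product((0, 1), repeat=k):                  # fa[mu]
        for fb_vals, fc_vals in product(product((0, 1), repeat=2 * k), repeat=2):
            fb, fc = dict(zip(keys, fb_vals)), dict(zip(keys, fc_vals))
            counts = Counter()
            for mu, nu in product(range(k), repeat=2):
                a = fa[mu]
                b = fb[(a, nu)]
                counts[(a, b, fc[(b, nu)])] += 1
            found.add(frozenset((e, Fraction(n, k * k)) for e, n in counts.items()))
    return [dict(d) for d in found]

def distance(P, Q):
    """Equation 45, treating missing events as probability 0."""
    return sum(abs(P.get(x, 0) - Q.get(x, 0)) for x in set(P) | set(Q))

def order_k_test(P, candidates, eps):
    """Is some uniformly induced distribution within eps of P? Also returns it."""
    best = min(candidates, key=lambda Q: distance(P, Q))
    return distance(P, best) <= eps, best

U2 = uniformly_induced(2)
half = Fraction(1, 2)
compatible = {(0, 0, 0): half, (1, 1, 1): half}   # b copies a, c copies b
P23 = {(0, 0, 0): half, (1, 0, 1): half}          # possibilistically incompatible
print(order_k_test(compatible, U2, eps=0))
print(min(distance(P23, Q) for Q in U2))           # strictly positive
```

A positive distance for Pabc(23) at k = 2 is expected, since that distribution is not even possibilistically compatible with 𝓖7.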

First notice that Theorem 5 achieves the same rate of convergence as [37]. Unlike the result of [37], Theorem 5 returns a functional model which approximates P𝓥. It is interesting to remark that the distance bound ε ∈ 𝓞(LC/K) in Equation 55 depends on C = max{cℓ | ℓ ∈ 𝓛}, where cℓ is the minimum upper bound placed on the cardinality of the latent variable ℓ by Theorem 9. As conjectured in Appendix B, it is likely that there are tighter bounds that can be placed on these cardinalities for certain causal structures. Therefore, further research into lowering these bounds will improve the performance of Theorem 5.

6 Conclusion

In conclusion, this paper examined the abstract problem of causal compatibility for causal structures with latent variables. Section 3 introduced the framework of possible worlds in an effort to provide solutions to the causal compatibility problem. Central to this framework is the notion of a possible worlds diagram, which can be viewed as a hybrid between a causal structure and the functional parameters of a causal model. It does not, however, convey any information about the probability distributions over the latent variables.

In Section 4, we utilized the possible worlds framework to prove possibilistic incompatibility of a number of examples. In addition, we demonstrated the utility of our approach by resolving an open problem associated with one of Evans’ [22] causal structures. In particular, we have shown the causal structure in Figure 13 is incompatible with the distribution in Equation 38. Section 4 concluded with an algorithm for completely solving the possibilistic causal compatibility problem.

In Section 5, we discussed how to efficiently search through the observational equivalence classes of functional parameters using a possible worlds diagram as a data structure. Afterwards, we derived bounds on the distance between compatible distributions and uniformly induced ones. By combining these results, we provide a hierarchy of necessary tests for probabilistic causal compatibility which converge in the limit.[27]

Acknowledgement

Foremost, I must thank my supervisor Robert W. Spekkens for his unwavering support and encouragement. Second, I would like to sincerely thank Elie Wolfe for our numerous and lengthy discussions. Without him or his research, this paper simply would not exist. Finally, I thank the two anonymous referees for providing insight necessary for significantly improving this paper.

References

[1] Samson Abramsky and Adam Brandenburger. The Sheaf-Theoretic Structure Of Non-Locality and Contextuality. New J. Phys, 13(11):113036, nov 2011. doi:10.1088/1367-2630/13/11/113036

[2] Samson Abramsky and Lucien Hardy. Logical Bell inequalities. Phys. Rev. A, 85:062114, Jun 2012. doi:10.1103/PhysRevA.85.062114

[3] John-Mark A. Allen, Jonathan Barrett, Dominic C. Horsman, Ciaran M. Lee, and Robert W. Spekkens. Quantum common causes and quantum causal models. arXiv:1609.09487, 2016.

[4] Jean-Daniel Bancal, Nicolas Gisin, and Stefano Pironio. Looking for symmetric Bell inequalities. J. Phys. A, 43(38):385303, aug 2010. doi:10.1088/1751-8113/43/38/385303

[5] Imre Bárány and Roman Karasev. Notes about the carathéodory number. Discrete & Computational Geometry, 48(3):783–792, 2012. doi:10.1007/s00454-012-9439-z

[6] Jonathan Barrett. Information processing in generalized probabilistic theories. Physical Review A, 75(3):032304, 2007. doi:10.1103/PhysRevA.75.032304

[7] John S Bell. On the Einstein-Podolsky-Rosen paradox. Physics, 1(3):195–200, 1964. doi:10.1103/PhysicsPhysiqueFizika.1.195

[8] B. Bonet. Instrumentality Tests Revisited. ArXiv e-prints, January 2013.

[9] Bradley, Hax, and Magnanti. Applied Mathematical Programming. Addison-Wesley, 1977.

[10] Cyril Branciard, Denis Rosset, Nicolas Gisin, and Stefano Pironio. Bilocal versus nonbilocal correlations in entanglement-swapping experiments. Phys. Rev. A, 85:032119, Mar 2012. doi:10.1103/PhysRevA.85.032119

[11] Rafael Chaves. Polynomial bell inequalities. Physical review letters, 116(1):010402, 2016. doi:10.1103/PhysRevLett.116.010402

[12] Rafael Chaves, Lukas Luft, and David Gross. Causal structures from entropic information: geometry and novel scenarios. New J. Phys., 16(4):043001, 2014. doi:10.1088/1367-2630/16/4/043001

[13] Rafael Chaves, Christian Majenz, and David Gross. Information–theoretic implications of quantum causal structures. Nature communications, 6:5766, 2015. doi:10.1038/ncomms6766

[14] B. S. Cirel’son. Quantum generalizations of Bell’s inequality. Lett. Math Phys., 4(2):93–100, mar 1980. doi:10.1007/BF00417500

[15] John F. Clauser, Michael A. Horne, Abner Shimony, and Richard A. Holt. Proposed experiment to test local hidden-variable theories. Phys. Rev. Lett., 23:880–884, Oct 1969. doi:10.1103/PhysRevLett.23.880

[16] Diego Colombo, Marloes H Maathuis, Markus Kalisch, and Thomas S Richardson. Learning high-dimensional directed acyclic graphs with latent and selection variables. The Annals of Statistics, pages 294–321, 2012. doi:10.1214/11-AOS940

[17] Fabio Costa and Sally Shrapnel. Quantum causal modelling. New Journal of Physics, 18(6):063032, 2016. doi:10.1088/1367-2630/18/6/063032

[18] George B Dantzig and B Curtis Eaves. Fourier-Motzkin elimination and its dual. J. Combin. Theor. A, 14(3):288–297, may 1973. doi:10.1016/0097-3165(73)90004-6

[19] A. Einstein, B. Podolsky, and N. Rosen. Can Quantum-Mechanical Description of Physical Reality Be Considered Complete? Phys. Rev., 47:777–780, May 1935. doi:10.1103/PhysRev.47.777

[20] R. J. Evans. Graphical methods for inequality constraints in marginalized DAGs. ArXiv e-prints, September 2012. doi:10.1109/MLSP.2012.6349796

[21] Robin J. Evans. Margins of discrete Bayesian networks. arXiv:1501.02103, 2015. doi:10.1111/sjos.12194

[22] Robin J Evans. Graphs for margins of bayesian networks. Scandinavian Journal of Statistics, 43(3):625–648, 2016. doi:10.1111/sjos.12194

[23] T. Fraser and E. Wolfe. Causal Compatibility Inequalities Admitting of Quantum Violations in the Triangle Structure. ArXiv e-prints, September 2017. doi:10.1103/PhysRevA.98.022113

[24] Tobias Fritz. Beyond Bell’s Theorem: Correlation Scenarios. New J. Phys, 14(10):103001, oct 2012. doi:10.1088/1367-2630/14/10/103001

[25] Tobias Fritz. Beyond Bell’s Theorem II: Scenarios with arbitrary causal structure. Comm. Math. Phys., 341(2):391–434, nov 2014. doi:10.1007/s00220-015-2495-5

[26] Tobias Fritz and Rafael Chaves. Entropic Inequalities and Marginal Problems. IEEE Trans. Info. Theor., 59(2):803–817, feb 2011. doi:10.1109/TIT.2012.2222863

[27] Luis David Garcia, Michael Stillman, and Bernd Sturmfels. Algebraic geometry of bayesian networks, 2003.

[28] D. Geiger and C. Meek. Graphical Models and Exponential Families. ArXiv e-prints, January 2013.

[29] O. Goudet, D. Kalainathan, P. Caillou, I. Guyon, D. Lopez-Paz, and M. Sebag. Causal Generative Neural Networks. ArXiv e-prints, November 2017.

[30] J. Henson, R. Lal, and M. F. Pusey. Theory-independent limits on correlations from generalized Bayesian networks. New Journal of Physics, 16(11):113043, November 2014. doi:10.1088/1367-2630/16/11/113043

[31] Mats Jirstrand. Cylindrical algebraic decomposition-an introduction. Linköping University, 1995.

[32] Colin Jones, Eric C Kerrigan, and Jan Maciejowski. Equality set projection: A new algorithm for the projection of polytopes in halfspace representation. Technical report, Cambridge University Engineering Dept, 2004.

[33] Dimitris J. Kavvadias and Elias C. Stavropoulos. An Efficient Algorithm for the Transversal Hypergraph Generation. J. Graph Algor. Applic., 9(2):239–264, 2005. doi:10.7155/jgaa.00107

[34] Jan TA Koster et al. Marginalizing and conditioning in graphical models. Bernoulli, 8(6):817–840, 2002.

[35] C. M. Lee and R. W. Spekkens. Causal inference via algebraic geometry: feasibility tests for functional causal structures with two binary observed variables. ArXiv e-prints, June 2015. doi:10.1515/jci-2016-0013

[36] M. S. Leifer and R. W. Spekkens. Towards a Formulation of Quantum Theory as a Causally Neutral Theory of Bayesian Inference. ArXiv e-prints, July 2011. doi:10.1103/PhysRevA.88.052130

[37] M. Navascues and E. Wolfe. The inflation technique solves completely the classical inference problem. ArXiv e-prints, July 2017.

[38] Ognyan Oreshkov, Fabio Costa, and Caslav Brukner. Quantum correlations with no causal order. Nat. Comm., 3:1092, oct 2011. doi:10.1038/ncomms2076

[39] J. Pearl. A Constraint Propagation Approach to Probabilistic Reasoning. ArXiv e-prints, March 2013.

[40] J. Pearl. On the Testability of Causal Models with Latent and Instrumental Variables. ArXiv e-prints, February 2013.

[41] Judea Pearl. On the Testability of Causal Models with Latent and Instrumental Variables. pages 435–443, Aug 1995.

[42] Judea Pearl. Causal inference in statistics: An overview. Stat. Surv., 3(0):96–146, 2009.

[43] Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2009. doi:10.1017/CBO9780511803161

[44] Jacques Pienaar and Časlav Brukner. A graph-separation theorem for quantum causal models. New Journal of Physics, 17(7):073020, 2015. doi:10.1088/1367-2630/17/7/073020

[45] T. S. Richardson, R. J. Evans, J. M. Robins, and I. Shpitser. Nested Markov Properties for Acyclic Directed Mixed Graphs. ArXiv e-prints, January 2017.

[46] Thomas Richardson, Peter Spirtes, et al. Ancestral graph markov models. The Annals of Statistics, 30(4):962–1030, 2002. doi:10.1214/aos/1031689015

[47] K. Ried, M. Agnew, L. Vermeyden, D. Janzing, R. W. Spekkens, and K. J. Resch. A quantum advantage for inferring causal structure. Nature Physics, 11:414–420, May 2015. doi:10.1038/nphys3266

[48] James M Robins, Miguel Angel Hernan, and Babette Brumback. Marginal structural models and causal inference in epidemiology, 2000. doi:10.1097/00001648-200009000-00011

[49] D. Rohrlich and S. Popescu. Nonlocality as an axiom for quantum theory. quant-ph/9508009, 1995.

[50] D. Rosset, N. Gisin, and E. Wolfe. Universal bound on the cardinality of local hidden variables in networks. ArXiv e-prints, September 2017. doi:10.26421/QIC18.11-12-2

[51] Alexander Schrijver. Theory of Linear and Integer Programming. Wiley, 1998.

[52] Ilya Shpitser and Judea Pearl. Complete identification methods for the causal hierarchy. Journal of Machine Learning Research, 9(Sep):1941–1979, 2008.

[53] Peter Spirtes, Clark N Glymour, and Richard Scheines. Causation, prediction, and search. MIT press, 2000. doi:10.7551/mitpress/1754.001.0001

[54] Peter L. Spirtes. Directed cyclic graphical representations of feedback models, 2013.Search in Google Scholar

[55] Bastian Steudel and Nihat Ay. Information-theoretic inference of common ancestors. Entropy, 17(4):2304–2327, 2015.10.3390/e17042304Search in Google Scholar

[56] Martin J. Wainwright and Michael I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends® in Machine Learning, 1(1–2):1–305, 2007.10.1561/2200000001Search in Google Scholar

[57] Robert M Wald. General relativity. University of Chicago press, 2010.Search in Google Scholar

[58] Mirjam Weilenmann and Roger Colbeck. Non-Shannon inequalities in the entropy vector approach to causal structures. arXiv:1605.02078, 2016.Search in Google Scholar

[59] Nanny Wermuth et al. Probability distributions with summary graph structure. Bernoulli, 17(3):845–879, 2011.10.3150/10-BEJ309Search in Google Scholar

[60] Elie Wolfe, Robert W Spekkens, and Tobias Fritz. The inflation technique for causal inference with latent variables. Journal of Causal Inference, 7(2), 2019.10.1515/jci-2017-0020Search in Google Scholar

[61] Christopher J. Wood and Robert W. Spekkens. The lesson of causal discovery algorithms for quantum correlations: Causal explanations of Bell-inequality violations require fine-tuning. New J. Phys, 17(3):033002, mar 2012.10.1088/1367-2630/17/3/033002Search in Google Scholar

[62] Jing Yu, V. Anne Smith, Paul P. Wang, Alexander J. Hartemink, and Erich D. Jarvis. Advances to bayesian network inference for generating causal networks from observational biological data. Bioinformatics, 20(18):3594–3603, 2004.10.1093/bioinformatics/bth448Search in Google Scholar PubMed

[63] Jiji Zhang. Causal reasoning with ancestral graphs. Journal of Machine Learning Research, 9(Jul):1437–1474, 2008.Search in Google Scholar

A Simplifying Causal Structures

A.1 Observational Equivalence

From an experimental perspective, a causal model (𝓖, 𝓟) has the ability to predict the effects of interventions; by manually tinkering with the configuration of a system, one can learn more about the underlying mechanisms than from observations alone [43]. When interventions are impossible, for example because experimentation is expensive or unethical, distinct causal structures may admit the same set of compatible correlations. An important topic in the study of causal inference is therefore the identification of observationally equivalent causal structures. Two causal structures 𝓖 and 𝓖′ are observationally equivalent, or simply equivalent, if they share the same set of compatible models, 𝓜𝓥(𝓖) = 𝓜𝓥(𝓖′). For example, the direct-cause causal structure in Figure 17a is observationally equivalent to the common-cause causal structure in Figure 17b. Identifying observationally equivalent causal structures is of fundamental importance to the causal compatibility problem: if a distribution P𝓥 is known to satisfy the hypotheses of 𝓖 and 𝓜𝓥(𝓖) = 𝓜𝓥(𝓖′), then it also satisfies the hypotheses of 𝓖′.

Figure 17 The causal structures of (a) and (b) are observationally equivalent.

A.2 Exo-Simplicial Causal Structures

In general, other than being a directed acyclic graph, a causal structure with latent variables is subject to no restrictions. Nonetheless, [22] demonstrated a number of transformations on causal structures that leave 𝓜𝓥(𝓖) invariant. Two of these transformations are of interest in this section: the first concerns latent vertices that have parents, while the second concerns parent-less latent vertices that share children. Each is treated in turn.

Definition 12

(See Defn. 3.6 [22]). Given a causal structure 𝓖 = (𝓥 ∪ 𝓛, 𝓔) with latent vertex ℓ ∈ 𝓛, the exogenized causal structure exo𝓖(ℓ) is formed from 𝓔 by (i) adding an edge p → c for every p ∈ pa𝓖(ℓ) and c ∈ ch𝓖(ℓ), if not already present, and (ii) deleting all edges of the form p → ℓ where p ∈ pa𝓖(ℓ). If pa𝓖(ℓ) is empty, exo𝓖(ℓ) = 𝓖.

Lemma 6

(See Lem. 3.7 [22]). Given a causal structure 𝓖 = (𝓥 ∪ 𝓛, 𝓔) with latent vertex ℓ ∈ 𝓛, 𝓜𝓥(exo𝓖(ℓ)) = 𝓜𝓥(𝓖).

Proof

See proof of Lem. 3.7 from [22].□

The concept of exogenization is best understood with an example.

Example 1

Consider the causal structure 𝓖18a in Figure 18a. In 𝓖18a, the latent variable ℓ has parents pa(ℓ) = {v1, v2, v3} and children ch(ℓ) = {v4, v5}. Since the sample space Ωℓ is unknown, its cardinality could be arbitrarily large or even infinite. As a result, ℓ has an unbounded capacity to inform its children of the valuations of its parents; e.g. v4 can have complete knowledge of v1 through ℓ, and therefore adding the edge v1 → v4 has no observational impact. Applying similar reasoning to all parents of ℓ, i.e. applying Lemma 6, one converts 𝓖18a into the observationally equivalent, exogenized causal structure exo𝓖18a(ℓ) depicted in Figure 19.
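Because Definition 12 is a purely graph-theoretic operation, it can be sketched directly on an edge list. The Python sketch below is illustrative only: the function name `exogenize`, the string vertex labels, and the edge list are assumptions. The edge list contains only the edges incident to ℓ that Example 1 names, not any other edges Figure 18a may have.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # (tail, head), i.e. tail -> head


def exogenize(edges: Set[Edge], latent: str) -> Set[Edge]:
    """Sketch of Definition 12: return the edge set of exo_G(latent)."""
    parents = {p for (p, c) in edges if c == latent}
    children = {c for (p, c) in edges if p == latent}
    if not parents:
        return set(edges)  # pa_G(latent) is empty, so exo_G(latent) = G
    # (i) add p -> c for every parent p and child c of the latent vertex
    result = set(edges) | {(p, c) for p in parents for c in children}
    # (ii) delete every edge p -> latent
    return {(p, c) for (p, c) in result if c != latent}


# Edges of G_18a incident to the latent vertex "l", as described in Example 1
g18a = {("v1", "l"), ("v2", "l"), ("v3", "l"), ("l", "v4"), ("l", "v5")}
exo_g18a = exogenize(g18a, "l")
print(sorted(exo_g18a))
```

On this edge list the result keeps l → v4 and l → v5, adds the six edges vi → vj for i ∈ {1, 2, 3} and j ∈ {4, 5}, and leaves no edge pointing into l.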

Figure 18 Examples of causal structures which are not exo-simplicial.

Figure 19 The exogenized causal structure exo𝓖18a(ℓ).

Lemma 6 can be applied recursively to each latent variable ℓ ∈ 𝓛 to transform any causal structure 𝓖 into an observationally equivalent one in which the latent variables have no parents (i.e. are exogenous). Notice that the process of exogenization also works when latent vertices have latent parents, as is the case in Figure 18b. Also, when a latent vertex ℓ has no children, exogenization disconnects ℓ from the rest of the causal structure, where it can be ignored with no observational impact due to Equation 7.
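The following sketch applies Lemma 6 to every latent vertex in a single pass. It re-declares a minimal `exogenize` so that it runs on its own. The names and the small example graph, which has one latent vertex with a latent parent and one childless latent vertex, are hypothetical and do not reproduce the structures of Figure 18.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # (tail, head)


def exogenize(edges: Set[Edge], latent: str) -> Set[Edge]:
    """Definition 12 applied to a single latent vertex."""
    parents = {p for (p, c) in edges if c == latent}
    children = {c for (p, c) in edges if p == latent}
    added = {(p, c) for p in parents for c in children}
    return {(p, c) for (p, c) in set(edges) | added if c != latent}


def exogenize_all(edges: Set[Edge], latents: Iterable[str]) -> Set[Edge]:
    """Apply Lemma 6 to every latent vertex in turn.

    One pass suffices. Exogenizing a vertex only adds edges into its current
    children. A latent vertex that has already been exogenized has no
    parents, so it is never such a child and never regains a parent.
    """
    result = set(edges)
    for latent in latents:
        result = exogenize(result, latent)
    # A childless latent vertex ends up with no incident edges, so it simply
    # drops out of the edge set (the marginalization argument of Equation 7).
    return result


# Hypothetical graph: v1 -> l1 -> l2 -> v2, plus a childless latent l3 with parent v3
g = {("v1", "l1"), ("l1", "l2"), ("l2", "v2"), ("v3", "l3")}
exo_g = exogenize_all(g, ["l1", "l2", "l3"])
print(sorted(exo_g))
```

In this example l1 and l2 end up with identical children, which is exactly the situation handled by the next transformation.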

The next observationally invariant transformation requires the exogenization procedure to have been applied first. In Figure 18d, ℓ1 and ℓ2 are exogenous latent variables with ch𝓖18d(ℓ2) ⊂ ch𝓖18d(ℓ1). Because the sample space Ωℓ1 is unspecified, ℓ1 has the capacity to emulate any dependence that v3 and/or v2 might have on ℓ2. This idea is captured by Lemma 7.

Lemma 7

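(See Lem. 3.8 [22]). Let 𝓖 be a causal structure with latent vertices ℓ, ℓ′ ∈ 𝓛, where ℓ ≠ ℓ′. If pa𝓖(ℓ) = pa𝓖(ℓ′) = ∅ and ch𝓖(ℓ′) ⊆ ch𝓖(ℓ), then 𝓜𝓥(𝓖) = 𝓜𝓥(sub(𝓖)𝓥 ∪ 𝓛 – {ℓ′}), where sub(𝓖)𝓥 ∪ 𝓛 – {ℓ′} denotes the subgraph of 𝓖 induced on all vertices except ℓ′.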

Proof

See proof of Lem. 3.8 from [22].□

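An immediate corollary of Lemma 7 is the following. Once every latent variable is exogenous, each latent variable ℓ can be identified with its set of children ch(ℓ), and any latent variable whose children are contained in those of another can be discarded. The remaining latent variables {ℓ | ℓ ∈ 𝓛} then correspond to the facets of a simplicial complex over the visible variables.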
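The elimination step can be sketched on the children sets of already-exogenous latent variables. The function name, the dictionary representation, and the example children sets are assumptions. The example is chosen only so that ch(ℓ2) ⊂ ch(ℓ1), as in the discussion of Figure 18d; it does not reproduce that figure.

```python
from typing import Dict, FrozenSet

Children = Dict[str, FrozenSet[str]]  # exogenous latent vertex -> its children


def drop_dominated_latents(children: Children) -> Children:
    """Repeatedly apply Lemma 7 to exogenous latent vertices.

    A latent vertex is removed when its children are contained in the children
    of another latent vertex that is still present. Among latent vertices with
    identical children, exactly one survives.
    """
    kept = dict(children)
    for l2 in sorted(children):  # sorted only to make the outcome deterministic
        if any(l1 != l2 and children[l2] <= kept[l1] for l1 in kept):
            del kept[l2]
    return kept


# Hypothetical children sets with ch(l2) a proper subset of ch(l1)
ch = {
    "l1": frozenset({"v1", "v2", "v3"}),
    "l2": frozenset({"v2", "v3"}),
    "l3": frozenset({"v3", "v4"}),
}
reduced = drop_dominated_latents(ch)
print(sorted(reduced))
```

The surviving children sets are exactly the distinct maximal ones, i.e. the candidate facets.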

Definition 13

An (abstract) simplicial complex, Δ, over a finite set 𝓥 is a collection of non-empty subsets of 𝓥 such that:

  1. {v} ∈ Δ for all v ∈ 𝓥; and

  2. if C1 ⊆ C2 ⊆ 𝓥, C1 is non-empty and C2 ∈ Δ, then C1 ∈ Δ.

The maximal subsets with respect to inclusion are called the facets of the simplicial complex.
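Definition 13 can be checked mechanically for small examples. In the sketch below, the helper names and the example facets are hypothetical. A complex is generated from a list of vertex sets by taking all non-empty subsets, and its facets are recovered as the inclusion-maximal members. The enumeration is exponential in the facet size, so the sketch is only meant for toy structures.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set

Simplex = FrozenSet[str]


def downward_closure(generators: Iterable[Iterable[str]]) -> Set[Simplex]:
    """All non-empty subsets of the given vertex sets."""
    delta: Set[Simplex] = set()
    for gen in generators:
        items = sorted(gen)
        for r in range(1, len(items) + 1):
            delta.update(frozenset(combo) for combo in combinations(items, r))
    return delta


def is_simplicial_complex(delta: Set[Simplex], vertices: FrozenSet[str]) -> bool:
    """Check Definition 13 for a collection of subsets of `vertices`."""
    if any(not c or not c <= vertices for c in delta):
        return False  # every member must be a non-empty subset of the vertex set
    if any(frozenset({v}) not in delta for v in vertices):
        return False  # condition 1
    # Condition 2: removing one vertex at a time suffices, by induction on size.
    return all(c - {v} in delta for c in delta if len(c) > 1 for v in c)


def facets(delta: Set[Simplex]) -> Set[Simplex]:
    """Members of delta that are maximal with respect to inclusion."""
    return {c for c in delta if not any(c < d for d in delta)}


V = frozenset({"v1", "v2", "v3", "v4"})
delta = downward_closure([{"v1", "v2", "v3"}, {"v3", "v4"}])
print(is_simplicial_complex(delta, V), sorted(map(sorted, facets(delta))))
```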

In [22], this concept led to the invention of mDAGs (marginal directed acyclic graphs), a hybrid between a directed acyclic graph and a simplicial complex. In this work, we refrain from adopting the formalism of mDAGs and instead continue to treat causal structures as directed acyclic graphs. Nonetheless, Lemmas 6 and 7 demonstrate that, for the purposes of the causal compatibility problem, the latent variables of a causal structure can be assumed to be exogenous and to have children forming the facets of a simplicial complex. Causal structures that adhere to this characterization will be referred to as exo-simplicial causal structures. Figure 20 depicts four exo-simplicial causal structures, each observationally equivalent to the corresponding causal structure in Figure 18.

Figure 20 Examples of exo-simplicial causal structures which are observationally equivalent to their respective counterparts in Figure 18.

B Simplifying Causal Parameters

Recall that a causal model (𝓖, 𝓟) consists of a causal structure 𝓖 and causal parameters 𝓟. Appendix A simplified the causal compatibility problem by showing that each causal structure 𝓖 can be replaced with an observationally equivalent exo-simplicial causal structure 𝓖′ such that 𝓜𝓥(𝓖) = 𝓜𝓥(𝓖′). The purpose of this appendix is to simplify the causal compatibility problem further, in three ways. Section B.1 demonstrates that the visible causal parameters {Pv|pa(v) | v ∈ 𝓥} of a causal model can be assumed to be deterministic without observational impact. Section B.2 shows that if the observed distribution is finite (i.e. |Ω𝓥| < ∞), one only needs to consider finite probability distributions for the latent variables, and moreover that explicit upper bounds on the cardinalities of the latent variables can be computed.

B.1 Determinism

Lemma 8

If P𝓥 ∈ 𝓜𝓥(𝓖) and 𝓖 is exo-simplicial (see Appendix A), then without loss of generality the causal parameters Pv|pa𝓖(v) over the observed variables can be assumed to be deterministic, and consequently,

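$$\forall x_{\mathcal{V}} \in \Omega_{\mathcal{V}}: \quad P_{\mathcal{V}}(x_{\mathcal{V}}) = \int_{\Omega_{\mathcal{L}}} \prod_{\ell \in \mathcal{L}} \mathrm{d}P_{\ell}(\lambda_{\ell}) \prod_{v \in \mathcal{V}} \delta\Big(x_v,\, f_v\big(x_{\mathrm{pa}_{\mathcal{G}}(v)\cap\mathcal{V}},\, \lambda_{\mathrm{pa}_{\mathcal{G}}(v)\cap\mathcal{L}}\big)\Big) \tag{56}$$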

Proof

Since P𝓥 ∈ 𝓜𝓥(𝓖), by definition there exists a joint distribution P𝓥∪𝓛 (or density dP𝓥∪𝓛) admitting marginal P𝓥 via Equation 7. Since the joint distribution satisfies Equation 6, it is possible to associate to each observed variable Xv an independent random variable Eev and a measurable function fv : Ωpa𝓖(v)∩𝓥 × Ωpa𝓖(v)∩𝓛 × Ωev → Ωv such that for all v ∈ 𝓥,

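$$X_v = f_v\big(X_{\mathrm{pa}_{\mathcal{G}}(v)\cap\mathcal{V}},\, \Lambda_{\mathrm{pa}_{\mathcal{G}}(v)\cap\mathcal{L}},\, E_{e_v}\big). \tag{57}$$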

Therefore, by promoting each ev to the status of a latent variable in 𝓖 and adding an edge ev → v to 𝓔, each Xv becomes a deterministic function of its parents. Finally, since 𝓖 is exo-simplicial, every error variable ev has its children ch𝓖(ev) = {v} nested inside the children of at least one other pre-existing latent variable. Therefore, by applying Lemma 7, ev is eliminated and one recovers the original 𝓖.□

Essentially, Lemma 8 indicates that any non-determinism due to local noise variables Eev can be emulated by the behavior of the latent variables 𝓛.
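The mechanism behind Lemma 8 can be illustrated in a finite toy setting. There, a noisy response function is reproduced exactly by a deterministic one once a uniformly distributed local noise variable e is merged into the latent variable. The model, the noise size `N`, and the threshold rule in `f` are assumptions for illustration only. The proof itself works with general measurable functions and uses Lemma 7 to absorb e into a latent parent.

```python
from fractions import Fraction
from itertools import product

# Hypothetical model: one binary latent lam feeding one binary visible variable v
# through a noisy response function P(x_v = 1 | lam) = q[lam].
P_lam = {0: Fraction(1, 2), 1: Fraction(1, 2)}
q = {0: Fraction(1, 4), 1: Fraction(2, 3)}


def noisy_marginal():
    """P(x_v) computed directly from the noisy response function."""
    return {x: sum(P_lam[l] * (q[l] if x == 1 else 1 - q[l]) for l in P_lam) for x in (0, 1)}


N = 12  # a common denominator of the q values; e is uniform on range(N)


def f(l, e):
    """Deterministic response function of the enlarged latent variable (lam, e)."""
    return 1 if e < q[l] * N else 0


def deterministic_marginal():
    """P(x_v) when the local noise e is merged into the latent variable."""
    out = {0: Fraction(0), 1: Fraction(0)}
    for l, e in product(P_lam, range(N)):
        out[f(l, e)] += P_lam[l] * Fraction(1, N)
    return out


print(noisy_marginal(), deterministic_marginal())
```

The price of determinism is a larger latent sample space, here |Ωλ| · N outcomes instead of |Ωλ|.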

B.2 The Finite Bound for Latent Cardinalities

In [50], it was shown that if the visible variables have finite cardinality (i.e. k𝓥 = |Ω𝓥| is finite), then for a particular class of causal structures known as causal networks, the cardinalities of the latent variables can be assumed to be finite as well. A causal network is a causal structure in which all latent variables have no parents (are exogenous) and all visible variables have either no parents or no children [37]. The purpose of this section is to generalize the results of [50] to exo-simplicial causal structures. Although the proof techniques presented here are similar to those of [50], the best upper bounds on k𝓛 = |Ω𝓛| depend more intimately on the form of 𝓖. As with the bounds of [50], the upper bounds presented here are anticipated to be sub-optimal. Finally, the results presented here hold regardless of whether or not Lemma 8 is applied.

Theorem 9

Let (𝓖, 𝓟) be a causal model with (possibly infinite) cardinalities k𝓛 = {kℓ | ℓ ∈ 𝓛} for the latent variables such that

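$$\forall x_{\mathcal{V}} \in \Omega_{\mathcal{V}}: \quad P_{\mathcal{V}}(x_{\mathcal{V}}) = \int_{\Omega_{\mathcal{L}}} \prod_{\ell \in \mathcal{L}} \mathrm{d}P_{\ell}(\lambda_{\ell}) \prod_{v \in \mathcal{V}} P_{v|\mathrm{pa}(v)}\big(x_v \mid x_{\mathrm{pa}(v)\cap\mathcal{V}},\, \lambda_{\mathrm{pa}(v)\cap\mathcal{L}}\big) \tag{58}$$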

produces the distribution P𝓥. Then there exists a causal model (𝓖, 𝓟′) that reproduces P𝓥 and in which every latent cardinality kℓ, ℓ ∈ 𝓛, is finite.

Proof

The following proof considers each latent variable ξ ∈ 𝓛 independently and obtains a finite value for kξ in each case. Let 𝓛′ = 𝓛 – {ξ} denote the set of latent variables with ξ removed. Let dP𝓛′ = ∏ℓ∈𝓛′ dPℓ be a probability density over Ω𝓛′ and consider the conditional probability distribution P𝓥|ξ(x𝓥|λξ) given λξ,

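$$P_{\mathcal{V}|\xi}(x_{\mathcal{V}} \mid \lambda_\xi) = \int_{\Omega_{\mathcal{L}'}} \mathrm{d}P_{\mathcal{L}'}(\lambda_{\mathcal{L}'}) \prod_{v \in \mathcal{V}} P_{v|\mathrm{pa}(v)}\big(x_v \mid x_{\mathrm{pa}(v)\cap\mathcal{V}},\, \lambda_{\mathrm{pa}(v)\cap\mathcal{L}}\big) \tag{59}$$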

Consulting Figure 21 for clarity, define the district D ⊆ 𝓥 of ξ to be the maximal set of visible vertices v in 𝓖 for which there exists an undirected path from v to ξ with alternating visible/latent vertices. Let Dc = 𝓥 – D, D̄ = pa(D) – D and D̄c = pa(Dc) – Dc. The district D has the property that P𝓥|ξ factorizes over D and Dc [22],

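$$P_{\mathcal{V}|\xi}(x_{\mathcal{V}} \mid \lambda_\xi) = P_{D|\bar{D}\xi}(x_D \mid x_{\bar{D}}, \lambda_\xi)\; P_{D^c|\bar{D}^c}(x_{D^c} \mid x_{\bar{D}^c}). \tag{60}$$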
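For an exo-simplicial structure, the district of ξ can be computed by a breadth-first search that alternates between latent vertices and their visible children. The sketch below assumes the structure is given only through the children sets of its latent vertices. The function name and example are hypothetical, and the example is not the structure of Figure 21. Directed edges between visible vertices are deliberately ignored, because an alternating path only uses latent-to-child edges.

```python
from collections import deque
from typing import Dict, FrozenSet, Set


def district(xi: str, children: Dict[str, FrozenSet[str]]) -> Set[str]:
    """Visible vertices joined to latent xi by paths alternating latent/visible vertices."""
    latent_parents: Dict[str, Set[str]] = {}
    for l, chs in children.items():
        for v in chs:
            latent_parents.setdefault(v, set()).add(l)
    seen_latents = {xi}
    D: Set[str] = set()
    queue = deque([xi])
    while queue:
        l = queue.popleft()
        for v in children.get(l, frozenset()):
            if v in D:
                continue
            D.add(v)
            # every other latent parent of v continues the alternating path
            for l2 in latent_parents[v]:
                if l2 not in seen_latents:
                    seen_latents.add(l2)
                    queue.append(l2)
    return D


children = {
    "xi": frozenset({"v1", "v2"}),
    "l2": frozenset({"v2", "v3"}),
    "l3": frozenset({"v4"}),
}
print(sorted(district("xi", children)))
```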
Figure 21 A causal structure 𝓖21 that helps in visualizing the proof of Theorem 9.

For varying λξ, consider a vector representation pλξ of the conditional distribution PD|D̄ξ(xD|xD̄λξ) and define U = {pλξ | λξ ∈ Ωξ}. By construction, the center of mass p* of U, taken with respect to dPξ, represents PD|D̄(xD|xD̄),

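$$p^{*} = \int_{\Omega_\xi} \mathrm{d}P_\xi(\lambda_\xi)\, p_{\lambda_\xi} \tag{61}$$
$$P_{D|\bar{D}}(x_D \mid x_{\bar{D}}) = \int_{\Omega_\xi} \mathrm{d}P_\xi(\lambda_\xi)\, P_{D|\bar{D}\xi}(x_D \mid x_{\bar{D}}, \lambda_\xi) \tag{62}$$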

Therefore, by a variant of Carathéodory’s theorem due to Fenchel [5], if U is compact and connected, then p* can be written as a finite convex decomposition,

$$p^{*} = \sum_{j=1}^{\operatorname{aff}(U)} w_j\, p_j, \qquad \sum_j w_j = 1, \qquad \forall j,\ w_j \ge 0, \tag{63}$$

where aff(U) is the affine dimension of U. Then let Ωξ = {1ξ, 2ξ, …, aff(U)ξ} be a finite sample space for ξ, distributed according to Pξ(jξ) = wj, with the response of D to jξ given by pj. Equations 58, 59, 60 and 62 then give

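$$P_{\mathcal{V}}(x_{\mathcal{V}}) = \sum_{\lambda_\xi \in \Omega_\xi} P_\xi(\lambda_\xi)\, P_{\mathcal{V}|\xi}(x_{\mathcal{V}} \mid \lambda_\xi). \tag{64}$$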

Therefore, causal parameters exist that reproduce P𝓥 with cardinality kξ = aff(U). What remains is to show that U is compact and connected, and to find a bound on aff(U).
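For a finite sample of points, a finite decomposition of the kind in Equation 63 can be computed with the classical Carathéodory reduction. This substitutes the standard Carathéodory theorem, which uses at most d + 1 points where d is the number of coordinates, for Fenchel's sharper variant for connected sets used in the proof. The function name, tolerance and random example are our own choices. At each step the sketch finds an affine dependence among the support points and shifts weight along it until one weight reaches zero. This preserves both the total weight and the center of mass.

```python
import numpy as np


def caratheodory_reduce(points, weights, tol=1e-12):
    """Rewrite sum_i w_i p_i using at most d + 1 of the points (d = number of coordinates).

    Returns (indices, new_weights): the new weights are >= 0, sum to 1 and give
    the same weighted mean as the input.
    """
    pts = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float).copy()
    idx = np.flatnonzero(w > tol)
    while len(idx) > pts.shape[1] + 1:
        # Affine dependence c != 0: sum_i c_i p_i = 0 and sum_i c_i = 0.
        A = np.vstack([pts[idx].T, np.ones(len(idx))])
        c = np.linalg.svd(A)[2][-1]  # A has more columns than rows: a null vector
        wi = w[idx]
        pos = c > tol  # non-empty, since c sums to zero and is not zero
        ratios = np.full(len(idx), np.inf)
        ratios[pos] = wi[pos] / c[pos]
        k = int(np.argmin(ratios))
        wi = wi - ratios[k] * c  # keeps total weight and mean, stays >= 0
        wi[k] = 0.0  # the minimizing weight hits zero; set it exactly
        w[idx] = np.clip(wi, 0.0, None)
        idx = np.flatnonzero(w > tol)
    return idx, w[idx]


rng = np.random.default_rng(0)
pts = rng.random((10, 2))  # ten hypothetical points in the plane
w = rng.random(10)
w /= w.sum()
idx, w_red = caratheodory_reduce(pts, w)
print(idx, w_red)
```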

Because of the normalization constraints on each pλξ, U is bounded. Moreover, [50] demonstrates that U can be taken to be closed as well, so U is compact. Again consulting Figure 21 for clarity, partition D into subsets A = des(ξ) ∩ D and B = D – A. This partitioning yields the following linear equality constraint on all points pλξ:

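\begin{align}
&\sum_{x_A \in \Omega_A} P_{D|\bar{D}\xi}(x_D \mid x_{\bar{D}}, \lambda_\xi) \tag{65}\\
&= \sum_{x_A \in \Omega_A} P_{A|B\bar{D}\xi}(x_A \mid x_B, x_{\bar{D}}, \lambda_\xi)\, P_{B|\bar{D}\xi}(x_B \mid x_{\bar{D}}, \lambda_\xi) \tag{66}\\
&= P_{B|\bar{D}\xi}(x_B \mid x_{\bar{D}}, \lambda_\xi) \tag{67}\\
&= P_{B|\bar{D}}(x_B \mid x_{\bar{D}}), \tag{68}
\end{align}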

where the last equality holds because B is independent of ξ given D̄ [28]. Furthermore, note that if U is not connected, it can be made connected by a scheme due to [50], which adds noisy variants of each pλξ to U. Simply include a noise parameter ν ∈ [0, 1] by replacing λξ with the pair (λξ, ν), and adjust the response functions for the variables in A such that

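$$P_{A|B\bar{D}\xi}(x_A \mid x_B, x_{\bar{D}}, \lambda_\xi, \nu) = (1-\nu)\, P_{A|B\bar{D}\xi}(x_A \mid x_B, x_{\bar{D}}, \lambda_\xi) + \frac{\nu}{|\Omega_A|}. \tag{69}$$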

For each degree of noise 0 ≤ ν ≤ 1, Equation 69 defines a noisy model pλξ,ν, which is added to U. As special cases, no noise (ν = 0) yields pλξ,0 = pλξ ∈ U, and complete noise (ν = 1) yields pλξ,1, representing PB|D̄(xB|xD̄)/|ΩA| ∈ U, which is independent of λξ. Since every pλξ is joined to this common point by the path ν ↦ pλξ,ν, U is connected. Finally, the affine dimension aff(U) is at most the affine dimension of PD|D̄ with the degrees of freedom associated with satisfying Equation 68 removed [50]. Therefore,

$$k_\xi = \operatorname{aff}(U) \le \operatorname{aff}(P_{D|\bar{D}}) - \operatorname{aff}(P_{B|\bar{D}}). \tag{70}$$
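Equation 70 can be turned into an explicit count under one assumption that the text does not spell out. Suppose aff(PD|D̄) denotes the dimension of the set of all conditional probability tables for D, given each valuation of the (finite) visible parents D̄. Each valuation of D̄ then contributes one probability simplex of dimension |ΩD| − 1. Fixing the B-marginal as in Equation 68 removes |ΩB| − 1 of those directions. Since D is the disjoint union of A and B, |ΩD| = |ΩA||ΩB|, which gives the following sketch:

```latex
\begin{aligned}
\operatorname{aff}(P_{D|\bar{D}}) &= |\Omega_{\bar{D}}|\,\big(|\Omega_D| - 1\big), \qquad
\operatorname{aff}(P_{B|\bar{D}}) = |\Omega_{\bar{D}}|\,\big(|\Omega_B| - 1\big), \\
k_\xi &\le |\Omega_{\bar{D}}|\,\big(|\Omega_D| - |\Omega_B|\big)
       = |\Omega_{\bar{D}}|\,|\Omega_B|\,\big(|\Omega_A| - 1\big).
\end{aligned}
```

For instance, with binary A, B and D̄, this count would give kξ ≤ 2 · 2 · (2 − 1) = 4.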

C Proof of Theorem 3

Proof

The proof first constructs the distribution P̃𝓥, which satisfies the error bound in Equation 54. Afterwards, a uniform functional model (𝓖, 𝓕̃𝓥, 𝓟̃𝓛) that produces P̃𝓥 is constructed. Begin by letting P̃ℓ denote the rational approximation of Pℓ for each ℓ ∈ 𝓛, as prescribed by Theorem 2. Then, let

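$$P_{\mathcal{L}}(\lambda_{\mathcal{L}}) = \prod_{\ell \in \mathcal{L}} P_\ell(\lambda_\ell), \qquad \tilde{P}_{\mathcal{L}}(\lambda_{\mathcal{L}}) = \prod_{\ell \in \mathcal{L}} \tilde{P}_\ell(\lambda_\ell). \tag{71}$$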

The joint distribution P𝓥 and the rational approximation P̃𝓥 are then given by,

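$$P_{\mathcal{V}}(x_{\mathcal{V}}) = \sum_{\lambda_{\mathcal{L}} \in \Omega_{\mathcal{L}}} P_{\mathcal{L}}(\lambda_{\mathcal{L}})\, \delta\big(x_{\mathcal{V}}, F_{\mathcal{V}}(\lambda_{\mathcal{L}})\big), \tag{72}$$
$$\tilde{P}_{\mathcal{V}}(x_{\mathcal{V}}) = \sum_{\lambda_{\mathcal{L}} \in \Omega_{\mathcal{L}}} \tilde{P}_{\mathcal{L}}(\lambda_{\mathcal{L}})\, \delta\big(x_{\mathcal{V}}, F_{\mathcal{V}}(\lambda_{\mathcal{L}})\big). \tag{73}$$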

The distance Δ(P𝓥, P̃𝓥) between the visible joint distributions is no greater than the distance Δ(P𝓛, P̃𝓛) between the latent joint distributions:

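\begin{align}
\Delta(P_{\mathcal{V}}, \tilde{P}_{\mathcal{V}}) &= \sum_{x_{\mathcal{V}} \in \Omega_{\mathcal{V}}} \big| P_{\mathcal{V}}(x_{\mathcal{V}}) - \tilde{P}_{\mathcal{V}}(x_{\mathcal{V}}) \big| \tag{74}\\
&= \sum_{x_{\mathcal{V}} \in \Omega_{\mathcal{V}}} \Big| \sum_{\lambda_{\mathcal{L}} \in \Omega_{\mathcal{L}}} \big\{ P_{\mathcal{L}}(\lambda_{\mathcal{L}}) - \tilde{P}_{\mathcal{L}}(\lambda_{\mathcal{L}}) \big\}\, \delta\big(x_{\mathcal{V}}, F_{\mathcal{V}}(\lambda_{\mathcal{L}})\big) \Big| \tag{75}\\
&\le \sum_{\lambda_{\mathcal{L}} \in \Omega_{\mathcal{L}}} \sum_{x_{\mathcal{V}} \in \Omega_{\mathcal{V}}} \big| P_{\mathcal{L}}(\lambda_{\mathcal{L}}) - \tilde{P}_{\mathcal{L}}(\lambda_{\mathcal{L}}) \big|\, \delta\big(x_{\mathcal{V}}, F_{\mathcal{V}}(\lambda_{\mathcal{L}})\big) \tag{76}\\
&= \sum_{\lambda_{\mathcal{L}} \in \Omega_{\mathcal{L}}} \big| P_{\mathcal{L}}(\lambda_{\mathcal{L}}) - \tilde{P}_{\mathcal{L}}(\lambda_{\mathcal{L}}) \big| \tag{77}\\
&= \Delta(P_{\mathcal{L}}, \tilde{P}_{\mathcal{L}}) \tag{78}
\end{align}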
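Equations 74 to 78 say that pushing two latent distributions through the same deterministic map F𝓥 cannot increase their distance Δ. Here Δ is the plain sum of absolute differences, without a factor of 1/2. This can be checked on a small example with exact arithmetic. The two latent marginals, their hypothetical approximations, and the parity map used as F𝓥 are all illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def pushforward(P_latent, F):
    """Distribution of F(lambda) when lambda ~ P_latent (Equations 72 and 73)."""
    out = {}
    for lam, p in P_latent.items():
        out[F(lam)] = out.get(F(lam), Fraction(0)) + p
    return out


def tv(P, Q):
    """Delta as in Equation 74: sum of absolute differences."""
    return sum(abs(P.get(k, 0) - Q.get(k, 0)) for k in set(P) | set(Q))


def product_dist(P1, P2):
    return {(a, b): P1[a] * P2[b] for a, b in product(P1, P2)}


def F(lam):  # hypothetical deterministic visible response: parity of two bits
    a, b = lam
    return a ^ b


Pa, Pa_t = {0: Fraction(1, 3), 1: Fraction(2, 3)}, {0: Fraction(1, 4), 1: Fraction(3, 4)}
Pb, Pb_t = {0: Fraction(3, 5), 1: Fraction(2, 5)}, {0: Fraction(1, 2), 1: Fraction(1, 2)}
P_L, Pt_L = product_dist(Pa, Pb), product_dist(Pa_t, Pb_t)
P_V, Pt_V = pushforward(P_L, F), pushforward(Pt_L, F)
print(tv(P_V, Pt_V), "<=", tv(P_L, Pt_L))
```

Here the visible distance is 1/15, well below the latent distance of 13/60. The inequality can be strict because F𝓥 merges latent outcomes whose errors partially cancel.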

The bound in Equation 54 will be derived using Equation 48. For convenience of notation, let the latent variables be indexed 𝓛 = {1, 2, …, L} and let 𝓛′ = {u1, u2, …, uL} index the corresponding uniformly distributed variables as defined in Theorem 2. Then,

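\begin{align}
&\Delta(P_{\mathcal{L}}, \tilde{P}_{\mathcal{L}}) \tag{79}\\
&= \sum_{\lambda_{\mathcal{L}} \in \Omega_{\mathcal{L}}} \big| P_{\mathcal{L}}(\lambda_{\mathcal{L}}) - \tilde{P}_{\mathcal{L}}(\lambda_{\mathcal{L}}) \big| \tag{80}\\
&= \sum_{\lambda_{\mathcal{L}} \in \Omega_{\mathcal{L}}} \Big| \prod_{j=1}^{L} P_j(\lambda_j) - \prod_{j=1}^{L} \tilde{P}_j(\lambda_j) \Big| \tag{81}\\
&= \sum_{\lambda_{\mathcal{L}} \in \Omega_{\mathcal{L}}} \Big| \prod_{j=1}^{L} \Big( \tilde{P}_j(\lambda_j) + \frac{\varepsilon(\lambda_j)}{|\Omega_{u_j}|} \Big) - \prod_{j=1}^{L} \tilde{P}_j(\lambda_j) \Big| \tag{82}
\end{align}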

Here it becomes advantageous to define helper variables Γ0,j and Γ1,j such that,

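$$\Gamma_{0,j}(\lambda_{\mathcal{L}}) = \tilde{P}_j(\lambda_j), \qquad \Gamma_{1,j}(\lambda_{\mathcal{L}}) = \frac{\varepsilon(\lambda_j)}{|\Omega_{u_j}|}. \tag{83}$$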

Additionally, let b ∈ {0, 1}^L be a binary string of length L with bits b1, …, bL, identified with the integer it encodes. Then Equation 82 becomes

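\begin{align}
&\Delta(P_{\mathcal{L}}, \tilde{P}_{\mathcal{L}}) \tag{84}\\
&= \sum_{\lambda_{\mathcal{L}} \in \Omega_{\mathcal{L}}} \Big| \prod_{j=1}^{L} \big( \Gamma_{0,j}(\lambda_{\mathcal{L}}) + \Gamma_{1,j}(\lambda_{\mathcal{L}}) \big) - \prod_{j=1}^{L} \Gamma_{0,j}(\lambda_{\mathcal{L}}) \Big| \tag{85}\\
&= \sum_{\lambda_{\mathcal{L}} \in \Omega_{\mathcal{L}}} \Big| \sum_{b=1}^{2^L - 1} \prod_{j=1}^{L} \Gamma_{b_j,j}(\lambda_{\mathcal{L}}) \Big| \tag{86}\\
&\le \sum_{\lambda_{\mathcal{L}} \in \Omega_{\mathcal{L}}} \sum_{b=1}^{2^L - 1} \prod_{j=1}^{L} \big| \Gamma_{b_j,j}(\lambda_{\mathcal{L}}) \big| \tag{87}
\end{align}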
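The step from Equation 85 to Equation 86 expands the product over j into its 2^L terms and cancels the all-zero string against the subtracted product. The step from Equation 86 to Equation 87 is the triangle inequality. Both can be checked at a single fixed λ𝓛 with exact arithmetic. The values of Γ0,j and Γ1,j below are hypothetical, and bit strings are represented as tuples rather than as the integers 1, …, 2^L − 1 used in Equation 86.

```python
from fractions import Fraction
from itertools import product
from math import prod


def lhs(g0, g1):
    """prod_j (G0_j + G1_j) - prod_j G0_j, the quantity inside |.| in Equation 85."""
    return prod(a + b for a, b in zip(g0, g1)) - prod(g0)


def rhs_terms(g0, g1):
    """The 2^L - 1 products prod_j G_{b_j, j} over non-zero bit strings b (Equation 86)."""
    L = len(g0)
    return [prod(g1[j] if bits[j] else g0[j] for j in range(L))
            for bits in product((0, 1), repeat=L) if any(bits)]


# Hypothetical values of Gamma_{0,j} and Gamma_{1,j} at one fixed lambda_L (L = 3)
g0 = [Fraction(1, 3), Fraction(1, 2), Fraction(2, 5)]
g1 = [Fraction(-1, 30), Fraction(1, 20), Fraction(-1, 50)]
terms = rhs_terms(g0, g1)
print(lhs(g0, g1), sum(terms), sum(abs(t) for t in terms))
```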

Summing Γ0,j over λj yields 1 due to the normalization of P̃j(λj) in Equation 83. Meanwhile, the sum of |Γ1,j| over λj is bounded by (|Ωj| – 1)/|Ωuj|, exactly as in Theorem 2. Therefore,

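$$\Delta(P_{\mathcal{L}}, \tilde{P}_{\mathcal{L}}) \le \sum_{k_1=1}^{L} \frac{|\Omega_{k_1}| - 1}{|\Omega_{u_{k_1}}|} + \frac{1}{2!} \sum_{k_1=1}^{L} \sum_{k_2=1}^{L} \frac{\big(|\Omega_{k_1}| - 1\big)\big(|\Omega_{k_2}| - 1\big)}{|\Omega_{u_{k_1}}|\,|\Omega_{u_{k_2}}|} + \cdots \tag{88}$$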

In order to simplify Equation 88, let C, K be defined as,

$$C = \max_{1 \le j \le L} |\Omega_j|, \qquad K = \min_{1 \le j \le L} |\Omega_{u_j}|. \tag{89}$$

Combining Equations 78, 88, and 89, one obtains the required result,

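$$\Delta(P_{\mathcal{V}}, \tilde{P}_{\mathcal{V}}) \le \sum_{n=1}^{L} \frac{1}{n!} \left( \frac{L\,(C-1)}{K} \right)^{n} \tag{90}$$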
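The right-hand side of Equation 90 is a truncated exponential series in x = L(C − 1)/K with non-negative terms. It is therefore at most e^x − 1, and it behaves like L(C − 1)/K once K is large, so refining the uniform variables drives the error to zero. The sketch below simply evaluates the bound. The function name and the sample sizes are hypothetical.

```python
from math import exp, factorial


def rational_approx_bound(L: int, C: int, K: int) -> float:
    """Right-hand side of Equation 90 for L latent variables, largest latent
    cardinality C and smallest uniform cardinality K."""
    x = L * (C - 1) / K
    return sum(x ** n / factorial(n) for n in range(1, L + 1))


# Hypothetical sizes: three latent variables with at most four outcomes each
for K in (10, 1_000, 100_000):
    x = 3 * (4 - 1) / K
    print(K, rational_approx_bound(3, 4, K), exp(x) - 1)
```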

To conclude the proof, one needs to prove the existence of a uniform functional model (𝓖, 𝓕̃𝓥, 𝓟̃𝓛) which reproduces P̃𝓥. To do so, substitute into Equation 73 the functional form of the rational approximations (Equation 46) from Theorem 2 for each j ∈ 𝓛,

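$$\tilde{P}_{\mathcal{V}}(x_{\mathcal{V}}) = \sum_{\lambda_{\mathcal{L}} \in \Omega_{\mathcal{L}}} \prod_{j=1}^{L} \left( \frac{1}{|\Omega_{u_j}|} \sum_{\omega_{u_j} \in \Omega_{u_j}} \delta\big(\lambda_j, g_j(\omega_{u_j})\big) \right) \delta\big(x_{\mathcal{V}}, F_{\mathcal{V}}(\lambda_1 \lambda_2 \cdots \lambda_L)\big). \tag{91}$$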

Perform the sum over all latent valuations to remove the inner delta function,

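$$\tilde{P}_{\mathcal{V}}(x_{\mathcal{V}}) = \left( \prod_{j=1}^{L} \frac{1}{|\Omega_{u_j}|} \right) \sum_{\omega_{u_1} \in \Omega_{u_1}} \cdots \sum_{\omega_{u_L} \in \Omega_{u_L}} \delta\Big(x_{\mathcal{V}}, F_{\mathcal{V}}\big(g_1(\omega_{u_1})\, g_2(\omega_{u_2}) \cdots g_L(\omega_{u_L})\big)\Big). \tag{92}$$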

Finally, one can recursively define the functions in 𝓕̃𝓥 to be such that 𝓕̃𝓥(ω𝓛′) = 𝓕𝓥(g(ω𝓛′)) and consequently Equation 92 defines the uniform functional model (𝓖, 𝓕̃𝓥, 𝓟̃𝓛) which reproduces P̃𝓥.□
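The equivalence of Equations 73 and 92 can be checked numerically in a small case. The form of gj below is an assumption, because Equation 46 is not reproduced in this section. We take P̃j(λ) = nλ/|Ωuj| with integer counts nλ, and let gj send a block of nλ consecutive values of ω to λ. The helper names, the counts, and the parity response are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def make_g(counts):
    """Hypothetical g_j: map omega in range(sum(counts)) to lam via consecutive blocks."""
    table = [lam for lam, n in enumerate(counts) for _ in range(n)]
    return lambda omega: table[omega]


def via_uniform_model(count_lists, F):
    """Equation 92: push uniform omegas through F(g_1(w_1), ..., g_L(w_L))."""
    gs = [make_g(c) for c in count_lists]
    sizes = [sum(c) for c in count_lists]
    total = 1
    for s in sizes:
        total *= s
    hist = Counter(F(*(g(w) for g, w in zip(gs, omegas)))
                   for omegas in product(*(range(s) for s in sizes)))
    return {x: Fraction(n, total) for x, n in hist.items()}


def via_latent_model(count_lists, F):
    """Equation 73 with P~_j(lam) = counts[lam] / |Omega_{u_j}|."""
    dists = [{lam: Fraction(n, sum(c)) for lam, n in enumerate(c)} for c in count_lists]
    out = {}
    for lams in product(*(d.keys() for d in dists)):
        p = Fraction(1)
        for d, lam in zip(dists, lams):
            p *= d[lam]
        if p:
            out[F(*lams)] = out.get(F(*lams), Fraction(0)) + p
    return out


def parity(a, b):  # hypothetical visible response F_V
    return a ^ b


counts = [[1, 3], [2, 3]]  # rational approximations (1/4, 3/4) and (2/5, 3/5)
print(via_uniform_model(counts, parity), via_latent_model(counts, parity))
```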

Received: 2019-05-20
Accepted: 2020-03-10
Published Online: 2020-07-25

© 2020 T. C. Fraser, published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.

Downloaded on 27.5.2024 from https://www.degruyter.com/document/doi/10.1515/jci-2019-0013/html