
A note on efficient minimum cost adjustment sets in causal graphical models

  • Ezequiel Smucler and Andrea Rotnitzky

Abstract

We study the selection of adjustment sets for estimating the interventional mean under an individualized treatment rule. We assume a non-parametric causal graphical model with, possibly, hidden variables and at least one adjustment set composed of observable variables. Moreover, we assume that observable variables have positive costs associated with them. We define the cost of an observable adjustment set as the sum of the costs of the variables that comprise it. We show that in this setting there exist adjustment sets that are minimum cost optimal, in the sense that they yield non-parametric estimators of the interventional mean with the smallest asymptotic variance among those that control for observable adjustment sets that have minimum cost. Our results are based on the construction of a special flow network associated with the original causal graph. We show that a minimum cost optimal adjustment set can be found by computing a maximum flow on the network, and then finding the set of vertices that are reachable from the source by augmenting paths. The optimaladj Python package implements the algorithms introduced in this article.

MSC 2010: 62G05; 62D20

1 Introduction

This article contributes to the growing literature on graphical criteria for selecting adjustment sets that suffice to control for confounding. In a causal graphical model, a set of covariates Z is an adjustment set for the effect of a, possibly individualized, point exposure treatment rule on an outcome, if the interventional mean, i.e., the mean of the outcome in the hypothetical world in which all units in the population receive a given treatment rule, is identified by the g-formula [1] that adjusts for Z. An individualized point exposure treatment rule is a rule for assigning subjects to treatment at a single time point that depends on the subjects’ covariates. This article contributes to the existing literature by incorporating the possibility that variables in the graph are associated with different costs and exploring the existence of graphical criteria for determining efficient adjustment sets under different budget constraint scenarios.

The literature on graphical selection of adjustment sets is particularly relevant for the task of deciding, at the stage of the design of an observational study, which variables to measure in order to control for confounding. It is well known that, under a causal graphical model, there may exist many adjustment sets, and a series of recent articles derived complete and sound graphical criteria for determining them [2,3, 4,5]. A subsequent thread of articles provided graphical rules for comparing adjustment sets in causal graphical models based on efficiency criteria. Specifically, assuming a linear causal graphical model with no hidden variables and treatment effects estimated by ordinary least squares, previous studies [6] and [3] provided criteria for comparing certain pairs of adjustment sets. Ref. [7] derived a graphical characterization of the globally most efficient adjustment set and extended the criteria of Kuroki, Cai, and Miyakawa. See also ref. [8]. Ref. [9] extended these results to non-parametric graphical models and non-parametrically adjusted estimators. All of the aforementioned articles considered only static treatment rules, i.e., “one size fits all regimes,” which assign the same treatment to all population units regardless of their covariates. Ref. [10] extended the results of ref. [9] by allowing the possibility of both individualized treatment rules and graphical models that have hidden variables. The authors also provided graphical criteria for determining optimal adjustment sets both among minimal adjustment sets and among minimum cardinality adjustment sets. Minimal adjustment sets are valid adjustment sets such that the removal of any variable from them destroys their validity. Moreover, ref. [10] provided a sufficient criterion for the existence of a globally optimal adjustment set. Ref. [11] provided a necessary and sufficient criterion for the existence of a globally optimal adjustment set in linear causal graphical models. Assuming a non-graphical online setting in which the investigator can alter the data collection mechanism adaptively, ref. [12] proposed an estimator of an optimal identifying functional among those in a predefined set.

In most realistic settings the costs associated with measuring different covariates can vary considerably. For instance, variables requiring the laboratory assessment of blood samples are usually much more expensive to measure than variables that can be obtained by clinical examination. The latter, in turn, are more expensive than those obtained from surveys. Cost considerations then give rise to the important practical problem of determining the adjustment set that yields estimators of treatment effects with minimum variance among the adjustment sets whose overall cost meets a given budget constraint. To the best of our knowledge, the literature on causal graphical models has not addressed this problem yet. The present article fills this gap. In fact, we show by means of an example that such an ideal adjustment set does not always exist, except when the variance minimization problem is restricted to observable adjustment sets with lowest overall cost. We derive a graphical criterion and a polynomial time algorithm for computing a solution to the latter optimization problem. We refer to adjustment sets that solve this problem as optimal minimum cost adjustment sets. Our results are based on building a special flow network associated with the original causal graph. We show that optimal minimum cost adjustment sets can be found by computing a maximum flow on the network, and then finding the set of vertices that are reachable from the source by augmenting paths. Flow networks had already been proposed in refs [13] and [14] as a tool to compute minimal and minimum cost adjustment sets, but with no consideration for statistical efficiency. The Python package optimaladj, available at https://pypi.org/project/optimaladj, implements the algorithms introduced in this article.

The rest of the article is organized as follows. In Section 2, we review some necessary background on graph theory and semiparametric estimation. In Section 3, we present the non-parametric causal graphical model we will assume throughout the article. In Section 4, we provide a graphical characterization of minimum cost adjustment sets as a class of vertex separators in the undirected graph introduced in ref. [10]. In Section 5, we provide a graphical criterion for efficiency comparisons of minimum cost adjustment sets, based on the aforementioned undirected graph. Section 6 contains the main results of this article. In it, we construct a special flow network and show how min-cuts in it are related to minimum cost adjustment sets. We also show that an optimal minimum cost adjustment set always exists and provide a polynomial time algorithm to compute it. Finally, we illustrate our results in a few examples and show that the optimal budget constrained adjustment set discussed earlier does not exist in general.

2 Background

2.1 Undirected graphs

An undirected graph ℋ = (V, E) is formed by a finite vertex set V and a set of undirected edges E. A weighted undirected graph is an undirected graph together with a cost function c : V → (0, +∞). The cost of a set of vertices Z is defined as the sum of the costs of the vertices that comprise Z. In a slight abuse of notation, we write c(Z) for the cost of Z.

If U – W is an edge in ℋ, then we say that U and W are adjacent. A path between U and W is a sequence of adjacent vertices (V₁, …, V_j) such that V₁ = U and V_j = W. Two vertices U and W are connected in ℋ if there exists a path from U to W in ℋ.

For Z₁, Z₂, and Z₃ disjoint sets of vertices in ℋ, we write Z₁ ⊥_ℋ Z₂ | Z₃ if every path in ℋ between Z₁ and Z₂ intersects Z₃. Given two vertices A and Y in ℋ, a set of vertices Z disjoint with A and Y is an A–Y separator if A ⊥_ℋ Y | Z. The set Z is a minimal A–Y separator if it is an A–Y separator and no proper subset of Z is an A–Y separator. If ℋ is a weighted undirected graph with cost function c, the set Z is a minimum cost A–Y separator if it is an A–Y separator that satisfies c(Z) ≤ c(Z′) for any other A–Y separator Z′.
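To make these definitions concrete, the following minimal sketch checks whether a candidate set Z is an A–Y separator in an undirected graph and computes its cost. It assumes the networkx library; the helper names and the toy weighted graph are illustrative only and are not taken from the article.

```python
import networkx as nx

def is_separator(H, A, Y, Z):
    """Check whether Z is an A-Y separator in the undirected graph H,
    i.e. whether removing the vertices in Z disconnects A and Y."""
    H_minus_Z = H.subgraph(set(H) - set(Z))
    return not nx.has_path(H_minus_Z, A, Y)

def cost(Z, costs):
    """c(Z): the sum of the costs of the vertices in Z."""
    return sum(costs[W] for W in Z)

# Hypothetical weighted undirected graph with two routes between A and Y.
H = nx.Graph([("A", "B"), ("B", "Y"), ("A", "C"), ("C", "Y")])
costs = {"B": 1.0, "C": 2.0}
print(is_separator(H, "A", "Y", {"B", "C"}), cost({"B", "C"}, costs))  # True 3.0
print(is_separator(H, "A", "Y", {"B"}))                                # False
```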

2.2 Directed graphs

A directed graph G = (V, E) is formed by a finite vertex set V and a set of directed edges E ⊆ V × V. Given Z ⊆ V, the induced subgraph G_Z = (Z, E_Z) is defined as the graph obtained by considering only the vertices in Z and the edges between vertices in Z.

Two vertices U and W are adjacent if there is an edge between them. A path between U and W in G is a sequence of adjacent vertices (V₁, …, V_j) such that V₁ = U and V_j = W. The path is directed (or causal) if V_i → V_{i+1} for all i ∈ {1, …, j − 1}.

If U → W, then U is a parent of W. If there is a directed path from U to W, then U is an ancestor of W and W is a descendant of U. We follow the convention that every vertex is an ancestor and a descendant of itself. The sets of parents, ancestors, and descendants of W in G are denoted by pa_G(W), an_G(W), and de_G(W), respectively. The set of non-descendants of W is defined as nd_G(W) ≐ V \ de_G(W). For a set of vertices Z we define an_G(Z) = ∪_{W ∈ Z} an_G(W) and de_G(Z) = ∪_{W ∈ Z} de_G(W).

A directed cycle is a directed path that begins and ends at the same vertex. A directed acyclic graph is a directed graph that does not have directed cycles.

Let G be a directed acyclic graph and let A and Y be vertices in G. Let cn(A, Y, G) be the set of vertices that lie on a directed path between A and Y and are not equal to A. Let forb(A, Y, G) ≐ de_G(cn(A, Y, G)) ∪ {A}. We call this the set of forbidden vertices with respect to A, Y in G.

The proper back-door graph G^{pbd(A,Y)} [14] is defined as the graph formed by removing from G the first edge of every directed path from A to Y.

The moral graph G^m associated with a directed acyclic graph G is an undirected graph with the same vertex set as G and an edge U – W if any of the following hold in G: U → W, W → U, or there exists a vertex C such that U → C ← W.
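As an illustration, the sketch below builds the proper back-door graph G^{pbd(A,Y)} and then moralizes it, assuming a recent networkx; the helper proper_backdoor_graph and the toy DAG are ours and purely illustrative.

```python
import networkx as nx

def proper_backdoor_graph(G, A, Y):
    """Remove the first edge of every directed path from A to Y."""
    Gpbd = G.copy()
    for child in list(G.successors(A)):
        # the edge A -> child starts a directed A-to-Y path iff child is Y
        # or there is a directed path from child to Y
        if child == Y or nx.has_path(G, child, Y):
            Gpbd.remove_edge(A, child)
    return Gpbd

# Toy DAG (hypothetical): A -> M -> Y, Z -> A, Z -> Y
G = nx.DiGraph([("A", "M"), ("M", "Y"), ("Z", "A"), ("Z", "Y")])
Gpbd = proper_backdoor_graph(G, "A", "Y")
Gm = nx.moral_graph(Gpbd)          # moralization: marry parents, drop directions
print(sorted(Gpbd.edges()))        # [('M', 'Y'), ('Z', 'A'), ('Z', 'Y')]
print(sorted(map(sorted, Gm.edges())))
```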

2.3 Flow networks

We follow the conventions in ref. [15]. A flow network D = (V, E, k, s, t) is a directed graph together with a capacity function k : E → [0, +∞] and two distinguished vertices s and t; s is called the source and t the sink of the network, and k(e) is called the capacity of edge e. Consider then a flow network D = (V, E, k, s, t). For a vertex W ∈ V, we let α(W) be the set of edges that point into W and β(W) be the set of edges that point out of W. For a set of vertices S ⊆ V we let S̄ = V \ S.

A flow is a function f : E → ℝ that satisfies 0 ≤ f(e) ≤ k(e) for all e ∈ E and Σ_{e ∈ α(W)} f(e) = Σ_{e ∈ β(W)} f(e) for every vertex W not equal to the source or the sink. The total flow of f is defined as Σ_{e ∈ α(t)} f(e). A flow is called a max-flow if no other flow has a greater total flow.

A cut is a set of vertices that contains the source but not the sink of the network. If S is a set of vertices, we define (S, S̄) as the set of edges in D of the form U → W for some U ∈ S and W ∈ S̄. If S is a cut, we define its capacity as

k(S) ≐ Σ_{e ∈ (S, S̄)} k(e).

A cut is called a min-cut if no other cut has a smaller capacity.
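The max-flow/min-cut machinery used later in the article is readily available in networkx. The following minimal sketch, on a hypothetical toy network, computes the max-flow value and a min-cut; the node names and capacities are illustrative only.

```python
import networkx as nx

# A small flow network (hypothetical): source "s", sink "t",
# finite capacities on some edges, effectively infinite on others.
INF = float("inf")
D = nx.DiGraph()
D.add_edge("s", "u", capacity=INF)
D.add_edge("u", "v", capacity=2.0)   # finite-capacity "internal"-style edge
D.add_edge("v", "t", capacity=INF)
D.add_edge("s", "w", capacity=INF)
D.add_edge("w", "t", capacity=3.0)

# Max-flow value and a minimum cut: the first returned set contains the source.
flow_value, (S, S_bar) = nx.minimum_cut(D, "s", "t")
print(flow_value)        # 5.0 = 2.0 + 3.0
print(sorted(S))         # vertices on the source side of a min-cut
```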

2.4 Semiparametric estimation

An estimator γ̂ of a parameter γ(P) based on n independent identically distributed random copies V₁, …, V_n of V is asymptotically linear at a probability law P if there exists a random variable φ_P(V), called the influence function of γ(P), such that E_P{φ_P(V)} = 0, var_P{φ_P(V)} < ∞, and n^{1/2}{γ̂ − γ(P)} = n^{−1/2} Σ_{i=1}^{n} φ_P(V_i) + o_p(1) under P. The central limit theorem implies that if γ̂ is asymptotically linear, n^{1/2}{γ̂ − γ(P)} converges in distribution to a zero mean normal distribution with variance var_P{φ_P(V_i)}. Given a collection ℳ of probability laws for V, an estimator γ̂ of γ(P) is said to be regular at P ∈ ℳ if its convergence to γ(P) is locally uniform at P in ℳ [16].
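In practice, the asymptotic variance of an asymptotically linear estimator is estimated by the empirical variance of the estimated influence function values, which then yields Wald confidence intervals. The toy numpy sketch below illustrates this for the sample mean (whose influence function is V − E[V]); it is our own illustration, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy illustration: the sample mean is asymptotically linear with
# influence function V - E[V].
V = rng.normal(loc=1.0, scale=2.0, size=2000)
gamma_hat = V.mean()
phi_hat = V - gamma_hat                         # estimated influence function values
se = phi_hat.std(ddof=1) / np.sqrt(len(V))      # estimated asymptotic standard error

# 95% Wald confidence interval based on the normal approximation
ci = (gamma_hat - 1.96 * se, gamma_hat + 1.96 * se)
print(gamma_hat, ci)
```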

3 Causal graphical models

Given a directed acyclic graph G with vertex set V, we identify V with a random vector. The Bayesian network ℳ(G) is the collection of laws P for V that satisfy the local Markov property:

W ⊥⊥ nd_G(W) | pa_G(W) under P for all W ∈ V.

Here A ⊥⊥ B | C stands for conditional independence of A and B given C. Throughout, we will assume that the law P of V admits a density f with respect to some dominating measure. Then the local Markov property implies [2]

(1) f(v) = ∏_{V_j ∈ V} f{v_j | pa_G(v_j)},

where pa_G(v_j) is the value taken by pa_G(V_j) when V takes the value v.

In this article, we will assume an agnostic causal graphical model [17,18] represented by a directed acyclic graph G. This model identifies the vertex set of G with a factual random vector V and assumes that: (i) the law P of V satisfies P ∈ ℳ(G) and (ii) for any A ∈ V, L ⊆ nd_G(A), and conditional law π(A | L) for A given L, the intervention density f_π(v) of the variables in G when, possibly contrary to fact, the value of A is drawn from the law π(A | L) is given by

(2) f_π(v) = π(a | l) ∏_{V_j ∈ V \ {A}} f{v_j | pa_G(v_j)},

where a and l are the values taken by A and L when V takes the value v, and throughout we assume that A is a finite-valued variable and f{A = a | pa_G(A)} > 0 with probability 1 under P for all a. Formula (2) is known as the g-formula [1]. The conditional law π designates a, possibly random and individualized, treatment rule. Individualized treatment rules that depend on L are also referred to as point exposure dynamic treatment regimes or L-dependent regimes in the literature. A non-random individualized treatment rule that sets A = d(L) corresponds to the point mass conditional law π(a | l) = I_{d(l)}(a). In particular, a constant function d(L) = a corresponds to a static intervention that sets A = a. Throughout the article, let Y and A be the outcome and treatment of interest, respectively. Let χ_π(P; G) be the mean of Y under f_π. We refer to χ_π(P; G) as the interventional mean under treatment rule π. By factorizations (1) and (2), the Radon–Nikodym theorem gives

(3) χ_π(P; G) = E_P [ {π(A | L) / f{A | pa_G(A)}} Y ].

Furthermore, the local Markov property implies that

(4) χ_π(P; G) = E_P ( E_π [ E_P{ Y | A, pa_G(A), L } | pa_G(A), L ] ),

where E_P(· | ·) stands for the conditional mean under P and E_π(· | ·) stands for the conditional mean under the conditional law of A given L and pa_G(A) defined as π{A | pa_G(A), L} ≐ π(A | L). In the special case in which the regime is deterministic and static, setting A to a, the right hand side of equation (3) is equal to the inverse probability weighted expression for the interventional mean E_P[ f^{−1}{A = a | pa_G(A)} I_a(A) Y ], and the right hand side of equation (4) is equal to the standardized mean expression for the interventional mean E_P{ E_P[ Y | A = a, pa_G(A) ] }.
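The agreement between the inverse probability weighted and standardized mean expressions can be checked numerically. The sketch below simulates a toy data-generating process of our own (Z → A, Z → Y, A → Y, with pa_G(A) = {Z} and empty L) and evaluates both expressions for the static intervention setting A = 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Toy data-generating process (ours, purely illustrative).
Z = rng.binomial(1, 0.4, size=n)
A = rng.binomial(1, 0.2 + 0.5 * Z)              # f(A = 1 | Z) = 0.2 + 0.5 Z
Y = 1.0 + 2.0 * A + 1.5 * Z + rng.normal(size=n)

# Standardized-mean (g-formula) expression: E_P[ E_P(Y | A = 1, Z) ].
b1 = np.array([Y[(A == 1) & (Z == z)].mean() for z in (0, 1)])
pZ = np.array([(Z == z).mean() for z in (0, 1)])
g_formula = (b1 * pZ).sum()

# Inverse probability weighted expression: E_P[ I(A = 1) Y / f(A = 1 | Z) ].
pA1 = 0.2 + 0.5 * Z
ipw = np.mean((A == 1) * Y / pA1)

print(g_formula, ipw)   # both approximate E[Y] under setting A = 1, here 3.6
```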

We are interested in conducting inference about χ_π(P; G) when only a subset N of V is observable. The inferential problem is thus defined by the following assumptions: (i) P ∈ ℳ(G), (ii) the available data consists of a random sample from the marginal law of N under P, (iii) the parameter of interest is χ_π(P; G), and (iv) at least one observable adjustment set exists. Adjustment sets will be formally defined in the following section. Throughout we will assume (i) Y ∈ de_G(A), (ii) {A, Y} ∪ L ⊆ N, (iii) L ⊆ nd_G(A), (iv) A takes values in a finite set, and (v) f{A = a | pa_G(A)} > 0 with probability 1 under P for all a. Moreover, we assume that each observable variable W ∈ N has an associated positive cost c(W) that could represent, for example, the cost of measuring W.

4 Minimum cost adjustment sets and their graphical characterization

Ref. [10] gave the following definitions of dynamic adjustment sets and minimal dynamic adjustment sets in graphs with hidden variables. The dynamic appellative here refers to the, possibly, dynamic nature of the interventions, meaning that the interventions can depend on the value of the variables in L .

Definition 1

A set Z ⊆ V \ {A, Y} is an L–N dynamic adjustment set with respect to A, Y in G if L ⊆ Z ⊆ N and, for all conditional laws π(A | L) for A given L, all P ∈ ℳ(G), and all y ∈ ℝ,

E_P ( E_π [ E_P{ I_{(−∞,y]}(Y) | A, pa_G(A), L } | pa_G(A), L ] ) = E_P ( E_{π_Z} [ E_P{ I_{(−∞,y]}(Y) | A, Z } | Z ] ),

where I_{(−∞,y]}(Y) is the indicator variable of the event {Y ≤ y}, π_Z(A | Z) ≐ π(A | L) and, recall, π{A | pa_G(A), L} ≐ π(A | L).

An L–N dynamic adjustment set Z is minimal if no proper subset of Z is an L–N dynamic adjustment set.

This extends the definition of refs [4] and [19] to accommodate, possibly random, L-dependent treatment rules and graphs with hidden variables. Ref. [10] also provided a characterization of minimal L–N dynamic adjustment sets as minimal A–Y separators in a suitably constructed undirected graph. We will review this characterization in Section 4.1. For conciseness, in what follows we drop the dynamic appellative and simply write L–N adjustment sets. Also, all L–N adjustment sets are with respect to A, Y in G.

We define the cost of an L–N adjustment set Z as Σ_{W ∈ Z} c(W), and in a slight abuse of notation we denote this cost with c(Z).

Definition 2

A set Z is a minimum cost L–N adjustment set if it is an L–N adjustment set that satisfies c(Z) ≤ c(Z′) for all L–N adjustment sets Z′.

Consider the design of a study aimed at estimating the interventional mean under an L-dependent treatment rule π. Suppose that the investigator has postulated a causal graphical model to this end, and that, due to practical or ethical reasons, she can only observe a subset N of the variables in the graph G. Suppose further that G includes at least one L–N adjustment set Z. This implies that A, Y, and Z suffice to identify the interventional mean χ_π(P; G) with the so-called g-functional [1]

χ_{π,Z}(P; G) ≐ E_P [ E_{π_Z}{ E_P(Y | A, Z) | Z } ],

which can then be estimated non-parametrically as further explained in Section 5. For economic reasons, the investigator may then choose to use a minimum cost L–N adjustment set. If several minimum cost L–N adjustment sets exist, then, as we further discuss in Section 5.1, a reasonable criterion for comparing them is the variance of the limiting distribution of the resulting non-parametric estimators of the g-functional.

Our goals in this article are to:

  1. Provide a graphical characterization of minimum cost L–N adjustment sets.

  2. Prove the existence of an optimal minimum cost L–N adjustment set, that is, one that yields non-parametric estimators of the interventional mean with the smallest asymptotic variance among those that control for minimum cost L–N adjustment sets, and provide a polynomial time graphical algorithm to compute it.

To achieve this, we will leverage the undirected graph defined in ref. [10], which we review next. In what follows, we will assume that there exists at least one L–N adjustment set in G.

4.1 Minimum cost adjustment sets and undirected graphs

Let ℋ⁰ ≐ { G_{an_G({A,Y} ∪ L)}^{pbd(A,Y)} }^m, the moral graph of the proper back-door graph G^{pbd(A,Y)} restricted to an_G({A, Y} ∪ L), and let ignore ≐ { an_G({A,Y} ∪ L) \ {A, Y} } ∩ { [V \ N] ∪ forb(A, Y, G) }. Thus, ignore is the subset of the vertices of ℋ⁰ that are not equal to A or Y and that are either not observable (V \ N) or are variables that cannot be members of any L–N adjustment set (forb(A, Y, G), see ref. [4]).

Definition 3

The non-parametric adjustment efficiency graph associated with (A, Y, L, N) in G, denoted ℋ¹, is the undirected graph constructed from ℋ⁰ by (1) removing all vertices in ignore, (2) adding an edge between any pair of remaining vertices if they were connected in ℋ⁰ by a path whose intermediate vertices all lie in ignore, and (3) adding an edge between A and each vertex in L and between Y and each vertex in L.

In words, ℋ¹ is obtained from ℋ⁰ by first performing a latent projection onto V \ ignore and then connecting all vertices in L to both A and Y. Similar constructions were also used in refs [20] and [21]. Note that even though ℋ⁰ and ℋ¹ depend on A, Y, L, N, and G, for brevity we omit this dependence in the notation.
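The construction of ℋ⁰ and ℋ¹ can be automated. The sketch below builds ℋ¹ from a DAG G given A, Y, L, and the observable set N, assuming a recent networkx. The function names and the exact ordering of the restriction and moralization steps reflect our reading of Definition 3 rather than the optimaladj package itself.

```python
import itertools
import networkx as nx

def ancestors_inclusive(G, nodes):
    """an_G of a set of nodes, with the convention that a node is its own ancestor."""
    anc = set(nodes)
    for v in nodes:
        anc |= nx.ancestors(G, v)
    return anc

def forbidden(G, A, Y):
    """forb(A, Y, G): descendants of vertices on proper causal A-to-Y paths, plus A."""
    cn = {w for w in nx.descendants(G, A) if w == Y or nx.has_path(G, w, Y)}
    forb = {A}
    for w in cn:
        forb |= {w} | nx.descendants(G, w)
    return forb

def build_H1(G, A, Y, L, N):
    """Sketch of Definition 3 under our reading of the construction of H^0 and H^1."""
    anc = ancestors_inclusive(G, {A, Y} | set(L))
    # proper back-door graph: drop the first edge of every directed A-to-Y path
    Gpbd = G.copy()
    for child in list(G.successors(A)):
        if child == Y or nx.has_path(G, child, Y):
            Gpbd.remove_edge(A, child)
    H0 = nx.moral_graph(Gpbd.subgraph(anc).copy())
    ignore = (anc - {A, Y}) & ((set(G) - set(N)) | forbidden(G, A, Y))
    keep = set(H0) - ignore
    H1 = nx.Graph()
    H1.add_nodes_from(keep)
    # latent projection: join kept vertices connected in H0 through "ignore" only
    for u, v in itertools.combinations(keep, 2):
        H_sub = H0.subgraph(ignore | {u, v})
        if nx.has_path(H_sub, u, v):
            H1.add_edge(u, v)
    H1.add_edges_from([(A, l) for l in L] + [(Y, l) for l in L])
    return H1
```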

Proposition 2 of ref. [10] states that Z is a minimal L–N adjustment set if and only if Z is a minimal A–Y separator in ℋ¹. Note that ref. [10] uses the term cut to refer to what we here call a separator. Both terms are used in the literature, and in this article, we prefer to reserve the name cut for the concept in flow networks. Now, since all minimum cost L–N adjustment sets are minimal L–N adjustment sets, Proposition 2 of ref. [10] implies the following.

Lemma 1

Z is a minimum cost L–N adjustment set if and only if Z is a minimum cost A–Y separator in ℋ¹.

In what follows, for brevity, all separators in ℋ¹ are understood to be between A and Y. In the next section, we review the aspects of the theory of non-parametric estimation of the g-functional χ_{π,Z}(P; G) that are relevant to our derivation of the optimal minimum cost L–N adjustment set.

5 Non-parametric estimation of the g-functional

Ref. [22] showed that estimators of χ_{π,Z}(P; G) that are regular and asymptotically linear at all P in a model ℳ that only makes assumptions on the complexity or smoothness of b(A, Z; P) ≐ E_P(Y | A, Z) and/or f(A | Z) have a unique influence function ψ_{P,π}(Z; G) given by

ψ_{P,π}(Z; G) ≐ {π(A | L) / f(A | Z)} {Y − b(A, Z; P)} + E_{π_Z}{ b(A, Z; P) | Z } − χ_{π,Z}(P; G).

Note that even though ψ_{P,π} is also a function of A and Y, this is not reflected in the notation, for the sake of brevity. In what follows, we will assume that for all L–N adjustment sets Z, the influence function ψ_{P,π}(Z; G) has finite variance.

There exist multiple estimation strategies that rely on making smoothness or complexity-type assumptions on b(A, Z; P) and/or f(A | Z). We list a few of them next; here P_n denotes the empirical mean operator. The inverse probability weighted estimator is given by χ̂_{π,IPW} = P_n{ f̂(A | Z)^{−1} π(A | L) Y }, where f̂(A | Z) is a non-parametric estimator of f(A | Z) [23]. The outcome regression estimator is given by P_n[ E_{π_Z}{ b̂(A, Z) | Z } ], where b̂ is a non-parametric estimator of b [24]. The doubly robust estimator [25,26,27], also known as augmented IPW, uses non-parametric estimators of both f(A | Z) and b(A, Z). Examples of non-parametric estimators of f(A | Z) and b(A, Z) include series or kernel-based estimators, estimators based on boosted trees, and other machine learning techniques.
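The following minimal sketch illustrates the doubly robust (augmented IPW) strategy for the simple static rule setting A = 1, adjusting for a toy covariate set Z, with nuisances fit by boosted trees from scikit-learn. The data-generating process is our own, and for brevity the sketch omits the sample splitting/cross-fitting that would be used in practice (see ref. [26]).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

rng = np.random.default_rng(2)
n = 5000

# Toy observational data: Z confounds a binary treatment A and outcome Y.
Z = rng.normal(size=(n, 2))
pA = 1 / (1 + np.exp(-(0.8 * Z[:, 0] - 0.5 * Z[:, 1])))
A = rng.binomial(1, pA)
Y = 1.0 + 2.0 * A + Z[:, 0] + 0.5 * Z[:, 1] + rng.normal(size=n)

# Nuisance estimates: f(A = 1 | Z) and b(A, Z) = E(Y | A, Z).
f_hat = GradientBoostingClassifier().fit(Z, A).predict_proba(Z)[:, 1]
b_model = GradientBoostingRegressor().fit(np.column_stack([A, Z]), Y)
b_obs = b_model.predict(np.column_stack([A, Z]))
b1 = b_model.predict(np.column_stack([np.ones(n), Z]))      # b(1, Z)

# Augmented IPW estimate of the interventional mean under the rule A = 1.
chi_dr = np.mean(b1 + (A == 1) / f_hat * (Y - b_obs))
print(chi_dr)   # close to E[Y] under setting A = 1, which is 3.0 in this toy model
```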

We will refer to estimators that are regular and asymptotically linear with unique influence function ψ_{P,π}(Z; G) as non-parametric estimators that adjust for Z. It follows from the discussion above that if χ̂_{π,Z} is a non-parametric estimator that adjusts for Z, then n^{1/2}{ χ̂_{π,Z} − χ_π(P; G) } converges in distribution to N{0, σ²_{π,Z}(P)}, where σ²_{π,Z}(P) ≐ var_P{ ψ_{P,π}(Z; G) }.

5.1 Efficiency comparison of minimum cost adjustment sets

Define the following preorder on the class of L–N adjustment sets:

Z₁ ⪯_L Z₂ ⇔ σ²_{π,Z₁}(P) ≤ σ²_{π,Z₂}(P) for all π(A | L) and all P ∈ ℳ(G).

In words, Z₁ ⪯_L Z₂ if adjusting for Z₁ yields non-parametric estimators of χ_π(P; G) that are at least as efficient as those obtained by adjusting for Z₂, uniformly over all possible treatment rules π and laws P in the Bayesian network ℳ(G). Define the following relation between separators in ℋ¹:

Z₁ ⪯_{ℋ¹} Z₂ ⇔ Y ⊥_{ℋ¹} Z₂ \ Z₁ | Z₁ and A ⊥_{ℋ¹} Z₁ \ Z₂ | Z₂.

Since, as stated in Lemma 1, minimum cost L–N adjustment sets and minimum cost separators in ℋ¹ are equivalent, using Lemma 1 and Propositions 3 and 5 of ref. [10], we can deduce the following graphical criterion for comparing minimum cost L–N adjustment sets.

Lemma 2

Let Z₁ and Z₂ be minimum cost L–N adjustment sets. Then

(5) Z₁ ⪯_{ℋ¹} Z₂ ⇒ Z₁ ⪯_L Z₂.

We now argue why asymptotic efficiency is a reasonable basis for comparing minimum cost adjustment sets, in the context of designing a planned study where the cost associated with each observable variable in the graph reflects the cost of measuring the variable on one subject. Consider two minimum cost adjustment sets Z₁ and Z₂, and let χ̂_{π,Z₁} and χ̂_{π,Z₂} be non-parametric estimators that adjust for Z₁ and Z₂, respectively. Suppose we know that Z₁ ⪯_L Z₂. If we want the length of the 95% Wald confidence interval for χ_π(P; G) to be bounded by M, then using χ̂_{π,Z₁} we will need n₁ ≐ {3.92 σ_{π,Z₁}(P) M^{−1}}² samples, whereas using χ̂_{π,Z₂} we will need n₂ ≐ {3.92 σ_{π,Z₂}(P) M^{−1}}² samples. Since σ²_{π,Z₁}(P) ≤ σ²_{π,Z₂}(P), we have that n₁ ≤ n₂ and hence n₁ × c(Z₁) ≤ n₂ × c(Z₂). In words, for the same level of precision, the total cost of using Z₁ as an adjustment set will be lower than that of using Z₂.
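A small worked example, with numbers of our own choosing, makes the sample-size argument concrete.

```python
# Required sample sizes and total measurement costs for two minimum cost
# adjustment sets with the same per-subject cost c(Z1) = c(Z2) = 3.
import math

M = 0.1                      # desired bound on the 95% Wald CI length
sigma1, sigma2 = 1.0, 1.5    # hypothetical asymptotic standard deviations
cost = 3.0                   # per-subject cost of measuring either adjustment set

n1 = math.ceil((3.92 * sigma1 / M) ** 2)
n2 = math.ceil((3.92 * sigma2 / M) ** 2)
print(n1, n2)                   # 1537, 3458
print(n1 * cost, n2 * cost)     # total measurement costs: 4611.0 vs 10374.0
```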

In the following section, we show that there exists a minimum cost L–N adjustment set, which we denote O_c, that satisfies O_c ⪯_L Z for any other minimum cost L–N adjustment set Z. We call O_c an optimal minimum cost L–N adjustment set. To show this, we will make a connection between minimum cost L–N adjustment sets and min-cuts in a suitably constructed flow network. This construction will also allow us to derive a polynomial time algorithm to compute O_c.

6 Optimal minimum cost adjustment sets and network flows

The following flow network construction is based on well-established ideas from the literature on networks. See Theorem 6.4 of ref. [15]. The main difference is that ref. [15] puts unit capacity on all “internal edges.”

Definition 4

Let the flow network D be defined as follows. For each vertex W in ℋ¹, add two vertices W′ and W″ and the edge W′ → W″ to D. We call these internal edges. If there is an edge joining U and W in ℋ¹, add the edges U″ → W′ and W″ → U′. We call these external edges. Thus, an edge U – W in ℋ¹ gives rise in D to the internal edges U′ → U″ and W′ → W″ together with the external edges U″ → W′ and W″ → U′.

The capacity of an internal edge e = W′ → W″ is equal to the cost of W, that is, k(e) = c(W), except if W is equal to A or to Y, in which case the capacity is infinity. The capacity of external edges is infinity. We set Y″ as the source and A′ as the sink of the network and, in a slight abuse of notation, write Y and A for the source and the sink in what follows.
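As a programmatic illustration of this construction, the sketch below builds D from a networkx undirected graph representing ℋ¹ and a cost dictionary. Encoding W′ and W″ as the tuples (W, "in") and (W, "out") is our own choice, as are the function and variable names.

```python
import networkx as nx

def build_flow_network(H1, costs, A, Y):
    """Sketch of Definition 4: split each vertex W of H1 into W' -> W''
    and connect neighbouring gadgets with infinite-capacity external edges."""
    INF = float("inf")
    D = nx.DiGraph()
    for W in H1:
        cap = INF if W in (A, Y) else costs[W]
        D.add_edge((W, "in"), (W, "out"), capacity=cap)       # internal edge W' -> W''
    for U, W in H1.edges():
        D.add_edge((U, "out"), (W, "in"), capacity=INF)       # external edges
        D.add_edge((W, "out"), (U, "in"), capacity=INF)
    source, sink = (Y, "out"), (A, "in")                      # Y'' is the source, A' the sink
    return D, source, sink
```

Feeding the resulting network to any max-flow routine yields the min-cuts used in the remainder of this section.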

We provide a full example of this construction in Figure 1. We show in Lemma 4 in the Appendix that there always exists a cut in D with finite capacity. Next, we define two mappings, d and h. The former maps minimal separators in ℋ¹ to sets of vertices in D, while the latter maps cuts with finite capacity in D to sets of vertices in ℋ¹.

Figure 1

The flow network D corresponding to the directed acyclic graph in Figure 2. Edges with finite capacities are colored green, with the numbers next to the edges representing capacities. All black edges have infinite capacity.

Definition 5

Given a minimal separator Z in ℋ¹, we let d(Z) be the set formed by Y and all vertices in D that lie on some directed path δ from Y to a vertex W′ for some W ∈ Z, where δ does not intersect any other U′ or U″ for U ∈ Z. Given a cut S in D with finite capacity, we let

h(S) = { W : (W′, W″) ∈ (S, S̄) }.

The following proposition establishes that d maps minimum cost separators in ℋ¹ to min-cuts in D, and h maps min-cuts in D to minimum cost separators in ℋ¹.

Proposition 1

  1. Let Z be a minimal separator in ℋ¹. Then d(Z) is a cut in D with k{d(Z)} = c(Z).

  2. Let S be a cut in D with finite capacity. Then h(S) is a separator in ℋ¹ with c{h(S)} = k(S).

  3. Let Z be a minimum cost separator in ℋ¹. Then d(Z) is a min-cut in D.

  4. Let S be a min-cut in D. Then h(S) is a minimum cost separator in ℋ¹.

  5. Let Z be a minimum cost separator in ℋ¹. Then h{d(Z)} = Z.

Next, we establish a connection between the inclusion relation ⊆ defined over min-cuts in D and the relation ⪯_{ℋ¹} defined over separators in ℋ¹.

Proposition 2

Let S and S′ be min-cuts such that S ⊆ S′. Then h(S) ⪯_{ℋ¹} h(S′).

Lemmas 1 and 2 together with Propositions 1 and 2 imply that if we are able to construct a min-cut S_c that is a subset of any other min-cut, then h(S_c) is an optimal minimum cost L–N adjustment set. We now show how such a min-cut can be constructed.

Given a flow f, we will say that a path δ connecting Y and W in D is augmenting for f if for all edges e in δ oriented from Y to W it holds that f(e) < k(e), and for all edges e in δ oriented from W to Y it holds that f(e) > 0. Suppose that we have run a maximum flow algorithm on D, for example, the preflow push algorithm [28], and obtained a maximum flow f. We are now ready to define our candidate optimal minimum cost L–N adjustment set.

Definition 6

Let S_c be the set formed by Y and all vertices W in D such that there exists a path from Y to W that is augmenting for f. Let O_c ≐ h(S_c).

Note that S_c could in principle depend on the computed max-flow f, even if this is not made explicit in the notation.

Proposition 3

S_c is a min-cut. For any other min-cut S it holds that S_c ⊆ S.

The following theorem, the main result of this article, establishes the optimality of O_c.

Theorem 1

O_c is a minimum cost L–N adjustment set. For any other minimum cost L–N adjustment set Z it holds that O_c ⪯_L Z.

We do not know if O_c is the unique optimal minimum cost L–N adjustment set. Algorithm 1 summarizes the steps needed to compute O_c. The complexity of Algorithm 1 will depend on the sub-routine used to compute the maximum flow in the third step. For example, when the preflow push algorithm is used, the overall complexity of Algorithm 1 is bounded by O(#V² #E). The fourth step of Algorithm 1 can be easily implemented using a small modification of the depth first search algorithm. We provide a Python implementation of Algorithm 1 in the optimaladj package, available on PyPI. Our algorithm computes maximum flows using the implementation of the preflow push algorithm available in the networkx library [29].

Algorithm 1: Pseudo-algorithm to compute O_c
procedure
  1. Construct ℋ¹.
  2. Construct D.
  3. Compute a maximum flow f on D.
  4. Compute S_c, the set of nodes reachable from Y via paths that are augmenting for f.
  5. Compute h(S_c).
  6. Return h(S_c).

When all variables have unit costs, Algorithm 1 computes an L–N adjustment set that is optimal among those of minimum cardinality. Algorithm 1 of ref. [10] does the same thing, but with O(#V^{3.5}) complexity. Thus, Algorithm 1 also provides an improvement on Algorithm 1 of ref. [10] for the task of computing an optimal minimum cardinality L–N adjustment set.
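The following self-contained sketch implements the steps of Algorithm 1 with networkx, starting from a given ℋ¹; it is our own illustration and not the optimaladj implementation. It relies on the fact that nx.minimum_cut returns, as the source side of the cut, exactly the set of nodes reachable from the source via augmenting paths, which is S_c.

```python
import networkx as nx
from networkx.algorithms.flow import preflow_push

def optimal_minimum_cost_adjustment_set(H1, costs, A, Y):
    """Sketch of Algorithm 1: build D from H1, find the min-cut S_c closest to
    the source, and return h(S_c) together with the min-cut capacity."""
    INF = float("inf")
    D = nx.DiGraph()
    for W in H1:
        cap = INF if W in (A, Y) else costs[W]
        D.add_edge((W, "in"), (W, "out"), capacity=cap)       # internal edges
    for U, W in H1.edges():
        D.add_edge((U, "out"), (W, "in"), capacity=INF)       # external edges
        D.add_edge((W, "out"), (U, "in"), capacity=INF)
    source, sink = (Y, "out"), (A, "in")
    # The first returned node set consists of the vertices reachable from the
    # source in the residual network of a max flow, i.e. via augmenting paths: S_c.
    cut_value, (S_c, _) = nx.minimum_cut(D, source, sink, flow_func=preflow_push)
    O_c = {W for W in H1
           if (W, "in") in S_c and (W, "out") not in S_c}     # h(S_c)
    return O_c, cut_value

# Hypothetical example: H1 with routes A - B - Y and A - C - Y, c(B)=1, c(C)=2.
H1 = nx.Graph([("A", "B"), ("B", "Y"), ("A", "C"), ("C", "Y")])
print(optimal_minimum_cost_adjustment_set(H1, {"B": 1.0, "C": 2.0}, "A", "Y"))
# O_c = {B, C} with minimum cost 3.0: every separator of this toy H1 contains both.
```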

6.1 Examples

In the following figures we illustrate the results of this section. Dashed circles designate hidden variables and rectangles the variables that the treatment rule depends on. The numbers below the name of each vertex in G and ℋ¹ represent the cost of the variable associated with the vertex. We do not assign costs to A, Y, and variables in ignore, since their costs are not relevant for the comparison of minimum cost L–N adjustment sets. Figure 1 shows the flow network D associated with the graphs in Figure 2. Edges with finite capacities are colored green, with the numbers next to the edges representing capacities. All black edges have infinite capacity.

Figure 2

A directed acyclic graph G (a) and the undirected graph ℋ¹ associated with it (b).

For the directed acyclic graph G in Figure 2, let L = {X} and N = V \ {U}. Then an_G({A, Y} ∪ L) = V \ {F}, forb(A, Y, G) = {A, Y, M}, and ignore = {U, M}. In ℋ¹, the set of all separators is given by the collection of sets Z that satisfy X ∈ Z and at least one of the following:

  1. K ∈ Z.

  2. B ∈ Z or R ∈ Z, and Q ∈ Z or T ∈ Z.

The only minimum cost separators are Z₁ = {X, Q, R} and Z₂ = {X, T, R}. It is easy to show that Z₂ ⪯_{ℋ¹} Z₁. Thus, Z₂ is an optimal minimum cost L–N adjustment set. Using results from ref. [10], it is easy to show that there exists a globally optimal L–N adjustment set in G, i.e., an L–N adjustment set that is more efficient than any other L–N adjustment set, and that it is given by Z₂ ∪ {F}. Turn now to the representation of D in Figure 1. The min-cut obtained by running the preflow push algorithm and then computing S_c is given by S_c = {Y, X′, T′, R′}. The capacity of this cut is k(S_c) = k{(R′, R″)} + k{(T′, T″)} + k{(X′, X″)} = 3. The optimal minimum cost L–N adjustment set is then O_c = h(S_c) = {X, T, R}, matching what we obtained earlier by analyzing separators in ℋ¹.

It follows from Theorem 1 of ref. [10] that O_c coincides with the optimal L–N adjustment set among minimal L–N adjustment sets. However, it is not always the case that the optimal minimum cost and the optimal minimal L–N adjustment sets are equal. For this same graph, if the cost of B were 1 and the cost of R were 2, then the optimal minimal L–N adjustment set would still be equal to {X, R, T}, whereas the optimal minimum cost L–N adjustment set would be {X, B, T}. Since {X, R, T} ⪯_{ℋ¹} {X, B, T}, this is an example in which the optimal minimal L–N adjustment set is more efficient than the optimal minimum cost L–N adjustment set. The converse can never happen, because all minimum cost L–N adjustment sets are minimal L–N adjustment sets.

Going back to our original example, note that Z₃ = {X, K} is the L–N adjustment set with minimum possible cardinality. It is easy to check that Z₂ ⪯_{ℋ¹} Z₃ and thus, in this case, the optimal minimum cost L–N adjustment set is more efficient than the optimal minimum cardinality L–N adjustment set. The graph in Figure 3 provides an example in which the reverse situation holds.

Figure 3

An example of a directed acyclic graph in which the optimal minimum cardinality L–N adjustment set is more efficient than the optimal minimum cost L–N adjustment set: (a) G and (b) ℋ¹.

Indeed, for the graph G in Figure 3, let L = ∅ and N = V. Then an_G({A, Y} ∪ L) = V, forb(A, Y, G) = {A, Y}, and ignore = ∅. It is easy to check that there is only one minimum cost separator in ℋ¹, given by O_c = {B, Q}. However, Z = {T, R} is a minimum cardinality separator that satisfies Z ⪯_{ℋ¹} O_c. Thus, in this case, the optimal minimum cardinality L–N adjustment set is more efficient than the optimal minimum cost L–N adjustment set.

This example also illustrates the point made in the introduction that, in general, there does not exist an optimal L–N adjustment set among those that satisfy an upper bound on their cost. For the graph in Figure 3, if the available budget is equal to 3, the investigator has to choose between {B, Q}, {B, R}, and {T, Q}. Clearly {B, R} ⪯_{ℋ¹} {B, Q} and {T, Q} ⪯_{ℋ¹} {B, Q}, and so by Propositions 3 and 5 of ref. [10], {B, R} ⪯_L {B, Q} and {T, Q} ⪯_L {B, Q}. Thus, the investigator actually needs to choose between {B, R} and {T, Q}. However, in their Example 2, ref. [9] showed that it is not possible to compare the asymptotic variances of these two adjustment sets based solely on the causal graph, because there exist probability laws in the Bayesian network ℳ(G) under which {B, R} is more efficient but also probability laws in the Bayesian network ℳ(G) under which {Q, T} is more efficient.

Moreover, the graph in Figure 3 can be used to show that, in general, the problem of finding an optimal efficient minimum cost adjustment set under arbitrary non-additive costs does not have a solution based on graphical criteria only. Indeed, consider the graph in Figure 3 and define the following cost structure: c ( { B , R } ) = 1 , c ( { T , Q } ) = 1 , and the cost of any other subset of { B , Q , T , R } is equal to 2. Then { B , R } and { T , Q } are the only minimum cost adjustment sets. As we stated earlier, it is not possible to compare the asymptotic variances of these adjustment sets based solely on graphical criteria.

7 Conclusion

A reviewer suggested that, even though the problem of finding the optimal efficient adjustment set among those that satisfy a given budget constraint does not admit in general a graphical solution, an algorithm that discards provably inefficient adjustment sets among all that satisfy the budget constraint could still be useful to practitioners. We agree. A naive implementation of the suggested algorithm could go as follows. First, we use the algorithm developed in ref. [14] to list all adjustment sets in the graph. Second, we remove from this initial list all adjustment sets that have a cost greater than the budget constraint. Finally, we loop through all pairs of adjustment sets remaining in the list, and check whether the d-separation conditions in Proposition 3 from [10] hold. If they hold, then one of the adjustment sets in the pair will be inefficient and can be removed. Unfortunately, this implementation will not be computationally efficient for several reasons, including that the set of all adjustment sets can be exponentially large [14]. Whether a computationally efficient solution to this problem is possible is an interesting open problem.

Acknowledgements

The authors thank the reviewers for several useful comments and suggestions that led to improvements in the article. Andrea Rotnitzky was partially funded by National Institutes of Health grants R01LM013614 and R01AI27271.

  1. Conflict of interest: Prof. Andrea Rotnitzky is a member of the Editorial Board in the Journal of Causal Inference but was not involved in the review process of this article.

Appendix

This section contains the proofs of all the results in the main article, as well as preliminary technical lemmas.

We will need the following lemmas in the proofs of Propositions 1 and 2.

Lemma 3

Let Z be a minimal separator in ℋ¹. Then

(d(Z), d(Z)‾) = { (W′, W″) : W ∈ Z }.

Proof

We first show that (d(Z), d(Z)‾) ⊇ { (W′, W″) : W ∈ Z }. Since Z is a minimal separator, for any vertex W ∈ Z there is a path connecting W and Y in ℋ¹ that does not intersect other vertices in Z, and such a path corresponds to a directed path in D from Y to W′ that does not intersect any other vertices U′ or U″ for U ∈ Z. Hence, if W ∈ Z, then W′ ∈ d(Z) and W″ ∉ d(Z). Thus, (d(Z), d(Z)‾) ⊇ { (W′, W″) : W ∈ Z }.

Next, we prove that (d(Z), d(Z)‾) ⊆ { (W′, W″) : W ∈ Z }. Take (B, Q) ∈ (d(Z), d(Z)‾); we will show that (B, Q) ∈ { (W′, W″) : W ∈ Z }. We have the following four cases to analyze.

  1. Assume that B = W′ for some W ∈ Z. Due to how D was constructed, necessarily Q = W″. Thus, (B, Q) ∈ { (W′, W″) : W ∈ Z }.

  2. Assume that B = W′ for some W ∉ Z. Due to how D was constructed, necessarily Q = W″. Now, since W′ ∈ d(Z), there exists a directed path δ in D from Y to U′ for some U ∈ Z, such that δ does not intersect X′ or X″ for X ∈ Z \ {U} and such that W′ lies on δ. But δ has to go through W″ to reach U′, and this implies that W″ ∈ d(Z), which contradicts the assumption that W″ = Q ∈ d(Z)‾.

  3. Next, consider the case that B = W″ for some W ∈ Z. This cannot happen since, as we argued before, if W ∈ Z then W′ ∈ d(Z) and W″ ∉ d(Z).

  4. Finally, consider the case B = W″ for some W ∉ Z. Then, due to how D was constructed, Q = U′ for some U. Since Q = U′ ∉ d(Z), then U ∉ Z. Now, since W″ ∈ d(Z), there exists a directed path δ in D from Y to W″, and hence to U′, that does not intersect X′ or X″ for X ∈ Z. In particular, this implies that there is a path η connecting U and Y in ℋ¹ that does not intersect any vertices in Z. Let ν be a path connecting U and A in ℋ¹. Since Z is a separator in ℋ¹, ν has to intersect Z. Let R be the vertex in Z that lies closest to U in ν. The sub-path of ν that goes from U to R corresponds to a directed path κ from U′ to R′ in D. Joining δ and κ we obtain a directed path from Y to R′ that does not intersect X′ or X″ for X ∈ Z \ {R}. Since U′ lies on that path, we obtain that Q = U′ ∈ d(Z), which is a contradiction.

We have thus shown that (B, Q) ∈ { (W′, W″) : W ∈ Z }, finishing the proof of the lemma.□

Lemma 4

There exists a cut in D with finite capacity.

Proof

Let

S = {Y′, Y″} ∪ { W′ : W ≠ A }.

This is a set of vertices of D that contains Y and does not contain A, and hence it is a cut. Its capacity is given by the sum of the capacities of all internal edges, except Y′ → Y″ and A′ → A″. Since these edges all have finite capacity, the capacity of S is finite, which is what we wanted to show.□

We are now ready to prove Proposition 1.

Proof of Proposition 1

We begin with the proof of the first assertion. d(Z) contains Y by definition. We will show it does not contain A, which will prove that d(Z) is a cut. Suppose for the sake of contradiction that A ∈ d(Z). Then there exists a directed path in D from Y to A that does not intersect any vertices W′ or W″ for W ∈ Z. This implies that there exists a path in ℋ¹ connecting Y to A that does not intersect Z, which contradicts the assumption that Z is a separator in ℋ¹. Thus, d(Z) is a cut. The fact that k{d(Z)} = c(Z) follows from Lemma 3.

Next, we prove the second part of the proposition. We will first prove that h(S) is a separator. If Y and A are not connected in ℋ¹, then h(S) is trivially a separator. Suppose then that there exists a path δ that connects Y and A in ℋ¹. Such a path corresponds to a directed path from Y to A in D. Since S has finite capacity, any such path must contain an internal edge W′ → W″ with (W′, W″) ∈ (S, S̄). This implies that δ intersects W ∈ h(S), which is what we wanted to show. The claim that c{h(S)} = k(S) follows immediately from the definition of h(S).

Next, we prove part three of the proposition. Let Z be a minimum cost separator. Then it is a minimal separator, and thus by part one d ( Z ) is a cut with k { d ( Z ) } = c ( Z ) . Suppose, for the sake of contradiction, that d ( Z ) is not a min-cut and hence that there exists a cut S in D such that k ( S ) < k { d ( Z ) } . Since S has finite capacity, part two of the proposition implies that h ( S ) is a separator with c { h ( S ) } = k ( S ) . But then c { h ( S ) } = k ( S ) < k { d ( Z ) } = c ( Z ) , contradicting the assumption that Z was a minimum cost separator. Thus, it must be that d ( Z ) is a min-cut.

Turn now to the proof of part four of the proposition. Let S be a min-cut. By Lemma 4, S has finite capacity. Then part two of the proposition implies that h ( S ) is a separator with c { h ( S ) } = k ( S ) . Suppose, for the sake of contradiction, that h ( S ) is not a minimum cost separator, and hence that there exists a minimum cost separator Z in 1 that satisfies c ( Z ) < c { h ( S ) } . Since Z is a minimal separator, part one of the proposition implies that d ( Z ) is a cut with k { d ( Z ) } = c ( Z ) . But then k { d ( Z ) } = c ( Z ) < c { h ( S ) } = k ( S ) , contradicting the assumption that S was a min-cut. Thus, it must be that h ( S ) is a minimum cost separator.

Finally, we prove the fifth part of the proposition. We begin by showing that Z ⊆ h{d(Z)}. Take W ∈ Z. We showed in Lemma 3 that W′ ∈ d(Z) and W″ ∉ d(Z). Thus, W ∈ h{d(Z)}. Now we will show that Z ⊇ h{d(Z)}. Take W ∈ h{d(Z)}. Then W′ ∈ d(Z) and W″ ∉ d(Z). Assume, for the sake of contradiction, that W ∉ Z. Since W′ ∈ d(Z), there exists in D a directed path δ from Y to U′ for some U ∈ Z, such that δ does not intersect any vertices X′ or X″ for X ∈ Z \ {U} and such that W′ lies on δ. But since δ reaches U′, it has to go through W″ too, implying that W″ ∈ d(Z), which is a contradiction. Thus, it must be that W ∈ Z.□

Proof of Proposition 2

Take W ∈ h(S′) \ h(S) and a path δ in ℋ¹ connecting W to Y. We need to show that δ intersects h(S). Now, in D there is a path corresponding to δ of the form

Y → U₁′ → U₁″ → ⋯ → U_l′ → U_l″ → W′ → W″.

Since S is a cut, Y ∈ S. Since S is a min-cut, by Lemma 4 it has finite capacity, and thus it must be that U₁′ ∈ S, because the edge Y → U₁′ has infinite capacity. If U₁″ ∉ S, then U₁ ∈ h(S) and we are done. If U₁″ ∈ S, since S has finite capacity and the edge U₁″ → U₂′ has infinite capacity, it must be that U₂′ ∈ S. We now repeat the same argument as before. If at some point we find that U_j′ ∈ S and U_j″ ∉ S, then U_j ∈ h(S) and we are done. Otherwise, all of U₁′, U₁″, …, U_l′, U_l″ are in S. We will show that this cannot happen. Assume it does. Since the edge U_l″ → W′ has infinite capacity, it must be that W′ ∈ S. If W″ ∈ S, since S ⊆ S′ we conclude that W′ and W″ are both in S′, which contradicts the assumption that W ∈ h(S′). Hence, it must be that W″ ∉ S, but this implies that W ∈ h(S), which is a contradiction.□

The following lemma is a straightforward consequence of well-known results in the theory of flow networks. We include it here for the sake of completeness, since we will need it in the proof of Proposition 3.

Lemma 5

Let S be a cut. Then S is a min-cut if and only if it holds that for all e ∈ (S, S̄), f(e) = k(e), and for all e ∈ (S̄, S), f(e) = 0.

Proof

We begin by noting the following. By Lemma 5.1 of ref. [15], the total flow of f satisfies

(6) F = Σ_{e ∈ (S, S̄)} f(e) − Σ_{e ∈ (S̄, S)} f(e),

and for all edges e it holds that

(7) 0 ≤ f(e) ≤ k(e).

Assume first that S is a min-cut. By the max-flow min-cut theorem (see Theorem 5.1 of ref. [15]), F satisfies

(8) F = Σ_{e ∈ (S, S̄)} k(e).

It follows from (6), (7), and (8) that if e ∈ (S, S̄) then f(e) = k(e), and if e ∈ (S̄, S) then f(e) = 0, which is what we wanted to show.

Now assume that if e ∈ (S, S̄) then f(e) = k(e), and if e ∈ (S̄, S) then f(e) = 0. Then, by (6), the total flow of f satisfies (8). The max-flow min-cut theorem then implies that S is a min-cut, which is what we wanted to show.□

Proof of Proposition 3

We first show that S_c is a cut. We need to show that Y ∈ S_c and A ∉ S_c. That Y ∈ S_c follows from the definition of S_c. On the other hand, since f is a max-flow, there can be no paths from Y to A that are augmenting for f, since if there were, the total flow of f could be increased. Thus, A ∉ S_c.

Next, we show that S_c is a min-cut. By Lemma 5, it suffices to show that if e ∈ (S_c, S̄_c) then f(e) = k(e), and if e ∈ (S̄_c, S_c) then f(e) = 0. Take W ∈ S_c and U ∈ S̄_c. Since W ∈ S_c, there exists a path δ from Y to W that is augmenting for f. Suppose e = (W, U) is an edge in D. Then f(e) = k(e), because if f(e) < k(e) the path obtained by joining δ and e would be a path from Y to U that is augmenting for f, implying that U ∈ S_c, which is a contradiction. Suppose that e = (U, W) is an edge in D. Then f(e) = 0, because if f(e) > 0 the path obtained by joining δ and e would be a path from Y to U that is augmenting for f, implying that U ∈ S_c, which is a contradiction. We have thus shown that S_c is a min-cut.

Now take any other min-cut S. We will show that S_c ⊆ S. Take U ∈ S_c. We need to show that U ∈ S. Since U ∈ S_c, there exists a path δ from Y to U in D that is augmenting for f. Suppose the vertices in δ are Y, W₁, W₂, …, W_l, W_{l+1} = U. Since S is a cut, we have that Y ∈ S. We will show that W_i ∈ S for all i = 1, …, l + 1 by induction. Let e₁ be the edge joining Y and W₁ in δ. Since δ is augmenting for f, we have that if e₁ = (Y, W₁) then f(e₁) < k(e₁), whereas if e₁ = (W₁, Y) then f(e₁) > 0. Since S is a min-cut, Lemma 5 implies that W₁ ∈ S. Now, suppose that for some 1 ≤ i < l + 1 it holds that W_i ∈ S. Let e_{i+1} be the edge joining W_i and W_{i+1} in δ. Since δ is augmenting for f, we have that if e_{i+1} = (W_i, W_{i+1}) then f(e_{i+1}) < k(e_{i+1}), whereas if e_{i+1} = (W_{i+1}, W_i) then f(e_{i+1}) > 0. Since S is a min-cut, Lemma 5 implies that W_{i+1} ∈ S. This finishes the proof of the proposition.□

Proof of Theorem 1

By Proposition 3, S_c is a min-cut. Thus, by part four of Proposition 1, O_c = h(S_c) is a minimum cost separator in ℋ¹. Lemma 1 implies that O_c is a minimum cost L–N adjustment set.

Now, let Z be any other minimum cost L–N adjustment set. We will show that O_c ⪯_L Z. Lemma 1 implies that Z is a minimum cost separator in ℋ¹. By part three of Proposition 1, d(Z) is a min-cut in D. Thus, Proposition 3 implies that S_c ⊆ d(Z). Proposition 2 implies that h(S_c) ⪯_{ℋ¹} h{d(Z)}. But part five of Proposition 1 establishes that h{d(Z)} = Z. We have shown that O_c ⪯_{ℋ¹} Z, which by Lemma 2 implies that O_c ⪯_L Z. This finishes the proof of the theorem.□

References

[1] Robins J. A new approach to causal inference in mortality studies with a sustained exposure period-application to control of the healthy worker survivor effect. Math Modell. 1986;7(9–12):1393–512. 10.1016/0270-0255(86)90088-6

[2] Pearl J. Causality: models, reasoning and inference. Cambridge, UK: Cambridge University Press; 2000.

[3] Kuroki M, Miyakawa M. Covariate selection for estimating the causal effect of control plans by using causal diagrams. J R Statist Soc B (Statist Methodol). 2003;65(1):209–22. 10.1111/1467-9868.00381

[4] Shpitser I, VanderWeele T, Robins JM. On the validity of covariate adjustment for estimating causal effects. In: UAI’10. Arlington, USA: AUAI Press; 2010. p. 527–36.

[5] Perković E, Textor J, Kalisch M, Maathuis MH. Complete graphical characterization and construction of adjustment sets in Markov equivalence classes of ancestral graphs. J Machine Learn Res. 2018;18(220):1–62. http://jmlr.org/papers/v18/16-319.html.

[6] Kuroki M, Cai Z. Selection of identifiability criteria for total effects by using path diagrams. In: UAI’04. Arlington, USA: AUAI Press; 2004. p. 333–40.

[7] Henckel L, Perković E, Maathuis MH. Graphical criteria for efficient total effect estimation via adjustment in causal linear models. 2019. arXiv:1907.02435.

[8] Witte J, Henckel L, Maathuis MH, Didelez V. On efficient adjustment in causal graphs. J Machine Learn Res. 2020;21:246.

[9] Rotnitzky A, Smucler E. Efficient adjustment sets for population average causal treatment effect estimation in graphical models. J Mach Learn Res. 2020;21:1–86.

[10] Smucler E, Sapienza F, Rotnitzky A. Efficient adjustment sets in causal graphical models with hidden variables. Biometrika. 2021 Mar;109:49–65. 10.1093/biomet/asab018.

[11] Runge J. Necessary and sufficient graphical conditions for optimal adjustment sets in causal graphical models with hidden variables. 2021. arXiv:2102.10324.

[12] Malek A, Chiappa S. Asymptotically best causal effect identification with multi-armed bandits. Adv Neural Inform Process Sys. 2021;34:21960–71.

[13] Acid S, De Campos LM. An algorithm for finding minimum d-separating sets in belief networks. In: UAI’96. Arlington, USA: AUAI Press; 1996. p. 3–10.

[14] van der Zander B, Liśkiewicz M, Textor J. Separators and adjustment sets in causal graphs: Complete criteria and an algorithmic framework. Artif Intell. 2019;270:1–40. 10.1016/j.artint.2018.12.006

[15] Even S. Graph algorithms. Cambridge, UK: Cambridge University Press; 2011. 10.1017/CBO9781139015165

[16] Van der Vaart AW. Asymptotic statistics. vol. 3. Cambridge, UK: Cambridge University Press; 2000.

[17] Spirtes P, Glymour CN, Scheines R, Heckerman D, Meek C, Cooper G, et al. Causation, prediction, and search. Cambridge, USA: MIT Press; 2000. 10.7551/mitpress/1754.001.0001

[18] Robins JM, Richardson TS. Alternative graphical causal models and the identification of direct effects. In: Causality and psychopathology: finding the determinants of disorders and their cures. Oxford, UK: Oxford University Press; 2010. p. 103–58. 10.1093/oso/9780199754649.003.0011

[19] Maathuis MH, Colombo D. A generalized back-door criterion. Annals Statist. 2015 June;43(3):1060–88. 10.1214/14-AOS1295.

[20] Textor J, Liskiewicz M. Adjustment criteria in causal diagrams: an algorithmic perspective. In: UAI’11. Arlington, USA: AUAI Press; 2011. p. 681–8.

[21] van der Zander B, Liskiewicz M, Textor J. Constructing separators and adjustment sets in ancestral graphs. In: UAI’14. Arlington, USA: AUAI Press; 2014. p. 11–24.

[22] Robins JM, Rotnitzky A. Recovery of information and adjustment for dependent censoring using surrogate markers. In: AIDS Epidemiology. New York, USA: Springer; 1992. p. 297–331. 10.1007/978-1-4757-1229-2_14

[23] Hirano K, Imbens GW, Ridder G. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica. 2003;71(4):1161–89. 10.3386/t0251

[24] Hahn J. On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica. 1998;66(2):315–31. 10.2307/2998560

[25] Van der Laan M, Robins JM. Unified methods for censored longitudinal data and causality. New York, USA: Springer Science & Business Media; 2003. 10.1007/978-0-387-21700-0

[26] Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C, Newey W, et al. Double/debiased machine learning for treatment and structural parameters. Econometrics J. 2018;21(1):C1–C68. https://onlinelibrary.wiley.com/doi/abs/10.1111/ectj.12097. 10.3386/w23564

[27] Smucler E, Rotnitzky A, Robins JM. A unifying approach for doubly-robust l1 regularized estimation of causal contrasts. 2019. arXiv:1904.03737.

[28] Cheriyan J, Maheshwari S. Analysis of preflow push algorithms for maximum network flow. SIAM J Comput. 1989;18(6):1057–86. 10.1007/3-540-50517-2_69

[29] Hagberg AA, Schult DA, Swart PJ. Exploring network structure, dynamics, and function using NetworkX. In: Varoquaux G, Vaught T, Millman J, editors. Proceedings of the 7th Python in Science Conference. Pasadena, CA, USA; 2008. p. 11–15.

Received: 2022-02-25
Revised: 2022-06-14
Accepted: 2022-06-23
Published Online: 2022-07-14

© 2022 Ezequiel Smucler and Andrea Rotnitzky, published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
