A generalized double robust Bayesian model averaging approach to causal effect estimation with application to the study of osteoporotic fractures

Denis Talbot; Claudia Beaudoin

doi:10.1515/jci-2021-0023

Open Access Published by De Gruyter November 17, 2022

A generalized double robust Bayesian model averaging approach to causal effect estimation with application to the study of osteoporotic fractures

Denis Talbot and Claudia Beaudoin

From the journal Journal of Causal Inference

https://doi.org/10.1515/jci-2021-0023

Abstract

Analysts often use data-driven approaches to supplement their knowledge when selecting covariates for effect estimation. Multiple variable selection procedures for causal effect estimation have been devised in recent years, but additional developments are still required to adequately address the needs of analysts. We propose a generalized Bayesian causal effect estimation (GBCEE) algorithm to perform variable selection and produce double robust (DR) estimates of causal effects for binary or continuous exposures and outcomes. GBCEE employs a prior distribution that targets the selection of true confounders and predictors of the outcome for the unbiased estimation of causal effects with reduced standard errors. The Bayesian machinery allows GBCEE to directly produce inferences for its estimate. In simulations, GBCEE was observed to perform similarly or to outperform DR alternatives. Its ability to directly produce inferences is also an important advantage from a computational perspective. The method is finally illustrated for the estimation of the effect of meeting physical activity recommendations on the risk of hip or upper-leg fractures among older women in the study of osteoporotic fractures. The 95% confidence interval produced by GBCEE is 61% narrower than that of a DR estimator adjusting for all potential confounders in this illustration.

Keywords: causal inference; confounding; double robustness; model averaging; model selection

MSC 2010: 62F15; 62G35

1 Introduction

Estimating causal effects using observational data requires important subject-matter knowledge. One notably needs to identify the confounding covariates that bias the association between the exposure and the outcome. Unfortunately, prior knowledge is often insufficient to undertake this task. For example, reviews of variable selection methods used in epidemiological journals indicate that data-driven approaches are frequently employed to select confounders [1,2].

Recent research has shown that many classical model selection approaches, including stepwise regression, Bayesian model averaging, lasso, and adaptive lasso, can have poor performances in a causal inference framework (e.g., [3,4, 5,6]). There are two important explanations of this phenomenon. First, many model selection approaches do not account for the model selection steps and therefore produce confidence intervals that include the true effect less often than they should, an issue known as the post-selection inference problem [7,8, 9,10]. Various approaches, including Bayesian methods and bootstraping, have been proposed to deal with this post-selection problem. However, note that not all Bayesian or bootstrap methods solve the post-selection problem, see, for example, [11]. Second, classical methods often fail to identify all important confounders [5,6,12,13]. This is because they generally focus on modeling the outcome as a function of the exposure and potential confounders. Confounders that are weakly associated with the outcome but strongly with the exposure may thus fail to be identified.

Multiple model selection methods for causal inference have been introduced in recent years, including the collaborative targeted maximum likelihood (C-TMLE [14]), Bayesian adjustment for confounding (BAC [6,15]), confounder selection via penalized credible regions (BP [16]), inference on treatment effects after selection among high-dimensional controls (HDM [13]), Bayesian causal effect estimation algorithm (BCEE [5]), model averaged double robust estimation (MADR [17]), outcome-adaptive lasso (OAL [4]), the group lasso and double robust estimation of causal effects (GLiDeR [18]), the outcome highly-adaptive lasso (OHAL [19]), and the high-dimensional confounding adjustment using continuous spike and slab priors (HD-SSL [20]). Despite these important developments, there is still a need for new methods or for the extension of existing methods. For instance, not all methods directly produce standard errors or confidence intervals for their causal effect estimator. Instead, they rely on bootstrap procedures for making inferences, which can be computationally intensive. Moreover, many methods were specifically elaborated for estimating the effect of a binary exposure on a continuous outcome. Only a few address additional situations. Finally, most methods model either the outcome as a function of the exposure and confounders, or the exposure as a function of confounders, and require that the postulated model is correct. In contrast, double robust (DR) methods combine both models and yield consistent estimators if either model, but not necessarily both, is correct. As such, DR methods require less stringent modeling assumptions than alternatives. Table 1 presents a summary of the characteristics of the causal model selection methods mentioned earlier.

Table 1

Characteristics of some causal model selection methods

Method	R package	Binary exposure	Continuous exposure	Binary outcome	Continuous outcome	Modeling	Inferences
C-TMLE	`ctmle`	✓		✓	✓	Double robust	Asymptotic
BAC	`bacr`, `BACprior`	✓	✓	✓	✓	Outcome	Bayesian
BP	BayesPen 1	✓	✓	✓	✓	Outcome	Bayesian
HDM	`hdm`	✓			✓	Outcome	Asymptotic
BCEE	`BCEE`	✓	✓		✓	Outcome	Bayesian
MADR	`madr`	✓		± 2	✓	Double robust	Bootstrap
OAL	None 3	✓		✓	✓	Exposure	Bootstrap
GLiDeR	None 4	✓		± 2	✓	Double robust	Bootstrap
OHAL	None 5	✓		✓	✓	Double robust	Cross-validation
HD-SSL	`HDconfounding` 6	✓	✓	✓	✓	Outcome	Bayesian

1 The BayesPen package was removed from the CRAN repository in December 2014 because the maintainer email address bounced.

2 The proposed method would directly adapt to this situation, but the software does not accommodate it.

3 R code is provided in ref. [4], but the code depends on the lqa R package that has been removed from the CRAN repository due to uncorrected error checks and is not compatible with the latest version of R.

4 R code is provided in ref. [18].

5 R code is provided in ref. [19].

6 Available on https://github.com/jantonelli111/HDconfounding.

In this article, we extend the BCEE algorithm in several directions. First, we generalize the algorithm to accommodate any combination of binary or continuous exposure and outcome. We also propose using a DR estimator of the causal effect within the algorithm to improve the robustness of the approach to modeling assumptions. Finally, we propose a revised implementation that substantially reduces computational time for running the algorithm. We call this extension the generalized Bayesian causal effect estimation algorithm (GBCEE). BCEE’s framework has various desirable features. First, its model selection algorithm is theoretically motivated using the graphical framework to causal inference (e.g., [21]). This algorithm favors models that include true confounders in addition to outcome risk factors, but that excludes instruments (covariates associated with the exposure, but not the outcome). Both simulation and theoretical results indicate that including outcome risk factors, in addition to true confounders, allows the unbiased estimation of causal effects with improved efficiency (reduced variance of the estimators) compared to models including all potential confounders or to models including only true confounders [4,18,22,23,24, 25,26]. Of particular interest to this work are the theoretical results of Koch et al. (2018) concerning DR estimators [18]. Moreover, BCEE takes advantage of the Bayesian model averaging framework to directly produce inferences that account for the model selection step, thus avoiding reliance on the bootstrap.

While Bayesian DR estimation of causal effects has previously been proposed, for example [27,28], GBCEE differs from previous proposals because it is specifically designed for confounder selection. The current work also shares similarities with BAC [6,15], which performs confounder selection and produces inferences that account for the model selection using a Bayesian framework. In fact, GBCEE’s algorithm is inspired by that of BAC. However, BAC targets the selection of all variables associated with either the exposure or the outcome, thus selecting instruments in addition to confounders and outcome risk factors, which may result in a loss of efficiency [24]. Moreover, GBCEE is a DR procedure, unlike BAC. Finally, GBCEE can also be seen as a special case of the general framework for MADR introduced by Cefalu et al. [17], where BCEE’s weights are used within this framework. However, there are important differentiating features between GBCEE and the specific MADR estimator they developed, including the use of the Bayesian framework to produce inferences, a different method for attributing weights to models in the model averaging procedure and a different choice of DR estimator. As will be seen in Section 4, these differences result in GBCEE being much more computationally efficient than MADR, in addition to having a noticeably lower standard error in some simulation scenarios.

The remainder of this article is structured as follows. In Section 2, we introduce some notation and further motivate the confounder selection problem. The GBCEE algorithm is presented in Section 3. Section 4 presents a simulation study to investigate and compare the finite sample properties of GBCEE. An illustration of GBCEE’s application for estimating the effect of physical activity on the risk of fractures using data from the Study of Osteoporotic Fractures is presented in Section 5. We conclude in Section 6 with a discussion of the results and perspectives for future research. All extensions presented in this article are implemented in the R package BCEE.

2 Notation and motivation

Let Y be the outcome of interest and X be the exposure under study. Let Y x be the counterfactual outcome that would have been observed if, possibly contrary to fact, X = x . We denote by Δ the causal exposure effect of interest. We restrict our attention to causal effects that can be expressed as functions of the counterfactual expectations of the outcome. For example, if the exposure is binary, Δ can be the causal risk difference E [ Y 1 ] − E [ Y 0 ] .

For estimating the causal effect, a sample of independent and identically distributed observations O = { U i , X i , Y i } , i = 1 , … , n , is drawn from a given population, where U i = { U i , 1 , … , U i , M } is a set of potentially confounding covariates. We assume conditional exchangeability Y x ∐ X ∣ U and positivity f X ( x ∣ U = u ) > 0 for all u and all x , where f X ( x ) > 0 . The conditional exchangeability assumption is often interpreted as a “no unmeasured confounders” assumption and informally entails there is no factor that simultaneously affects X and Y after conditioning on U . The positivity assumption means that all possible values of X have a positive probability of occurring within each stratum of u . Under these assumptions, counterfactual expectations E [ Y x ] are non parametrically identified from the observed data as E U { E Y [ Y ∣ X = x , U ] } . Confounder selection consists in finding a subset of U , Z ⊆ U , such that the conditional exchangeability assumption also holds, Y x ∐ X ∣ Z . To achieve this objective, GBCEE relies on an additional assumption that all direct determinants of X are measured. As will be seen in the next section, direct determinants of X play a pivotal role in GBCEE.

Confounder selection among the set U can be desirable for multiple reasons. When M ≈ n or M > n , it might not be feasible to estimate E [ Y ∣ X = x , U ] because of data sparsity. Moreover, U is susceptible to include instruments or conditional instruments. Adjusting for such variables is known to increase the variance of estimators and is susceptible to amplify biases (e.g., see [29] and references therein). Finally, identifying which covariates are true confounders may be of substantive interest.

3 The GBCEE algorithm

We now describe the GBCEE algorithm. We first provide an overview of the algorithm in the next subsection. The construction of the prior distribution is presented in Section 3.2. This prior distribution is the same as in BCEE [5], but a brief presentation is repeated here for completeness. The DR estimators used within the algorithm are presented in Section 3.3. The implementation of the algorithm is described in Section 3.4.

3.1 Overview

GBCEE averages DR estimates of the causal effect Δ over potential confounder sets. These sets are the 2 M possible subsets of U . Let α Y = ( α 1 Y , … , α M Y ) be an M -dimensional vector for the inclusion of the covariates U in the potential confounder set, where component α m Y equals 1 if U m is included in the set and α m Y equals 0 otherwise, and A be the set of all possible confounder sets. Using this notation, GBCEE estimates Δ as a model-averaged posterior mean

(1) Δ ˆ ≡ E [ Δ ∣ O ] = ∑ α Y ∈ A Δ ˆ α Y P ( α Y ∣ O ) ,

where Δ ˆ α Y ≡ E [ Δ ∣ O , α Y ] is the posterior mean of Δ when adjusting for the variables α Y [30]. As will be seen in Section 3.3, each Δ ˆ α Y is constructed using the output of an exposure model and an outcome model, both of which include the potential confounders indicated by α Y . In this regard, GBCEE differs from BCEE, which estimates the average treatment effect by performing a model average of the exposure coefficient across linear regression outcome models. To extend the approach to accommodate binary outcomes, it may be tempting to use a similar approach and average the exposure coefficient across logistic regression outcome models. However, this approach is inappropriate because of the noncollapsibility of the odds ratio [5]. Averaging an estimator of the causal effect across models solves this issue. In addition, as mentioned in the introduction, employing DR estimators improves the robustness of GBCEE to model misspecifications as compared to BCEE, which relies on the presence of a correctly specified outcome model among the candidate models to yield consistent estimates.

The weight attributed to each estimate Δ ˆ α Y is the posterior probability of a model for the outcome, conditional on the exposure and the covariates included in α Y , P ( α Y ∣ O ) ∝ P ( Y ∣ α Y , X , U ) P ( α Y ) . We propose using a generalized linear model as the model for the outcome:

(2) g [ E ( Y ∣ α Y , X , U ) ] = δ 0 α Y + β α Y X + ∑ m = 1 M α m Y δ m α Y U m ,

where g is a known link function. This outcome model serves as a component in constructing Δ ˆ α Y , in addition to being used for computing the marginal likelihood P ( Y ∣ α Y , X , U ) .

So far, GBCEE is thus similar to a classical Bayesian model averaging of Δ ˆ α Y . The important differentiating feature of GBCEE is that it uses an informative prior distribution, P ( α Y ) , tailored to put the bulk of the posterior weight on potential confounder sets that meet the conditional exchangeability assumption. The prior distribution is further constructed to favor sets that exclude instruments or conditional instruments. As such, the prior distribution targets sets α Y that allow unbiased estimation of Δ , with improved efficiency as compared to the full set U . We note that equation (1) is also a particular case of equation (1) in the study by Cefalu et al. [17]: Δ ˆ = ∑ j : ℳ j ∈ ℳ w j Δ ˆ j , where ℳ is a set of models and w j is the weight of model ℳ j . Similar to GBCEE, the specific MADR estimator developed in Cefalu et al. [17] uses a posterior probability as a weight. However, they allow different variables to be selected in the outcome and the exposure models when constructing DR estimators Δ ˆ j . In GBCEE, the same variables are included in both models. Our choice is motivated by the usual exchangeability assumption Y x ∐ X ∣ Z that ensures consistent DR estimation of the causal effect when both the outcome and exposure models include Z . Another difference is that each DR estimate Δ ˆ j in MADR is weighted according to the joint posterior probability of the outcome and exposure models that are used to construct it, whereas we propose weighting estimates according to the posterior probability of the outcome model only. However, as will be seen in the next section, GBCEE also uses information from the exposure model in constructing its informative prior distribution P ( α Y ) .

Before proceeding with the description of the prior distribution, we note that our procedure is only approximately Bayesian, notably because prior distributions are not specified for regression coefficients. While the core principle of the procedure, Bayesian model averaging, is Bayesian, we recognize that several aspects and approximations are borrowed from a frequentist framework. However, most of the frequentist approximations we propose making for the practical implementation of the approach are commonly used for implementing Bayesian model averaging, for example, in the R package BMA [31].

3.2 Prior distribution construction

The first step for constructing the prior distribution P ( α Y ) aims to identify the determinants of the exposure. This step helps identifying instruments, conditional instruments, as well as confounders in later steps. To do so, we consider generalized linear models for the exposure according to potential confounders. Let α X ∈ A be defined similarly to α Y , that is, α X is an M -dimensional vector for the inclusion of the covariates U in the exposure model. The exposure models are

(3) g [ E ( X ∣ α X , U ) ] = δ 0 α X + ∑ m = 1 M α m X δ m α X U m ,

where g is a known link function. The posterior distribution of the exposure model P ( α X ∣ X , U ) ∝ P ( X ∣ α X , U ) P ( α X ) is then computed. Typically, a uniform prior distribution over A would be used for P ( α X ) , but other choices of prior are possible and are accommodated in the BCEE package. The set α X that includes all true (direct) determinants of the exposure is ensured to have a posterior probability of 1 asymptotically if the exposure model is correctly specified [32,33]. As described in Section 3.3, the exposure models (3) are also used for constructing the DR estimates Δ ˆ α Y .

Using the information from the first step, GBCEE’s prior distribution is defined as P ( α Y ) = ∑ α X ∈ A P ( α Y ∣ α X ) P ( α X ∣ X , U ) , where P ( α Y ∣ α X ) ∝ ∏ m = 1 M P α Y ∣ α X ( α m Y ∣ α m X ) , and

(4) P α Y ∣ α X ( α m Y = 1 ∣ α m X = 0 ) = P α Y ∣ α X ( α m Y = 0 ∣ α m X = 0 ) = 1 2 , P α Y ∣ α X ( α m Y = 1 ∣ α m X = 1 ) = ω m α Y ω m α Y + 1 , P α Y ∣ α X ( α m Y = 0 ∣ α m X = 1 ) = 1 ω m α Y + 1 .

Intuitively, if U m was not found to be associated with the exposure in the first step of GBCEE ( α m X = 0 ), there is no reason to force either the inclusion or the exclusion of U m from the outcome model; hence, P α Y ∣ α X ( α m Y = 1 ∣ α m X = 0 ) = P α Y ∣ α X ( α m Y = 0 ∣ α m X = 0 ) = 1 / 2 . However, if U m was found to be associated with the exposure ( α m X = 1 ), it is desirable (i) to force its inclusion in the outcome model if U m is also associated with the outcome ( U m is a confounder), and (ii) force its exclusion from the outcome model if U m is not associated with the outcome ( U m is an instrument or a conditional instrument). This is accomplished through the parameter ω m α Y , which is proportional to the strength of the association between U m and Y conditional on the other variables in the outcome model α Y . Some additional notation must be introduced to define ω m α Y formally.

We denote by δ ˜ m α Y the regression coefficient for U m in the outcome model defined by α Y if U m is included in this model (that is, if α m Y = 1). Otherwise, δ ˜ m α Y is the regression coefficient for U m in the outcome model that includes the same covariates as α Y , but additionally include U m . Thus, δ ˜ m α Y measures the conditional association between U m and Y .

(5) ω m α Y = ω × δ ˜ m α Y σ U m σ Y 2 ,

where 0 ≤ ω ≤ ∞ is a user-defined hyperparameter, σ U m and σ Y are the standard deviations of U m and Y , respectively. Note that the term σ U m / σ Y is used to standardize δ ˜ m α Y , making it insensitive to the choice of scale for Y and U m . However, when Y is binary (0/1), the regression coefficients do not need to be scaled for σ Y ; we thus set σ Y = 1 in equation (5) in that case. Since δ ˜ m α Y , σ U m , and σ Y are unknown, they are replaced by their maximum likelihood estimates in practice, which gives an empirical Bayes flavor to the prior distribution (although the procedure is not formally empirical Bayes). Our proposed implementation, detailed in Appendix B, accounts for the uncertainty associated with the estimation of the parameters of the prior distribution. Consequently, the estimation of the parameters of the prior distribution is expected to increase the variance of the causal effect estimator, but not to affect the coverage of credible intervals. This is supported by simulation results presented in Appendix B. The factor ω is used to determine how informative the prior is. It is recommended to choose ω = c × n b , with 0 < b < 1 and c is a constant that does not depend on the sample size. This ensures that the approximation of the prior distribution using maximum likelihood estimates behaves as desired asymptotically, that is, that confounders are included and instruments and conditional instruments are excluded [5]. An expanded demonstration of the asymptotic properties of the estimated prior distribution can be found in Appendix A. We have previously observed that BCEE’s results are not much affected by the choice of c in the range [100, 1,000] when setting ω = c n 0.5 [5].

Classical Bayesian model averaging results ensure that the marginal likelihood of the outcome model α Y that includes all variables associated with Y (confounders and outcome risk factors) dominates the marginal likelihood of the other outcome models [32,33]. Together with the asymptotic properties of the maximum likelihood approximation to the proposed prior distribution, this guarantees that the posterior mass P ( α Y ∣ O ) concentrates on an adjustment set that includes confounders and outcome risk factors, and that excludes instruments and superfluous variables associated with neither the exposure nor the outcome. It can be noted that using a noninformative prior distribution P ( α Y ) would also achieve this. The goal of our proposed prior distribution is thus to improve on the finite-sample properties of Bayesian model averaging with a noninformative prior distribution, without sacrificing its asymptotic properties.

Now that the prior distribution of GBCEE has been presented, we turn our attention to the estimation of the causal effect for a given adjustment set Δ ˆ α Y .

3.3 DR estimators

Targeted maximum likelihood estimation (TMLE) is a general framework for constructing DR estimators introduced by ref. [34], but based on earlier work [35]. TMLE is consistent if either the outcome or the exposure model is correctly specified and is semiparametric efficient when both are correctly specified. The augmented inverse probability weighting (AIPW) estimator that MADR and GLiDeR use share these asymptotic properties. However, it has been suggested that TMLE is more robust to near violations of the positivity assumption than AIPW [36,37], although this does not mean that TMLE is unaffected by positivity violations. For these reasons, we propose using TMLE in GBCEE to estimate Δ α Y .

The estimators we propose using depend on the type of the outcome and of the exposure. For constructing these estimators, we use the generalized linear model (2) for the outcome with an identity link when Y is continuous and with a logit link when Y is binary. For the exposure model, we use model (3) with an identity link when X is continuous and with a logit link when X is binary.

Briefly, TMLE first entails proposing an initial estimate for the causal contrast of interest based on the outcome model. The initial estimate is then “fluctuated” based on the output from the exposure model. The intuition is that the exposure model should contain no additional information for predicting the outcome if the initial outcome model is correctly specified. A residual association thus suggests that residual confounding is present. TMLE uses this residual association as an estimate of the residual bias of the initial estimator and modifies (fluctuates) the initial estimate in a clever way, that is, such that the estimating equations of the nonparametric efficient influence function of the causal contrast is solved. In general, these steps may need to be iterated to obtain a solution that solves the estimating equations of the efficient influence function. However, the TMLEs we use in GBCEE have been shown to converge in a single step. The influence function is a function of the observed data O that can be derived for a given statistical parameter and model. Any regular estimator β ˆ of a parameter β that is asymptotically linear can be written as n 1 / 2 ( β ˆ − β ) = n − 1 / 2 ∑ i = 1 n ϕ ( O i ) + o p ( 1 ) , where ϕ ( O i ) is the influence function of the estimator, which is a zero-mean function with finite variance, and o p ( 1 ) is a term that converges in probability to 0 as n grows to infinity [38]. Remark that the asymptotic behavior of a regular asymptotically linear estimator, such as its asymptotic variance, is fully characterized by its influence function. In addition, by the central limit theorem, these estimators have a normal limit distribution. The efficient influence function describes the consistent estimator of a given parameter that has the lowest variance. As such, the variance of the efficient influence function is a generalization of the Cramér-Rao variance lower bound. An introductory tutorial to TMLE is provided by ref. [39], and a more technical presentation can be found in ref. [37].

We now describe the estimands we consider as well as their TMLE estimators. Because the GBCEE framework is fairly general, it could accommodate other estimands or estimators. The description below thus focuses on what is currently implemented in the R package BCEE. Remark that our proposal entails using frequentist estimates Δ ˆ α Y as an approximation for the posterior mean E [ Δ ∣ O , α Y ] in equation (1). This approximation is sensible under a vague prior for Δ α Y for sample sizes that are large enough for the prior distribution to have a negligible contribution to the posterior mean and the estimator to have a symmetric distribution (recall that the TMLE has a normal limit distribution). As such, this approximation is not expected to have a noticeable impact on the results as long as the sample size is not too small. A similar approximation is also used for implementing Bayesian model averaging in the R package BMA[31]. We note that a Bayesian version of TMLE has been proposed [37], targeted Bayesian learning, and could be considered within GBCEE if it were desired to employ an informative prior for Δ α Y .

3.3.1 Continuous or binary outcome, binary exposure

The estimands we consider are Δ = E [ Y 1 ] − E [ Y 0 ] when the outcome is continuous or binary, and Δ = E [ Y 1 ] / E [ Y 0 ] when the outcome is binary. The estimator is the TMLE proposed in Chapters 4 and 7 of ref. [37]. We use the same algorithm as ref. [40] for estimating E [ Y x ] , x ∈ { 0 , 1 } . The algorithm first entails computing inverse probability of exposure weights w i and an initial estimate for the standardized outcome mean:

w i = I ( X i = x ) P ˆ ( X = x ∣ α Y , U i ) , Q i 0 x = E ˆ [ Y ∣ α Y , X = x , U i ] ,

where I ( ⋅ ) is the indicator function taking value 1 when its argument is true and 0 otherwise, and P ˆ ( X = 1 ∣ α Y , U i ) and E ˆ [ Y ∣ α Y , X = x , U i ] are calculated from (3) and (2), respectively. Note that both models include the covariates α Y . When the outcome is continuous, the final estimate is then obtained as follows:

ε ˆ x = argmin ε x ∑ i = 1 n w i [ y i − Q i 0 x − ε x ] 2 , Q i 1 x = Q i 0 x + ε ˆ x , E ˆ [ Y x ] = ∑ i = 1 n Q i 1 x n .

The quantity ε ˆ can be computed as the intercept of a linear regression of Y with only an intercept and offset term Q i 0 x weighted according to w i . When the outcome is binary, only the equations for calculating ε ˆ x and Q i 1 x are modified:

ε ˆ x = argmax ε x ∑ i = 1 n w i expit [ logit ( Q i 0 x ) + ε x ] Y i { 1 − expit [ logit ( Q i 0 x ) + ε x ] } 1 − Y i Q i 1 x = expit { logit [ E ˆ ( Y ∣ α Y , X = x , U i ) ] + ε ˆ x } .

Note that ε ˆ x corresponds to the intercept of a logistic regression of Y on X with an offset term logit ( Q 0 x ) . Using these, Δ ˆ α Y = E ˆ [ Y 1 ] − E ˆ [ Y 0 ] or Δ ˆ α Y = E ˆ [ Y 1 ] / E ˆ [ Y 0 ] .

3.3.2 Continuous outcome, continuous exposure

The causal effect we consider is the slope associated with a unit increase in X : Δ = E [ Y X + 1 ] − E [ Y X ] . The TMLE we use for estimating this causal effect is described in Chapter 22 of [37]. It assumes the following semiparametric regression model E [ Y ∣ X , U , α Y ] = Δ X + r ( U , α Y ) , where r ( U , α Y ) is a nonspecified function of the subset of U included in α Y . The coefficient β α Y in the linear regression (2) is used as an initial estimate for Δ ˆ α Y . This can be seen as approximating r ( U , α Y ) by ∑ m = 1 M α m Y δ m α Y U m in this first step. The fluctuation step then uses information from the exposure model to improve this initial estimate, in an attempt to reduce the potential residual bias that would be due to the misspecification of r ( U , α Y ) in the first step. This proceeds as follows:

ε ˆ = argmin ε ∑ i = 1 n [ y i − y ˆ i α Y − ε ( x i − x ˆ i α Y ) ] 2 Δ ˆ α Y = β α Y + ε ˆ ,

where y ˆ i α Y and x ˆ i α Y are the predicted values from the outcome and the exposure models, respectively. In practice, ε ˆ can be obtained as the coefficient estimate of a linear regression without intercept of Y on x i − x ˆ i α Y with y ˆ i α Y as an offset term.

3.3.3 Binary outcome, continuous exposure

Finally, when Y is binary and X is continuous, we consider the slope associated with a unit increase in X in a logistic working model logit [ E ( Y X ) ] = β 0 + β 1 X :

Δ α Y = argmax β 1 ∫ x ∑ i = 1 n f X ( x ) expit ( β 0 + β 1 x ) Y x [ 1 − expit ( β 0 + β 1 x ) ] 1 − Y x d x ,

where f X ( x ) is the marginal density of X assuming a normal density. To estimate our causal effect of interest, we extend the TMLE proposed in ref. [41], to the case where the exposure is continuous using the continuous exposure case efficient influence function (e.g., see ref. [42]). The working model defines the causal effect of interest as the best approximation, under the working model, to the true causal curve relating X to E [ Y X ] [41]. It does not constitute a modeling assumption. The TMLE algorithm we propose is inspired by the one in ref. [40] and can be summarized as follows:

Q i 0 = E ˆ [ Y ∣ α Y , X i , U i ] , w i = f X ( x i ) f X ( x i ∣ α Y , U i ) , ε ˆ = argmax ε ∑ i = 1 n w i expit [ logit ( Q i 0 x ) + ε 0 + ε 1 X i ] Y i { 1 − expit [ logit ( Q i 0 ) + ε 0 + ε 1 X i ] } 1 − Y i Q i 1 x = expit { logit [ E ˆ ( Y ∣ α Y , X = x , U i ) ] + ε ˆ 0 + ε ˆ 1 x } , Δ ˆ α Y = argmax β 1 ∫ x ∑ i = 1 n f X ( x ) expit ( β 0 + β 1 x ) Q i 1 x [ 1 − expit ( β 0 + β 1 x ) ] 1 − Q i 1 x d x .

The quantity ε ˆ can be found as the coefficients of a logistic regression of Y on X i with an offset term logit ( Q i 0 ) . The causal estimate Δ ˆ α Y can be obtained by first sampling x values randomly in f X ( x ) then fitting a logistic regression model of the associated updated predicted values Q i 1 x on the sampled x (thus approximating the integral over x using a Monte Carlo method).

3.4 Implementation

Section 3.3 has provided the theoretical ingredients required for estimating the causal effect Δ using GBCEE. This section provides an overview of the practical implementation of GBCEE. A detailed presentation is available in Appendix B. The proposed algorithm for sampling in the posterior distribution of Δ shares similarities with the one proposed by [5], but has been significantly revised to improve computational efficiency. It makes use of several approximations that are also used in the R package BMA for performing Bayesian model averaging [31].

Step 1. Determine the posterior distribution of the exposure model P ( α X ∣ X , U ) ∝ P ( X ∣ α X , U ) P ( α X ) , where P ( α X ) is a user-supplied prior distribution for the exposure model, for example, P ( α X ) ∝ 1 . Efficient algorithms for performing this task are readily available in usual software, for example, in the package BMA in R [31].

Step 2. Because it is generally impossible to consider each possible outcome models individually (each α Y ∈ A ), we propose using a Markov chain Monte Carlo model composition algorithm [43]. This algorithm generates a stochastic process that explores the outcome model space and yields simulation-consistent estimates under mild regularity conditions [43]. The Markov chain Monte Carlo model composition algorithm involves computing a ratio of the posterior probability of outcome models, which is computationaly expensive to perform exactly. We thus propose using a heuristic approximation to this ratio, similar to the one we proposed in our previous work [5]. This approximation makes the computation manageable and was observed to perform well in our current simulation study and those reported in ref. [5]. More details can be found in Appendix B.

Step 3. For each α Y explored, compute Δ ˆ α Y , Var ^ ( Δ ˆ α Y ) and P ( α Y ∣ O ) . The causal contrasts Δ ˆ α Y are estimated using the DR estimators presented in the previous section. An asymptotic estimator for the variance of these estimators is the sample variance of the empirical efficient influence function divided by n [37]. The efficient influence functions of the estimators can be found in their respective references given previously (see Section 3.3). While this variance estimator is computationally attractive, it is consistent only if both the exposure and the outcome models are correctly specified and is conservative if only the exposure model is correctly specified. An alternative solution is using the nonparametric bootstrap. This may be seen as computationally undesirable, but it is important to remark that the bootstrap is performed within, and not over, the GBCEE algorithm, thus reducing the computational cost. The posterior probability of each outcome model P ( α Y ∣ O ) is estimated as a by-product of the Markov chain Monte Carlo model composition algorithm.

Step 4. Compute the posterior expectation E ( Δ ∣ O ) = ∑ α Y Δ ˆ α Y P ( α Y ∣ O ) and the posterior variance Var ( Δ ∣ O ) = ∑ α Y [ Var ^ ( Δ ˆ α Y ) + ( Δ ˆ α Y ) 2 ] P ( α Y ∣ O ) − E ( Δ ∣ O ) 2 [30]. As can be seen, this equation accounts both for the within- and between-model variances. Using a maximum likelihood approximation, the posterior distribution of the model-specific Δ α Y are independent N ( Δ ˆ α Y , Var ^ ( Δ ˆ α Y ) ) . The posterior distribution of the marginal causal effect is thus N ( E ( Δ ∣ O ) , Var ( Δ ∣ O ) ) and a 95% credible interval for Δ is obtained as E ( Δ ∣ O ) ± 1.96 Var ( Δ ∣ O ) . This approach is in line with the one used for implementing Bayesian model averaging in the R package BMA [31].

4 Simulation study

4.1 Scenarios

We consider seven main scenarios where the exposure is binary and the outcome is continuous. Some of these scenarios are then adapted to other cases. These scenarios were chosen for their diverse data generating equations and because they provide various challenges for the confounder selection methods. We focus on the binary exposure – continuous outcome case since it is the case for which the most comparators are available.

Scenario 1 is inspired by ref. [5]. The data-generating equations are U 6 , … , U 40 ∼ i i d N ( 0 , 1 ) , U 1 , … , U 5 ∼ i i d N ( U 11 + … + U 15 , 1 ) , X ∼ Bernoulli ( p = expit ( U 11 + … + U 30 ) ) , and Y ∼ N ( X + 0.1 U 1 + … + 0.1 U 10 , 1 ) , where expit ( x ) = [ 1 + exp ( − x ) ] − 1 . Note that this scenario features conditional instruments ( U 11 , … , U 15 ), in addition to weak associations between the confounders and the outcome.

Scenario 2 is taken from ref. [4]. U 1 , … , U 20 were generated as multivariate normal variables with mean 0, unit variance, and 0.5 correlation, X ∼ Bernoulli ( p = expit ( U 1 + U 2 + U 5 + U 6 ) ) , and Y ∼ N ( 2 X + 0.6 U 1 + 0.6 U 2 + 0.6 U 3 + 0.6 U 4 , 1 ) . This scenario includes a moderate number of covariates with relatively large correlations.

Scenario 3 is taken from ref. [3] and includes a large number of covariates. The data were generated as U 1 , … , U 100 ∼ i i d N ( 1 , σ 2 = 4 ) , X ∼ Bernoulli ( p = expit ( 0.5 U 1 − U 2 + 0.3 U 5 − 0.3 U 6 + 0.3 U 7 − 0.3 U 8 ) ) , Y ∼ N ( X + 2 U 1 + 0.2 U 2 + 5 U 3 + 5 U 4 , σ 2 = 4 ) .

Scenarios 4 and 5 are taken from ref. [18]. In both scenarios, U 1 , … , U 5 were generated as multivariate normal variables with mean 1, unit variance, and 0.6 correlations. In Scenario 4, X ∼ Bernoulli ( p = expit ( 0.5 U 1 + 0.5 U 2 + 0.1 U 3 ) ) and Y ∼ N ( X + U 3 + U 4 + U 5 + ∑ i = 1 5 ∑ j = 1 5 0.5 U i U j , 1 ) . In Scenario 5, X ∼ Bernoulli ( p = expit ( − 5 + U 3 + U 4 + U 5 + ∑ i = 1 5 ∑ j = 1 5 0.5 U i U j ) ) , and Y ∼ N ( X + 0.5 U 1 + 0.5 U 2 + 0.1 U 3 , 1 ) . These scenarios present nonlinear relationships between covariates and either the exposure or the outcome.

Scenario 6 features multiple weak confounders. U 1 , … , U 30 were generated as multivariate normal variables with mean 0, unit variance, and 0.2 correlation, X ∼ Bernoulli ( p = expit ( 0.05 ∑ m = 6 30 U m ) ) , and Y ∼ N ( 2 X + 0.05 ∑ m = 1 25 U m , 1 ) .

Scenario 7 includes an exposure–covariate interaction. U 1 , … , U 5 were generated as multivariate normal variables with mean 0, unit variance, and 0.6 correlation, X ∼ Bernoulli ( p = expit ( 0.5 U 1 + 0.5 U 2 + 0.5 U 3 ) ) and Y ∼ N ( 2 X + U 3 + U 4 + U 5 + X × U 3 , 1 ) .

Scenarios 1, 2, 4, and 5 were adapted to the binary outcome case (Scenarios 1B, 2B, 4B, and 5B, respectively). Only the equations used for generating Y were changed; they were, respectively, Y ∼ Bernoulli ( p = expit ( X + 0.1 U 1 + … + 0.1 U 10 ) ) , Y ∼ Bernoulli ( p = expit ( 2 X + 0.6 U 1 + 0.6 U 2 + 0.6 U 3 + 0.6 U 4 ) ) , Y ∼ Bernoulli ( p = expit ( − 5 + X + U 3 + U 4 + U 5 + ∑ i = 1 5 ∑ j = 1 5 0.5 U i U j ) ) , and Y ∼ Bernoulli ( p = expit ( X + 0.5 U 1 + 0.5 U 2 + 0.1 U 3 ) ) . Scenarios 2 and 2B were additionally adapted to the case where the exposure is continuous (Scenarios 2C and 2D, respectively). Only the equation for generating X was changed to be X ∼ N ( U 1 + U 2 + U 5 + U 6 , 1 ) . We also explored the impact of misspecifying the outcome model error distribution in Scenario 2E by generating Y as a shifted exponential random variable with mean 2 X + 0.6 U 1 + 0.6 U 2 + 0.6 U 3 + 0.6 U 4 and rate equal 1, keeping the other equations as in Scenario 2.

In Scenarios 1–7, 2C, and 2E, both a sample size of n = 200 and n = 1,000 were considered. In Scenarios 1B, 2B, 4B, 5B, and 2D, only a sample size of n = 1,000 was considered because of frequent convergence problems with n = 200 , due to too few events relative to the number of variables.

4.2 Analysis

For each scenario, 1,000 datasets were generated. The estimand of interest was the average treatment effect Δ = E [ Y 1 ] − E [ Y 0 ] when the exposure was binary or the exposure effect slope when the exposure was continuous. Only estimators aiming to estimate these estimands were considered in the simulations to allow meaningful comparisons. In Scenarios 1–7 and 2E, Δ was estimated with GBCEE setting ω = 500 n , C-TMLE, OAL, GLiDeR, BAC, MADR, BP, HDM, and HD-SSL. For OAL, we considered two different implementations, one corresponding to the original proposal by Shortreed and Ertefaie [4] where adjustment is made using inverse probability weighting [4], and a second implementation where adjustment is made using TMLE as previously done by Ju et al. [19] (OAL-TMLE). For Scenarios 1B, 2B, 4B, and 5B, only GBCEE, OAL, OAL-TMLE, C-TMLE, and BAC were considered. In Scenario 2C, only GBCEE, BAC, and BP were compared. In Scenario 2D, only GBCEE was examined. Different implementations of GBCEE were also compared in Scenarios 1–7, by setting ω = 100 n or ω = 1,000 n , or by selecting only the model with the largest posterior probability (i.e., without model averaging).

As benchmarks, in all scenarios, we have additionally estimated Δ using the parametric g-formula Δ ˆ g = 1 n ∑ i = 1 n ( E ˆ [ Y ∣ X = 1 , U i ] − E ˆ [ Y ∣ X = 0 , U i ] ) and with a DR estimator. When the exposure was binary, we used the AIPW estimator as a benchmark:

Δ ˆ AIPW = 1 n ∑ i = 1 n X i P ˆ ( X = 1 ∣ U i ) + 1 − X i 1 − P ˆ ( X = 1 ∣ U i ) [ Y i − E ˆ ( Y ∣ X i , U i ) ] + E ˆ ( Y ∣ X = 1 , U i ) − E ˆ ( Y ∣ X = 0 , U i ) ,

where E ˆ [ Y ∣ X , U ] was obtained from a linear regression model in Scenarios 1–7 and a logistic regression model in Scenarios 1B, 2B, 4B, and 5B, P ˆ ( X = 1 ∣ U i ) was produced by a logistic regression model. When the exposure was continuous, we used the DR estimators described in Sections 3.3.2 and 3.3.3. The regression models were either adjusted for all potential confounders (full-g and full-DR, respectively) or only for the pure outcome predictors and true confounders (target-g and target-DR, respectively). The target adjustment set was { U 1 , … , U 10 } in Scenarios 1 and 1B; { U 1 , … , U 4 } in Scenarios 2, 2B, 2C, 2D, 2E, and 3; { U 1 , … , U 5 } in Scenario 4 and 4B; { U 1 , U 2 , U 3 } in Scenarios 5 and 5B; { U 1 , … , U 25 } in Scenario 6; and { U 1 , U 2 , U 3 } in Scenario 7. In real data analyses, the target adjustment set would typically be unknown.

For all methods, only main terms of the covariates were considered, except in Scenario 7 where an X × U 3 term was considered in the outcome model for target-g, full-g, target-DR, full-DR, and BAC. All other methods, except HD-SSL, do not allow including exposure–covariate interaction terms. HD-SSL allows “heterogeneous” effect estimation, but the function returned an error message and failed to produce results in most iterations. Consequently, we considered the “homogeneous” HD-SSL, even in Scenario 7. Since Scenarios 4, 4B, 5, 5B, and 7 feature nonlinear and interaction terms, this allows evaluating the robustness of methods to model misspecifications. In Scenarios 4 and 4B, the correct outcome model is not within the space of considered outcome models for any of the methods. Similarly, in Scenarios 5 and 5B, the correct exposure model is not within the space of considered models for any of the methods. In Scenario 7, the correct outcome model is not considered for any of the methods, except BAC. Recall that BAC, BP, HDM, and HD-SSL only model the outcome, OAL only models the exposure, and GBCEE, OAL-TMLE, C-TMLE, GliDeR, and MADR are DR methods that model both the outcome and the treatment. As such, poor performances are expected for methods that model only the outcome or only the exposure when the correct model is not within the space of considered model. On the other hand, DR methods are expected to retain good performances.

Most methods were implemented using the default options of their respective R function or package (see Table 1). For BAC, we used 500 burn-in iterations, followed by 5,000 iterations with a thinning of 5. For HD-SSL, we used 20,000 MCMC scans, 10,000 burn-in iterations, and a thinning of 10 (as in an example of its vignette). For GBCEE, we performed 2,000 iterations of the Markov chain Monte Carlo model composition algorithm. OAL and OAL-TMLE were implemented using a modified version of the code supplied in ref. [4] since the original code was not compatible with the latest version of R. All methods employed linear or logistic regression models for the outcome or exposure models. GBCEE, GLiDeR, OAL, OAL-TMLE, BAC, MADR, and HD-SSL further used parametric likelihoods for performing the model selection.

To evaluate the presence of structural positivity violations in each of Scenarios 1–7, we estimated P ( X = 1 ∣ Z ) in a sample of size n = 1,000,000 observations, taking Z as either the true direct causes of X , or as the target adjustment set. Boxplots of P ˆ ( X = 1 ∣ Z ) were produced by exposure group. A poor overlap of the boxplots is indicative of structural positivity violations.

For each method, we computed the bias as the difference between the average of the estimates and the true effect. In Scenarios 1–7, 2C, and 2E, the true effect corresponds to the coefficient associated with X in the data-generating equations. In Scenarios 1B, 2B, 2D, 4B, and 5B, the true effect was estimated using Monte Carlo simulations because the true effect cannot be easily determined analytically from the data-generating equations. A sample of n = 1,000,000 observations was generated in which the counterfactual outcomes were simulated for each observation, and the true effect was estimated using these counterfactual outcomes. The true effects were approximately 0.1894, 0.2814, 1.3449, 0.0229, and 0.1409, in Scenarios 1B, 2B, 2D, 4B, and 5B, respectively. The standard deviation (SD) of the estimates, the mean of the estimated standard error (ESE), the mean squared error, and the proportion of simulation replicates in which the 95% confidence intervals included the true effect (CP) were also computed. The ratio of the root mean squared error of each method over the root mean squared error of target-g (Rel. RMSE) was also computed, except in Scenarios 4 and 4B where the comparator was target-DR, since the g-formula estimator was biased. For OAL, OAL-TMLE, GLiDeR, and MADR, confidence intervals were computed using 1,000 nonparametric bootstrap replicates with the percentile method. Because this procedure is very computationally expensive, this was only done in Scenarios 4 and 5. In addition, we did not compute confidence intervals or the ESE for the g-formula estimators in Scenarios 1B, 2B, 4B, 5B, and 7 since no simple variance estimator is available. The ESE is also not reported for OAL, OAL-TMLE, GLiDeR, and MADR since there is no analytical standard error estimator for these methods, and their confidence intervals were computed using the percentile bootstrap method, which does not require estimating the standard error. To estimate the within-model variance of GBCEE and the variance of benchmark DR estimators, we used the variance of the empirical efficient influence function divided by n . The probability of inclusion of each covariate was calculated in Scenarios 1–7 for all methods, except HDM and C-TMLE whose R function does not provide this information.

4.3 Results

According to Figure 1, the positivity assumption would be violated in Scenarios 1, 2, 3, and 5 when considering the true exposure model. Considering the target adjustment set reduces positivity issues in all scenarios: the positivity assumption becomes verified in Scenario 1, mild issues remain in Scenario 2 and 3, but a severe violation persists in Scenario 5.

$Figure 1 Boxplots of P ˆ ( X = 1 ∣ Z ) \hat{P}\left(X=1| {\boldsymbol{Z}}) according to X X , taking either Z {\boldsymbol{Z}} as the true causes of exposure or as the target adjustment set.$

Figure 1

Boxplots of P ˆ ( X = 1 ∣ Z ) according to X , taking either Z as the true causes of exposure or as the target adjustment set.

The results of the simulation study for Scenarios 1–7 and n = 1,000 are reported in Table 2. The other results are overall similar and are presented in Appendix C, Tables A4–A10. The results comparing the different implementations of GBCEE are also reported in Appendix C, Tables A11 and A12. The latter results support that the choice of c in [ 100 , 1,000 ] has little impact on GBCEE’s performance. In addition, these results confirm the importance of performing model averaging in GBCEE to deal with the post-selection inference problem, since the standard error is underestimated when only the model with the largest posterior weight is selected.

Table 2

Results of simulation Scenarios 1–7, with n = 1,000

Scenario 1						Scenario 2					Scenario 3
				Rel.					Rel.					Rel.
Method	Bias	SD	ESE	RMSE	CP	Bias	SD	ESE	RMSE	CP	Bias	SD	ESE	RMSE	CP
Full-g	0.00	0.09	0.10	1.41	0.96	0.00	0.09	0.09	1.06	0.95	0.01	0.18	0.18	1.14	0.95
Target-g	0.00	0.06	0.07	1.00	0.96	0.00	0.08	0.08	1.00	0.95	0.00	0.16	0.16	1.00	0.95
Full-DR	0.03	0.65	0.32	10.11	0.94	− 0.01	0.45	0.22	5.40	0.94	0.00	0.98	0.47	6.12	0.95
Target-DR	0.00	0.06	0.07	1.00	0.95	0.00	0.08	0.06	1.00	0.86	0.00	0.16	0.13	1.00	0.89
GBCEE	0.00	0.08	0.07	1.17	0.95	0.00	0.14	0.13	1.65	0.94	0.00	0.20	0.19	1.29	0.94
OAL	0.02	0.08	.	1.34	.	0.24	0.28	.	4.40	.	− 0.34	0.30	.	2.84	.
OAL-TMLE	0.00	0.08	.	1.20	.	0.00	0.14	.	1.71	.	− 0.34	0.25	.	2.68	.
C-TMLE	0.01	0.10	0.06	1.50	0.87	0.02	0.11	0.07	1.39	0.77	− 0.00	0.28	0.14	1.78	0.65
GLiDeR	0.01	0.08	.	1.18	.	0.00	0.12	.	1.38	.	− 0.10	0.21	.	1.45	.
BAC	0.00	0.09	0.09	1.42	0.96	0.00	0.09	0.09	1.06	0.95	0.00	0.18	0.17	1.11	0.92
MADR	0.01	0.07	.	1.05	.	0.00	0.15	.	1.75	.	0.00	0.22	.	1.41	.
BP	0.00	0.09	0.10	1.42	0.95	0.00	0.09	0.09	1.06	0.95	0.00	0.18	0.17	1.13	0.94
HDM	0.00	0.09	0.10	1.41	0.96	0.00	0.09	0.09	1.06	0.95	0.00	0.18	0.17	1.10	0.94
HD-SSL	0.01	0.08	0.08	1.21	0.95	0.00	0.09	0.09	1.03	0.96	− 0.01	0.17	0.17	1.05	0.95

Scenario 4						Scenario 5					Scenario 6
Full-g	− 4.63	1.67	1.81	1.96	0.26	0.00	0.10	0.10	1.02	0.95	− 0.00	0.07	0.07	1.00	0.95
Target-g	− 4.63	1.67	1.81	1.96	0.26	0.00	0.09	0.10	1.00	0.95	− 0.00	0.07	0.07	1.00	0.95
Full-DR	− 0.01	2.51	2.29	1.00	0.93	0.00	0.13	0.12	1.41	0.92	− 0.00	0.07	0.07	1.02	0.95
Target-DR	− 0.01	2.51	2.29	1.00	0.93	0.00	0.09	0.06	1.00	0.76	− 0.00	0.07	0.07	1.00	0.96
GBCEE	0.07	2.51	2.33	1.00	0.91	0.00	0.11	0.09	1.20	0.88	0.02	0.07	0.07	1.08	0.93
OAL	− 0.16	5.97	.	2.38	0.89	0.12	0.23	.	2.76	0.94	0.04	0.07	.	1.29	.
OAL-TMLE	− 0.11	2.51	.	1.00	0.90	0.02	0.12	.	1.25	0.96	0.04	0.07	.	1.29	.
C-TMLE	2.45	1.10	1.13	1.07	0.39	0.02	0.17	0.08	1.78	0.72	0.01	0.07	0.06	1.03	0.92
GLiDeR	0.01	3.34	.	1.33	0.89	0.02	0.13	.	1.41	0.95	0.02	0.07	.	1.05	.
BAC	− 4.66	1.67	1.81	1.97	0.24	0.00	0.10	0.10	1.02	0.95	0.03	0.07	0.06	1.09	0.90
MADR	0.02	3.22	.	1.28	0.90	0.03	0.26	.	2.82	0.95	0.03	0.07	.	1.15	.
BP	− 4.63	1.67	1.81	1.96	0.26	0.00	0.10	0.10	1.02	0.95	− 0.00	0.07	0.07	1.00	0.95
HDM	− 4.63	1.67	1.67	1.96	0.20	0.00	0.10	0.10	1.02	0.95	− 0.01	0.07	0.07	1.02	0.94
HD-SSL	− 4.53	1.61	1.81	1.92	0.26	0.01	0.10	0.10	1.09	0.92	0.01	0.07	0.07	1.02	0.94

Scenario 7
Full-g	0.00	0.09	.	1.02	.
Target-g	0.00	0.08	.	1.00	.
Full-DR	0.00	0.09	0.09	1.10	0.94
Target-DR	0.00	0.08	0.08	1.02	0.94
GBCEE	− 0.01	0.09	0.10	1.07	0.96
OAL	0.03	0.12	.	1.45	.
OAL-TMLE	− 0.01	0.09	.	1.10	.
C-TMLE	− 0.01	0.09	0.09	1.06	0.96
GLiDeR	− 0.02	0.09	.	1.10	.
BAC	0.00	0.08	0.08	0.98	0.96
MADR	− 0.02	0.09	.	1.11	.
BP	− 0.27	0.09	0.08	3.36	0.12
HDM	− 0.27	0.09	0.09	3.36	0.15
HD-SSL	− 0.26	0.10	0.09	3.36	0.15

Conditional instruments are present in Scenario 1, covariates are strongly correlated in Scenario 2, a large number of covariates are present in Scenario 3, the true outcome model includes nonlinear terms in Scenario 4, the true exposure model includes nonlinear terms in Scenario 5, a large number of weak confounders are present in scenario 6, and an exposure–covariate interaction is present in Scenario 7.

First, we notice that the lowest bias, variance, and RMSE are generally achieved by both target-g and target-DR, except in Scenario 4 where the g-formula performs poorly because of the outcome model misspecification. However, recall that these estimators are only considered as benchmarks. Second, we remark that the variance and RMSE of full-g and full-DR are greater than their target counterparts. The increase in the variance and RMSE is particularly pronounced for the DR benchmark estimator. These results are important to keep in mind when considering those for the variable selection methods: the double robustness property, which protects against bias due to model misspecification, may come at the cost of increased variance when spurious variables are included. This is likely due, at least partly, to the sensitivity of DR methods to positivity violations.

Because BAC, BP, HDM, and HD-SSL are outcome modeling-based variable selection methods, it is not surprising that they generally perform better than the DR variable selection methods (GBCEE, OAL-TMLE, C-TMLE, GLiDeR, and MADR) in Scenarios 1–3, 5, and 6, where the correct outcome model is among the considered models. However, substantial bias is observed for BAC, BP, HDM, and HD-SSL in Scenario 4, and for BP, HDM, and HD-SSL in Scenario 7, that is, when the outcome model is not among the considered models. In addition, BAC, BP, and HDM offer no substantial RMSE reduction as compared to full-g when n = 1,000 , sometimes even performing worse. Some variance and RMSE reductions were however observed when n = 200 (see Appendix C). Moreover, in Scenario 2C, where the exposure and the outcome were continuous, GBCEE performed better than BAC, BP, and HD-SSL (the only comparators available, see Appendix C). HD-SSL was the only outcome modeling-based method that reduced the variance and the RMSE as compared to full-g in multiple scenarios both when n = 200 and n = 1,000 . OAL, which is an exposure-modeling based method, performed poorly compared to all other variable selection methods, having larger bias and/or variance, except in Scenario 1.

When comparing together the DR methods, we first notice that GBCEE, GLiDeR, and MADR yield estimates with little or no bias in all scenarios when n = 1,000 , even when the correct outcome or exposure model is not within the space of considered models (Scenarios 4, 5, and 7). At most, a bias of − 0.1 was observed for GLiDeR in Scenario 4. Unexpectedly, C-TMLE yields results with substantial bias in Scenario 4 where the correct outcome model is not available for selection, although the bias is much smaller than that of the outcome-modeling methods (BAC, BP, HDM, and HD-SSL). OAL-TMLE produced biased estimates in Scenarios 3. When n = 200 , larger biases were observed in Scenarios 3 and 4 for all methods. In terms of RMSE, GBCEE, OAL-TMLE, GLiDeR, MADR, and C-TMLE all yield improvement as compared to full-DR, except in Scenarios 4, 6, and 7. In all three of those scenarios, the RMSE of target-DR is similar to that of full-DR. In particular, in Scenario 4, the target adjustment set includes all covariates. In such a case, no variance reduction can be expected through variable selection. As opposed to GBCEE, OAL-TMLE and GLiDeR, MADR, and C-TMLE have a much larger RMSE than full-DR in Scenario 5 with n = 1,000 . The relative performance of the methods is variable; no method is consistently performing better than all others across scenarios and sample sizes. For n = 1,000 , the two lowest RMSE among DR variable selection methods are achieved, respectively, by MADR and GBCEE in Scenario 1; GLiDeR and C-TMLE in Scenario 2; GBCEE and MADR in Scenario 3; GBCEE and OAL-TMLE in Scenario 4; GBCEE and OAL-TMLE in Scenario 5; C-TMLE and GLiDeR in Scenario 6; and C-TMLE and GBCEE in Scenario 7.

The coverage probabilities of 95% confidence intervals for BP, HDM, and HD-SSL are close to their nominal level in all scenarios, except Scenario 4 and 7 where the correct outcome model is not among the considered models. Similarly, the coverage of BAC is good in all scenarios except Scenario 4. The coverage probability was much lower than 95% in all scenarios for C-TMLE. This is because the estimated standard error largely underestimates the true standard error, except in Scenario 4, where bias is responsible for the undercoverage. For GBCEE, the coverage was close to 95% in all scenarios, except in Scenario 5 where the coverage probability was 87%. This is likely due to the inconsistency of the empirical efficient influence function variance estimator when the exposure model is misspecified, as evidenced by the absence of notable bias and a slight, but notable, underestimation of the standard error. To verify this assumption, we ran again the simulation for GBCEE using a nonparametric bootstrap variance estimator with 200 replicates to estimate the within-model variance. A 94% coverage was then achieved. Similarly, in scenarios with n = 200 , poorer coverage probabilities are obtained with GBCEE (see Appendix C). This is most likely because of the asymptotic nature of the efficient influence function variance estimator, since poor coverage probabilities are also obtained with target-DR, and the estimated standard error is noticeably lower than the Monte Carlo standard deviation. As mentioned earlier, the coverage of OAL, OAL-TMLE, GLiDeR, and MADR were only examined in Scenarios 4 and 5 due to the associated computational burden. In Scenario 4, coverages of 89–90% were observed, whereas the coverages were 93–96% in Scenario 5.

Plots of the inclusion probability of covariates are available in Appendix D. These figures reveal that GBCEE, OAL, and GLiDeR have relatively similar behaviors, having large probabilities of including both true confounders and outcome risk factors, and lower probabilities of including the other variables. MADR also has a similar behavior regarding the inclusion of variables in the outcome model, but lower overall probabilities of inclusion of variables in the exposure model. HD-SSL also had a somewhat similar behavior, but with much lower probabilities of inclusion of all variables. In general, the variables selected were closer to the target for n = 1,000 than for n = 200 . BAC and BP have high probabilities of including true confounders, outcome risk factors, and instruments.

The results of two additional simulation scenarios are worth highlighting. In Scenario 2D, where the exposure is continuous and the outcome is binary, target-DR, full-DR, and GBCEE have a small but notable bias, and their mean estimated standard error is much larger than the standard deviation of the estimates. The latter seems to be due to extreme values in the estimated standard error since the median of the estimated standard error is smaller than the standard deviation of the estimates (result not shown). The coverage of their confidence intervals is also poor. We hypothesize that this is due to the strong relation between confounders and exposure that can cause the weighting component of the DR estimator to take extreme values. This scenario showcases the challenge of using DR estimators with continuous exposures. The other additional scenario whose results are particularly noteworthy is Scenario 2E, where the outcome error distribution is exponential. The results of this scenario were very similar to those of Scenario 2, even for methods that assumed a normal distribution for the outcome (BAC, MADR, GLiDeR, and GBCEE).

We evaluated the computational time of the variable selection methods in one replication of Scenario 1 with n = 1,000 on a PC with 4 GHz and 16 Gb RAM. The running time for producing both the estimate and confidence interval was 25.8 s for GBCEE using the efficient influence function variance estimator and 40.2 min when using 200 bootstrap replicates for estimating the variance, 1 min for C-TMLE, 1.6 min for BAC, 0.6 s for BP, 2 s for HDM, and 4.2 min for HD-SSL. The execution time for obtaining point estimates only was 0.2 s for OAL, 8.7 s for GLiDeR, and 1.1 min for MADR. Obtaining the confidence interval required 3 min for OAL, 39.8 min for GLiDeR, and 19.9 h for MADR, using 1,000 bootstrap replicates.

5 Application for estimating the effect of physical activity on the risk of fractures

Osteoporosis is a disease where bones become fragile because of low bone mineral density or because of the deterioration of bone architecture. It is a common disease with a prevalence of 5% in men and 25% in women aged 65 years and older from the United States [44]. The prevalence of osteoporosis increases with age. Because of the bone fragility, individuals with osteoporosis are at higher risk of fractures. In addition to important pain, osteoporotic fractures may also induce a loss of mobility and quality of life, additional morbidities, and premature mortality. Preventing osteoporosis and the related fractures has thus become a public health priority. Physical activity may help prevent osteoporosis and osteoporotic fractures. Indeed, regular physical activity is associated with increased bone mineral density and lower risk of falls [45,46].

To illustrate the use of GBCEE, we estimated the effect of attaining the physical activity recommendations from the World Health Organization on the 5-year risk of fracture in older women using the publicly available limited data set from the study of osteoporotic fractures [47]. The study of osteoporotic fractures is a multicentric population based cohort study that recruited women aged 65 or older in four urban regions in the United states: Baltimore, Pittsburgh, Minneapolis, and Portland. An ethical exemption was obtained from the CHU de Québec – Université Laval Ethics Board (# 2020-4788).

In this application, we consider data on 9,671 white women enrolled in 1986. Subjects were considered as exposed if they spent 7.5 kilocalories per kilogram of body mass from moderate or high intensity physical activity per week, and unexposed otherwise. The outcome of interest was the occurrence of any hip or upper leg fracture in the 5 years following the baseline interview. A very rich set of 55 measured potential confounders were identified based on substantive knowledge and notably include data on age, ethnic origin, body mass index, smoking, alcohol use, education, various drug use, fall history, self and familial history of fracture, physical activity history when a teenager and during adulthood, fear of falling, and several health conditions (see Appendix E, Table A13).

Appendix E presents a comparison of the baseline characteristics of participants according to exposure to physical activity recommendations. Among others, subjects who were unexposed had a greater body mass index, were older, had less years of education, were more afraid of falling, practiced less physical activity in the past, drank less alcohol, had more difficulty walking, rated their overall health more poorly, and were less likely to still be married.

Only 3,458 participants had complete information for all variables. Most variables had only few missing data ( < 10 % ), except for parental history of fracture (mother = 24%, father = 37%) and years since menopause (18%). Missing data were imputed using chained equations with the mice package in R. To simplify this illustration, a single random imputation was performed, but multiple imputations would be preferable in practice. The effect was estimated with AIPW adjusting for all potential confounders and GBCEE. Variances (AIPW) and within-model variances (GBCEE) were estimated using 50 nonparametric bootstrap replicates.

The fully adjusted AIPW yielded a 5-year risk difference of fracture of − 1.1 % (95% confidence interval: − 4.6 % , 2.4%) between subjects that attained the physical activity recommendations and those who did not. The estimate using GBCEE, with ω = 500 n and 20,000 iterations, was − 0.9 % (95% confidence interval: − 2.3 % , 0.4%). In this illustration, GBCEE yields a major decrease in the width of the 95% confidence interval compared to the fully adjusted estimate (61%). The potential confounders with the largest probability of being selected by GBCEE were as follows: mother history of fracture (100%), self-rated health (100%), difficulty to walk 2–3 blocs (100%), the presence of fracture before age 50 (100%), education (100%), age (100%), body mass index (100%), short mini mental status exam (80%), any drinking in past year (73%), and osteoporosis diagnostic (72%). All other covariates had a probability of being selected < 50 % , with most being null.

6 Discussion

We have presented a generalization of the BCEE algorithm to estimate the causal effect of a binary or continuous exposure on a binary or continuous outcome using DR targeted maximum likelihood estimators. Like the original BCEE, GBCEE is theoretically motivated using the graphical framework to causal inference. It aims to select adjustment covariates such that the final estimator is unbiased and has reduced variance as compared to an estimator adjusting for all covariates. In addition, the Bayesian framework allows producing inferences that account for the model selection step in a principled way. We have also proposed an implementation of GBCEE that is more computationally efficient than that of BCEE.

We have compared GBCEE to alternative model selection methods in a simulation study. GBCEE produced estimates with little or no bias in all scenarios, whether the exposure and outcome were continuous or binary, and as long as either a correctly specified exposure or outcome model was available for selection. In addition, GBCEE’s estimates had improved precision as compared to a fully adjusted DR estimator in most scenarios. The only situation where no improvement in precision was observed was when unbiased estimation required adjusting for all potential confounders; hence, no variance reduction was possible. The good performance of GBCEE across simulation scenarios, even with a relatively small sample size of n = 200 , also serves as evidence that the different approximations we have made for the practical implementation of GBCEE have little impact on the validity of the results.

The relative performance of the estimators was observed to depend on both the data-generating mechanism and sample size, but GBCEE often had the lowest or second lowest RMSE among DR methods. In large samples, GBCEE, OAL, and GLiDeR are all expected to select true confounders and outcome risk factors. MADR is also expected to select such variables in its outcome model, but only true confounders in the exposure model. This difference perhaps explains why MADR performed more poorly than GBCEE, OAL-TMLE, and GLiDeR in simulation Scenario 5. In a simulation study with scenarios inspired by large healthcare databases and with binary outcomes, GBCEE was one of the best performing approach in terms of bias and RMSE, together with BAC and Scalable C-TMLE, whereas GLiDeR performed very poorly in some scenarios [48]. It might be interesting for future studies to specifically investigate the factors that affect the efficiency of the different model selection methods and to provide insights regarding the situations where one method should be expected to outperform the others. Our simulation study results also indicated that outcome model-based algorithms outperformed DR methods in scenarios where the correct outcome model was within the considered model space. However, the former methods produced estimates with high bias when the correct outcome model was not in the model space, unlike the latter. Model misspecifications are likely to be common in practice, and using DR approaches may thus be preferred. Although we did not consider BCEE in our simulation study, since it is an outcome model-based algorithm, BCEE is sensitive to outcome model misspecifications. GBCEE, like most other methods we considered, employs parametric likelihoods for performing the variable selection. In a simulation scenario where the outcome error distribution was misspecified, we observed that results were practically unaffected for all methods. Moreover, some theoretical results indicate that Bayesian posteriors asymptotically concentrate on the model closest to the truth (according to the Kullback-Leibler divergence) when none of the considered model is correctly specified [49,50]. This suggests that even if none of the models considered is correctly specified, GBCEE’s model selection should converge to the best available approximation. Our simulation study also showcased the challenge of DR estimation with a continuous exposure. If DR estimation of the effect of a continuous exposure is attempted in practice, we warn analysts to be cautious of the potential influence of extreme weights on their results.

From a computational perspective, GBCEE produced both an estimate and inferences more quickly than the other DR algorithms when the empirical influence function based within-model variance estimator was used. This variance estimator produced valid inferences in most scenarios. Two exceptions were when the exposure model was misspecified and when the sample size was small. As such, it is advisable to use the bootstrap within-model variance estimator for inferences when the sample size is moderate or small. In such situations, using the bootstrap variance estimator is unlikely to be computationally prohibitive. The increased computational efficiency of GBCEE is particularly relevant when comparing with MADR, since GBCEE fits within the same general MADR framework that was developed by Cefalu et al. [17].

Regarding inferences for model selection procedures, it is unclear if employing the usual nonparametric bootstrap is appropriate. Indeed, model selection may yield estimators that lack the smoothness required for the bootstrap to be valid. This may be the reason why the coverage of 95% confidence intervals was around 90% for MADR and GLiDeR in Scenario 4 of our simulation study. A smoothed-bootstrap method has been proposed specifically for the variable selection problem [51] and was used by Shortreed and Erterfaie [4]. We have explored using this approach in our simulation study, but we have observed that a very large number of replications (5,000–10,000) was necessary to achieve appropriate variance estimation. Shortreed and Ertefaie [4] based their inferences on 10,000 replications. Overall, we believe that inference procedures for variable selection methods require further investigation. We note that GBCEE does not share this limitation when the bootstrap variance estimator is used, since the bootstrap is only used to estimate the within-model variance; the between-model variance remains estimated using the Bayesian machinery.

We have also illustrated the use of GBCEE for estimating the effect of reaching physical activity recommendations on the risk of osteoporotic fractures among older women. In this example, the estimate produced by GBCEE was similar to that of the fully adjusted DR estimator, but the confidence interval was much narrower. Since resources available for research are limited, it is important to make the most out of the available data. Confounder selection methods should be considered as a valuable tool for achieving this, especially when there are multiple potential confounders or when it is unknown whether some covariates are confounders, risk factors for the outcome, or instruments.

Model selection in causal inference is a rapidly growing field. It was thus impossible for us to compare our proposed method to all possible alternatives in the simulation study. For example, Antonelli et al. have introduced a general framework for Bayesian DR estimation of causal effects with a focus on high-dimensional settings [28]. We considered including their method in our simulation study, but we ran into multiple errors when trying to use their R package. However, we hypothesize that GBCEE is more efficient since it connects the exposure and outcome modeling steps to improve variable selection, contrary to Antonelli et al.’s proposal. We could not include the OHAL either in our simulation study because of the excessive computing time required to obtain even only point estimates for this method in the scenarios we considered.

There are multiple directions in which the current work could be extended. For instance, to the best of our knowledge, no confounder selection method is applicable to the censored time-to-event outcome case yet. To reduce the risk of model misspecifications, it would also be possible to employ machine learning within GBCEE. In fact, TMLE is often paired with Super Learner, an ensemble method that combines the predictions from multiple methods using a cross-validation procedure [37]. A simple possibility for doing so would entail computing the prior and posterior probability of a given adjustment set α Y using parametric working models featuring only main terms, exactly as proposed in the current article. However, the causal contrast estimate Δ ˆ α Y would be computed using a TMLE that employs machine learning allowing for potential nonlinear or interaction terms. For such a procedure to perform well, the working models need to be sufficiently good approximations to determine the respective role of each covariate (e.g., confounders, instruments or conditional instruments, outcome risk factors). As such, we would expect this procedure to perform well in practice as long as confounders feature a non negligible main linear effect on both the exposure and the outcome. However, an important downside to such a procedure would be its computational burden. Using richer parametric models, such as generalized additive models for both the posterior probability computations and causal effect estimation, is another possible option. However, recall that the prior distribution we proposed evaluates the association between each covariate and the outcome using its associated regression coefficient in the outcome model. As such, using generalized additive models for the variable selection steps would require modifying the specification of the prior distribution to accommodate that multiple regression coefficients could encode the association between each covariate and the outcome.

Funding information: This work was supported by grants from the Natural Sciences and Engineering Research Council of Canada and the Fonds de recherche du Québec - Santé. D Talbot is a Fonds de recherche du Québec - Santé Chercheur-Boursier. The Study of Osteoporotic Fractures (SOF) was supported by National Institutes of Health funding. The National Institute on Aging (NIA) provided support under the following grant numbers: R01 AG005407, R01 AR35582, R01 AR35583, R01 AR35584, R01 AG005394, R01 AG027574, and R01 AG027576.
Conflict of interest: The authors state no conflict of interest.
Data availability statement: The R code required to reproduce the simulation study is available at https://github.com/detal9/GBCEE. The public access data from the Study of Osteoporotic Fractures is available at https://sofonline.ucsf.edu/.

Appendix A Asymptotic properties of the maximum likelihood estimation of the prior’s parameters

We provide greater details on the properties of the estimated prior distribution. Denote by δ ˜ ˆ m the maximum likelihood estimator of δ ˜ m . Notice that δ ˜ ˆ m α Y σ U m σ Y 2 → 0 at rate 1 / n when δ ˜ m = 0 , whereas δ ˜ ˆ m α Y σ U m σ Y 2 converges to a nonzero finite constant at rate 1 / n when δ ˜ m ≠ 0 . As a result, the proposed choice of ω ( ω = c × n b , with 0 < b < 1 ) ensures that ω m α Y → 0 at rate 1 / n 1 − b when δ ˜ m = 0 and that ω m α Y → ∞ at rate n b when δ ˜ m ≠ 0 . This, in turns, ensures that P α Y ∣ α X ( α m Y = 1 ∣ α m X = 1 ) → 0 when δ ˜ m = 0 and that P α Y ∣ α X ( α m Y = 1 ∣ α m X = 1 ) → 1 when δ ˜ m ≠ 0 . Hence, α Y has a zero prior probability asymptotically whenever a covariate U m that is associated with both the exposure ( α m X = 1 ) and the outcome ( δ ˜ m ≠ 0 ) is excluded, or when a covariate U m that is only associated with the exposure is included. Taking b = 0.5 is an intuitive choice, since it makes both convergence rates equal.

B Detailed presentation of GBCEE’s implementation

In this Appendix, we provide further details concerning Steps 2 and 3 of our proposed implementation of GBCEE.

Step 2. Use a Markov chain Monte Carlo model composition algorithm to explore the outcome model space α Y [43]. This algorithm starts with an initial outcome model. At each step of the algorithm, a candidate model is randomly chosen by adding or removing one covariate U m from the outcome model. The decision to move to the candidate model (say, α 1 Y ) or to remain in the current model (say, α 2 Y ) is based on the Metropolis-Hasting ratio of the posterior probability of the outcome models

P ( α 1 Y ∣ O ) P ( α 2 Y ∣ O ) = P ( Y ∣ α 1 Y , X , U ) P ( Y ∣ α 2 Y , X , U ) × ∑ α X P ( α 1 Y ∣ α X ) P ( α X ∣ X , U ) ∑ α X P ( α 2 Y ∣ α X ) P ( α X ∣ X , U ) .

Recalling that P ( α Y ∣ α X ) ∝ ∏ m = 1 M P α Y ∣ α X ( α m Y ∣ α m X ) , where P α Y ∣ α X ( α m Y ∣ α m X ) is given in equation (4)

(A1) P ( α 1 Y ∣ O ) P ( α 2 Y ∣ O ) ∝ P ( Y ∣ α 1 Y , X , U ) P ( Y ∣ α 2 Y , X , U ) × ∑ α X ∏ m = 1 M P α 1 Y ∣ α X ( α m Y ∣ α m X ) P ( α X ∣ X , U ) ∑ α X ∏ m = 1 M P α 2 Y ∣ α X ( α m Y ∣ α m X ) P ( α X ∣ X , U ) .

Equation (A1) can be expensive to compute. Indeed, evaluating the value of each term of the product ∏ m = 1 M P α 1 Y ∣ α X ( α m Y ∣ α m X ) , where α m X = 1 potentially requires fitting a different regression model to estimate δ ˜ m α Y (recall equations (4) and (5)). Similar to [5], denoting by U m ′ , the covariate randomly selected to be added or removed, we propose the following heuristic approximation to (A1):

(A2) ∑ α X ∏ m = 1 M P α 1 Y ∣ α X ( α m Y ∣ α m X ) P ( α X ∣ X , U ) ∑ α X ∏ m = 1 M P α 2 Y ∣ α X ( α m Y ∣ α m X ) P ( α X ∣ X , U ) ≈ ∑ α X P α 1 Y ∣ α X ( α m ′ Y ∣ α m ′ X ) P ( α X ∣ X , U ) ∑ α X P α 2 Y ∣ α X ( α m ′ Y ∣ α m ′ X ) P ( α X ∣ X , U ) .

Under the assumption that the exposure model is correctly specified, remark that the approximation (A2) has the following form asymptotically:

∑ α X ∏ m = 1 M P α 1 Y ∣ α X ( α m Y ∣ α m X ) P ( α X ∣ X , U ) ∑ α X ∏ m = 1 M P α 2 Y ∣ α X ( α m Y ∣ α m X ) P ( α X ∣ X , U ) → ∏ m = 1 M P α 1 Y ∣ α X ∗ ( α m Y ∣ α m X ) ∏ m = 1 M P α 2 Y ∣ α X ∗ ( α m Y ∣ α m X ) ≈ P α 1 Y ∣ α X ∗ ( α m ′ Y ∣ α m ′ X ) P α 2 Y ∣ α X ∗ ( α m ′ Y ∣ α m ′ X ) ,

because the posterior mass of α X concentrates on the exposure model that includes all determinants of the exposure α X ∗ (see argument on this in Section 3.2). The intuition of this approximation is as follows. For a given α X , because the models α 1 Y and α 2 Y only differ in their inclusion of covariate U m ′ , P α 1 Y ∣ α X ( α m Y ∣ α m X ) and P α 2 Y ∣ α X ( α m Y ∣ α m X ) for m ≠ m ′ have the same form (given by equation (4)), while the form of P α 1 Y ∣ α X ( α m ′ Y ∣ α m ′ X ) and P α 2 Y ∣ α X ( α m ′ Y ∣ α m ′ X ) may differ. Notably, when α m X = 0 , P α 1 Y ∣ α X ( α m Y ∣ α m X ) = P α 2 Y ∣ α X ( α m Y ∣ α m X ) = 1 / 2 . When α m X = 1 , P α 1 Y ∣ α X ( α m Y ∣ α m X ) may only differ from P α 2 Y ∣ α X ( α m Y ∣ α m X ) because δ ˜ ˆ m α 1 Y may differ from δ ˜ ˆ m α 2 Y . We expect this difference to generally be small compared to the difference between P α 1 Y ∣ α X ( α m ′ Y ∣ α m X ) and P α 2 Y ∣ α X ( α m ′ Y ∣ α m X ) . While this solution is heuristic in nature, it was observed to perform well in our current simulation study and those reported in ref. [5].

Since P α Y ∣ α X ( α m Y ∣ α m X ) depends on δ ˜ m α Y which is unknown, we proposed replacing δ ˜ m α Y by its maximum likelihood estimate. To account for the uncertainty in this estimate, we can write P α Y ∣ α X ( α m Y ∣ α m X ) = ∫ δ ˜ m α Y P α Y ∣ α X ( α m Y ∣ α m X , δ ˜ m α Y ) f ( δ ˜ m α Y ) d δ ˜ m α Y , where f ( δ ˜ m α Y ) is approximated by its limit distribution, the maximum likelihood estimator distribution N ( δ ˜ ˆ m α Y , S E ^ ( δ ˜ ˆ m α Y ) ) [52,53]. This approximation is expected to have a negligible impact on the results when the sample size is not too small. At the end of this appendix, we present a short simulation study comparing an “oracle” GBCEE, where the true parameters are used in constructing the prior distribution and an “MLE” GBCEE where the parameters are estimated using the maximum likelihood estimator, and the uncertainty associated with this estimation is accounted for in the manner we proposed. These results indicate that the MLE approach has a slightly greater variance than the oracle approach, as expected, but similar bias and coverage of credible intervals. These results support the validity of our proposed approximation.

Step 3. For each α Y explored, compute Δ ˆ α Y , Var ^ ( Δ ˆ α Y ) and P ( α Y ∣ O ) .

The posterior probability of each outcome model P ( α Y ∣ O ) can be approximated using the Metropolis–Hasting ratios of the posterior probability of the outcome models calculated in Step 2. Remark that it is possible to obtain ratios comparing the posterior probability of each explored model to a single alternative model, by multiplying together ratios comparing different models. For example, consider three models α 1 Y , α 2 Y and α 3 Y , and assume that the following Metropolis–Hasting ratios were computed in the algorithm: P ( α 3 Y ∣ O ) P ( α 2 Y ∣ O ) , and P ( α 2 Y ∣ O ) P ( α 1 Y ∣ O ) . If we want to compare the posterior probability of each model to P ( α 1 Y ∣ O ) , this can be done using the previous ratios: P ( α 3 Y ∣ Y ) P ( α 1 Y ∣ Y ) = P ( α 3 Y ∣ Y ) P ( α 2 Y ∣ Y ) × P ( α 2 Y ∣ Y ) P ( α 1 Y ∣ O ) . The posterior probability of each model is then obtained by dividing each of these ratios by their sum. For example, if the only models are α 1 Y , α 2 Y , and α 3 Y , then

P ( α 3 Y ∣ O ) = P ( α 3 Y ∣ O ) / P ( α 1 Y ∣ O ) P ( α 3 Y ∣ O ) / P ( α 1 Y ∣ O ) + P ( α 2 Y ∣ O ) / P ( α 1 Y ∣ O ) + P ( α 1 Y ∣ O ) / P ( α 1 Y ∣ O ) ,

because ∑ α Y ∈ A P ( α Y ∣ O ) = 1 .

B.1 Simulation study comparing oracle GBCEE to MLE GBCEE

Implementing the oracle GBCEE is challenging because it requires calculating the true regression parameters of multiple regression models. As a consequence, we only examined three simple scenarios.

In Scenario 1S, U 1 ∼ N ( 0 , 1 ) , X ∼ Bernoulli ( p = expit ( U 1 ) ) and Y ∼ N ( X + 0.1 U 1 , 1 ) . In Scenario 2S, U 1 and U 2 are independently generated as N ( 0 , 1 ) , X ∼ Bernoulli ( p = expit ( U 1 + U 2 ) ) and Y ∼ N ( X + 0.1 U 1 + 0.1 U 2 , 1 ) . Finally, in Scenario 3S, U 1 ∼ N ( 0 , 1 ) , U 2 ∼ N ( U 1 , 1 ) , X ∼ Bernoulli ( p = expit ( U 1 ) ) , and Y ∼ N ( X + 0.1 U 2 , 1 ) .

For each scenario, we generated 1,000 samples of sizes n = 20 , 50 , 100 , 200 , 1,000 . We estimated the bias, the standard deviation, and the coverage of 95% credible intervals across samples for both oracle and MLE GBCEE. The results are presented in Tables A1, A2, and A3. The results for the oracle GBCEE and the MLE GBCEE are nearly identical, even for the smallest sample size.

Table A1

Results of simulation scenario 1S comparing oracle GBCEE to MLE GBCEE

Method	n = 20	n = 50	n = 100	n = 200	n = 1,000
	Bias
Oracle	0.00	0.03	0.01	0.00	0.00
MLE	0.00	0.03	0.02	0.01	0.00
	Standard deviation
Oracle	0.49	0.32	0.22	0.16	0.07
MLE	0.49	0.32	0.22	0.16	0.07
	Coverage
Oracle	0.95	0.94	0.94	0.94	0.95
MLE	0.95	0.94	0.93	0.94	0.94

Table A2

Results of simulation scenario 2S comparing oracle GBCEE to MLE GBCEE

Method	n = 20	n = 50	n = 100	n = 200	n = 1,000
	Bias
Oracle	0.03	0.04	0.06	0.05	0.03
MLE	0.03	0.05	0.07	0.06	0.03
	Standard deviation
Oracle	0.54	0.33	0.23	0.16	0.08
MLE	0.54	0.33	0.23	0.16	0.08
	Coverage
Oracle	0.94	0.94	0.94	0.94	0.92
MLE	0.93	0.94	0.94	0.92	0.90

Table A3

Results of simulation scenario 3S comparing oracle GBCEE to MLE GBCEE

Method	n = 20	n = 50	n = 100	n = 200	n = 1,000
	Bias
Oracle	− 0.02	0.00	0.00	− 0.01	0.00
MLE	− 0.01	0.01	0.01	0.00	0.00
	Standard deviation
Oracle	0.52	0.31	0.22	0.15	0.07
MLE	0.53	0.32	0.22	0.16	0.07
	Coverage
Oracle	0.94	0.95	0.95	0.95	0.94
MLE	0.94	0.94	0.95	0.94	0.94

C Additional simulation results

Table A4

Results of simulation Scenarios 1–7, with n = 200

Scenario 1						Scenario 2					Scenario 3
				Rel.					Rel.					Rel.
Method	Bias	SD	ESE	RMSE	CP	Bias	SD	ESE	RMSE	CP	Bias	SD	ESE	RMSE	CP
Full-g	− 0.00	0.24	0.24	1.49	0.95	− 0.01	0.21	0.21	1.10	0.95	0.00	0.54	0.53	1.52	0.95
Target-g	0.00	0.16	0.16	1.00	0.95	− 0.01	0.19	0.19	1.00	0.95	− 0.01	0.35	0.35	1.00	0.94
Full-DR	1.13	31.36	4.60	198.12	0.82	− 0.02	2.27	0.55	12.01	0.88	.	.	.	.	.
Target-DR	0.00	0.16	0.15	1.00	0.93	− 0.01	0.19	0.14	1.00	0.85	− 0.01	0.35	0.29	1.00	0.88
GBCEE	0.02	0.20	0.19	1.29	0.93	− 0.01	0.26	0.24	1.39	0.91	− 0.13	0.48	0.34	1.41	0.79
OAL	0.05	0.20	.	1.29	.	0.43	0.43	.	3.20	.	− 0.25	0.60	.	1.84	.
OAL-TMLE	0.00	0.19	.	1.22	.	0.01	0.30	.	1.56	.	− 0.30	0.46	.	1.55	.
C-TMLE	0.06	0.28	0.13	1.82	0.74	0.05	0.28	0.14	1.48	0.70	0.02	1.08	0.26	3.05	0.56
GLiDeR	0.03	0.18	.	1.13	.	0.01	0.24	.	1.25	.	− 0.16	0.38	.	1.18	.
BAC	0.01	0.22	0.21	1.38	0.94	− 0.04	0.20	0.19	1.09	0.94	− 0.04	0.47	0.45	1.32	0.94
MADR	0.03	0.17	.	1.12	.	− 0.01	0.40	.	2.13	.	− 0.14	0.48	.	1.41	.
BP	0.02	0.22	0.20	1.42	0.92	− 0.01	0.21	0.20	1.10	0.94	0.01	0.45	0.37	1.28	0.93
HDM	− 0.00	0.18	0.18	1.14	0.95	− 0.00	0.20	0.20	1.07	0.95	− 0.01	0.36	0.36	1.01	0.95
HD-SSL	0.01	0.17	0.18	1.07	0.96	0.01	0.22	0.20	1.14	0.93	− 0.04	0.39	0.39	1.10	0.94

Scenario 4						Scenario 5					Scenario 6
Full-g	− 4.78	3.82	4.07	1.15	0.83	− 0.01	0.22	0.22	1.03	0.95	− 0.01	0.16	0.16	1.03	0.95
Target-g	− 4.78	3.82	4.07	1.15	0.83	− 0.00	0.21	0.22	1.00	0.95	− 0.01	0.16	0.16	1.00	0.95
Full-DR	0.23	5.30	4.53	1.00	0.94	− 0.06	1.53	0.39	7.14	0.86	0.00	0.18	0.16	1.11	0.92
Target-DR	0.23	5.30	4.53	1.00	0.94	− 0.00	0.21	0.12	1.00	0.73	− 0.01	0.16	0.14	1.00	0.91
GBCEE	0.47	5.36	4.47	1.01	0.92	0.01	0.26	0.22	1.23	0.85	0.03	0.16	0.15	1.04	0.92
OAL	1.09	12.00	.	2.27	0.84	0.25	0.39	.	2.17	0.92	0.06	0.16	.	1.10	.
OAL-TMLE	0.22	5.29	.	1.00	0.87	0.02	0.27	.	1.24	0.97	0.05	0.16	.	1.08	.
C-TMLE	2.50	2.79	2.49	0.71	0.78	0.05	0.40	0.16	1.87	0.66	0.03	0.16	0.13	1.02	0.89
GLiDeR	0.21	5.43	.	1.02	0.90	0.04	0.23	.	1.10	0.96	0.05	0.17	.	1.08	.
BAC	− 4.61	3.82	4.09	1.13	0.84	0.01	0.22	0.22	1.02	0.95	− 0.02	0.16	0.13	0.99	0.87
MADR	0.28	5.13	.	0.97	0.90	0.01	0.65	.	3.04	0.97	0.05	0.16	.	1.06	.
BP	− 4.78	3.81	4.07	1.15	0.83	− 0.01	0.22	0.22	1.03	0.95	− 0.00	0.16	0.15	1.02	0.94
HDM	− 4.77	3.82	3.74	1.15	0.76	− 0.01	0.22	0.22	1.03	0.95	− 0.06	0.16	0.15	1.04	0.92
HD-SSL	− 4.36	3.79	4.04	1.09	0.84	0.03	0.22	0.22	1.02	0.94	0.03	0.15	0.15	0.98	0.94

Scenario 7
Full-g	− 0.01	0.19	.	1.02	.
Target-g	− 0.01	0.18	.	1.00	.
Full-DR	− 0.01	0.21	0.20	1.14	0.94
Target-DR	− 0.01	0.19	0.19	1.05	0.94
GBCEE	− 0.01	0.22	0.22	1.20	0.95
OAL	0.08	0.25	.	1.46	.
OAL-TMLE	− 0.02	0.22	.	1.21	.
C-TMLE	− 0.02	0.22	0.18	1.23	0.89
GLiDeR	− 0.03	0.22	.	1.20	.
BAC	0.01	0.19	0.19	1.07	0.96
MADR	− 0.04	0.22	.	1.22	.
BP	− 0.26	0.20	0.19	1.82	0.69
HDM	− 0.26	0.20	0.20	1.83	0.73
HD-SSL	− 0.26	0.20	0.19	1.81	0.71

Table A5

Results of simulation Scenarios 1B, 2B, 4B, and 5B, with n = 1,000

	Scenario 1B					Scenario 2B
				Rel.					Rel.
Method	Bias	SD	ESE	RMSE	CP	Bias	SD	ESE	RMSE	CP
Full-g	0.00	0.04	.	1.46	.	0.00	0.04	.	1.11	.
Target-g	0.00	0.03	.	1.00	.	0.00	0.04	.	1.00	.
Full-DR	0.00	0.58	0.18	19.36	0.93	− 0.00	0.09	0.07	2.67	0.93
Target-DR	0.00	0.03	0.03	1.01	0.95	0.00	0.05	0.04	1.33	0.93
GBCEE	0.00	0.04	0.04	1.27	0.94	0.00	0.05	0.06	1.54	0.93
OAL	0.01	0.04	.	1.49	.	0.04	0.07	.	2.18	.
OAL-TMLE	0.00	0.05	.	1.52	.	0.00	0.06	.	1.66	.
C-TMLE	0.01	0.05	0.03	1.57	0.80	0.04	0.08	0.08	2.39	0.53
BAC	0.00	0.04	0.04	1.45	0.91	− 0.01	0.04	0.04	1.10	0.91

Scenario 4B						Scenario 5B
Full-g	0.00	0.02	.	1.02	.	0.00	0.04	.	1.04	.
Target-g	0.00	0.02	.	1.02	.	0.00	0.04	.	1.00	.
Full-DR	− 0.00	0.02	0.02	1.00	0.94	0.00	0.05	0.06	1.43	0.96
Target-DR	− 0.00	0.02	0.02	1.00	0.94	0.00	0.04	0.04	1.14	0.93
GBCEE	− 0.00	0.02	0.02	1.00	0.94	0.00	0.04	0.05	1.10	0.96
OAL	0.00	0.02	.	1.06	.	0.01	0.08	.	2.13	.
OAL-TMLE	0.00	0.02	.	1.02	.	0.00	0.04	.	1.14	.
C-TMLE	0.00	0.02	0.02	1.00	0.96	0.00	0.05	0.04	1.04	0.94
BAC	0.00	0.02	0.02	1.02	0.95	0.00	0.04	0.04	1.04	0.94

Conditional instruments are present in Scenario 1B, covariates are strongly correlated in Scenario 2B, the true outcome model includes nonlinear terms in Scenario 4B, and the true exposure model includes nonlinear terms in Scenario 5B.

Table A6

Results of Scenario 2C (continuous exposure, continuous outcome) with n = 200

				Rel.
Method	Bias	SD	ESE	RMSE	CP
Full-g	− 0.00	0.08	0.07	1.63	0.95
Target-g	− 0.00	0.05	0.05	1.00	0.96
Full-DR	− 0.00	0.08	0.07	1.63	0.91
Target-DR	− 0.00	0.05	0.05	1.00	0.96
GBCEE	− 0.00	0.05	0.05	1.18	0.94
BAC	− 0.02	0.07	0.07	1.62	0.94
BP	− 0.00	0.07	0.07	1.62	0.93
HD-SSL	0.02	0.07	0.07	1.49	0.94

Table A7

Results of Scenario 2C (continuous exposure, continuous outcome) with n = 1,000

				Rel.
Method	Bias	SD	ESE	RMSE	CP
Full-g	− 0.00	0.03	0.03	1.61	0.95
Target-g	0.00	0.02	0.02	1.00	0.95
Full-DR	− 0.00	0.03	0.03	1.61	0.95
Target-DR	0.00	0.02	0.02	1.00	0.95
GBCEE	− 0.00	0.02	0.02	1.09	0.94
BAC	− 0.00	0.03	0.03	1.61	0.94
BP	− 0.00	0.03	0.03	1.61	0.95
HD-SSL	0.00	0.03	0.03	1.28	0.96

Table A8

Results of Scenario 2D (continuous exposure, binary outcome) with n = 1,000

				Rel.
Method	Bias	SD	ESE	RMSE	CP
Full-g	0.01	0.19	.	1.23	.
Target-g	0.01	0.16	.	1.00	.
Full-DR	0.03	0.23	0.46	1.48	0.86
Target-DR	0.02	0.19	0.34	1.24	0.92
GBCEE	0.03	0.22	0.33	1.43	0.81

Table A9

Results of Scenario 2E (misspecification of the outcome error distribution) with n = 200

				Rel.
Method	Bias	SD	ESE	RMSE	CP
Full-g	0.01	0.20	0.21	1.11	0.96
Target-g	0.00	0.18	0.19	1.00	0.95
Full-DR	0.14	3.97	0.53	21.75	0.87
Target-DR	0.00	0.18	0.14	1.00	0.87
GBCEE	− 0.00	0.27	0.22	1.45	0.91
OAL	0.44	0.44	.	1.45	.
OAL-TMLE	0.00	0.30	.	1.44	.
C-TMLE	0.07	0.25	0.13	1.44	0.72
GLiDeR	0.01	0.22	.	1.19	.
BAC	− 0.03	0.19	0.19	1.07	0.95
MADR	− 0.00	0.29	.	1.56	.
BP	0.00	0.20	0.20	1.10	0.96
HDM	0.01	0.20	0.20	1.07	0.96
HD-SSL	0.01	0.20	0.20	1.09	0.94

Table A10

Results of Scenario 2E (misspecification of the outcome error distribution) with n = 1,000

				Rel.
Method	Bias	SD	ESE	RMSE	CP
Full-g	− 0.00	0.09	0.09	1.09	0.95
Target-g	− 0.00	0.08	0.08	1.00	0.94
Full-DR	− 0.01	0.29	0.18	3.48	0.94
Target-DR	− 0.00	0.08	0.06	1.00	0.86
GBCEE	− 0.00	0.13	0.12	1.56	0.95
OAL	0.27	0.27	.	4.65	.
OAL-TMLE	0.00	0.14	.	1.66	.
C-TMLE	0.02	0.11	0.07	1.34	0.79
GLiDeR	0.00	0.11	.	1.33	.
BAC	− 0.01	0.09	0.09	1.09	0.95
MADR	− 0.00	0.14	.	1.71	.
BP	− 0.00	0.09	0.09	1.09	0.95
HDM	− 0.00	0.09	0.09	1.09	0.94
HD-SSL	0.00	0.08	0.09	1.02	0.95

Table A11

Results of simulation Scenarios 1–7, with n = 200 for different implementations of GBCEE

	Scenario 1					Scenario 2					Scenario 3
				Rel.					Rel.					Rel.
Method	Bias	SD	ESE	RMSE	CP	Bias	SD	ESE	RMSE	CP	Bias	SD	ESE	RMSE	CP
ω = 100 n 0.5	0.01	0.19	0.17	1.22	0.92	− 0.00	0.28	0.23	1.48	0.89	− 0.12	0.50	0.34	1.45	0.78
ω = 500 n 0.5	0.02	0.20	0.19	1.29	0.93	− 0.01	0.29	0.24	1.53	0.91	− 0.13	0.48	0.34	1.41	0.79
ω = 1,000 n 0.5	0.01	0.22	0.20	1.39	0.93	− 0.00	0.29	0.24	1.53	0.90	− 0.11	0.50	0.35	1.45	0.80
w/o model averaging	0.01	0.22	0.17	1.39	0.86	− 0.00	0.29	0.22	1.53	0.86	− 0.11	0.50	0.34	1.45	0.77

	Scenario 4					Scenario 5					Scenario 6
ω = 100 n 0.5	0.30	5.69	4.47	1.07	0.91	0.02	0.25	0.20	1.18	0.85	0.04	0.16	0.15	1.05	0.92
ω = 500 n 0.5	0.47	5.36	4.47	1.01	0.92	0.01	0.26	0.22	1.23	0.85	0.03	0.16	0.15	1.04	0.92
ω = 1,000 n 0.5	0.29	5.69	4.47	1.08	0.91	0.02	0.26	0.22	1.23	0.86	0.03	0.16	0.15	1.07	0.92
w/o model averaging	0.28	5.70	4.44	1.08	0.90	0.02	0.26	0.18	1.23	0.80	0.03	0.17	0.15	1.07	0.90

Scenario 7
ω = 100 n 0.5	− 0.01	0.22	0.22	1.19	0.95
ω = 500 n 0.5	− 0.01	0.22	0.22	1.20	0.95
ω = 1,000 n 0.5	− 0.01	0.22	0.22	1.21	0.95
w/o model averaging	− 0.01	0.22	0.21	1.21	0.94

Conditional instruments are present in Scenario 1, covariates are strongly correlated in Scenario 2, a large number of covariates are present in Scenario 3, the true outcome model includes nonlinear terms in Scenario 4, true exposure model includes nonlinear terms in Scenario 5, and an exposure–covariate interaction is present in Scenario 7.

Table A12

Results of simulation Scenarios 1–7, with n = 1,000 for different implementations of GBCEE

	Scenario 1					Scenario 2					Scenario 3
				Rel.					Rel.					Rel.
Method	Bias	SD	ESE	RMSE	CP	Bias	SD	ESE	RMSE	CP	Bias	SD	ESE	RMSE	CP
ω = 100 n 0.5	0.01	0.08	0.07	1.17	0.94	− 0.00	0.13	0.12	1.56	0.93	− 0.02	0.20	0.19	1.29	0.92
ω = 500 n 0.5	0.00	0.08	0.07	1.17	0.95	0.00	0.14	0.13	1.65	0.94	0.00	0.20	0.19	1.29	0.94
ω = 1,000 n 0.5	0.01	0.08	0.08	1.26	0.94	− 0.00	0.13	0.13	1.58	0.94	− 0.02	0.21	0.19	1.29	0.93
w/o model averaging	0.01	0.08	0.07	1.26	0.92	− 0.00	0.13	0.11	1.58	0.93	− 0.02	0.20	0.19	1.29	0.92

	Scenario 4					Scenario 5					Scenario 6
ω = 100 n 0.5	0.06	2.47	2.32	0.99	0.92	0.01	0.11	0.09	1.21	0.86	0.03	0.07	0.07	1.10	0.93
ω = 500 n 0.5	0.07	2.51	2.33	1.00	0.91	0.00	0.11	0.09	1.20	0.88	0.02	0.07	0.07	1.08	0.93
ω = 1,000 n 0.5	0.06	2.47	2.32	0.99	0.92	0.00	0.11	0.09	1.24	0.89	0.02	0.07	0.07	1.09	0.93
w/o model averaging	0.06	2.47	2.32	0.99	0.92	0.00	0.12	0.08	1.24	0.83	0.02	0.07	0.07	1.09	0.92

Scenario 7
ω = 100 n 0.5	− 0.01	0.09	0.10	1.07	0.96
ω = 500 n 0.5	− 0.01	0.09	0.10	1.07	0.96
ω = 1,000 n 0.5	− 0.01	0.09	0.10	1.07	0.96
w/o model averaging	− 0.01	0.09	0.10	1.07	0.96

Conditional instruments are present in Scenario 1, covariates are strongly correlated in Scenario 2, a large number of covariates are present in Scenario 3, the true outcome model includes nonlinear terms in Scenario 4, true exposure model includes nonlinear terms in Scenario 5, and an exposure–covariate interaction is present in Scenario 7.

D Covariates inclusion probabilities

In the following images, each line represents the marginal probability of inclusion of covariates for a given method. The gray line represents the ideal situation where confounders and outcome risk factors are included, whereas instruments and spurious variables are excluded. MADRx and MADRy represent the covariates included to model the exposure and the outcome, respectively. Unlike GBCEE, MADR does not necessarily include the same covariates in both models.

Note: Lines have been slightly adjusted vertically to avoid superimposition, but GBCEE, GLiDeR, BAC, MADRy, and BP all have probability of inclusions of 1 for all variables.

E Additional information on the application

Table A13

Descriptive characteristics of the participants in the study of osteoporotic fractures according to exposure to physical activity recommandations

	Unexposed	Exposed	Standardized mean difference
n	2,984	474
Body mass index (in kg/m 2 )	26.45 (4.37)	25.15 (4.04)	0.309
Waist-to-hip ratio	0.81 (0.06)	0.80 (0.06)	0.183
Short Mini Mental Status Exam	24.68 (1.61)	24.95 (1.33)	0.184
Clinic			0.267
1	751 (25.2)	153 (32.3)
2	595 (19.9)	122 (25.7)
3	780 (26.1)	103 (21.7)
4	858 (28.8)	96 (20.3)
Age	71.67 (5.18)	70.07 (4.31)	0.335
Highest grade of school completed	12.56 (2.70)	13.62 (2.76)	0.388
Northern European, n (%)	1,845 (61.8)	298 (62.9)	0.021
Central European, n (%)	1,512 (50.7)	235 (49.6)	0.022
Southern European, n (%)	198 (6.6)	29 (6.1)	0.021
Jewish, n (%)	56 (1.9)	17 (3.6)	0.105
Native American, n (%)	21 (0.7)	1 (0.2)	0.073
Russian, n (%)	83 (2.8)	11 (2.3)	0.029
Other origin, n (%)	121 (4.1)	16 (3.4)	0.036
Hysterectomy, n (%)	759 (25.4)	117 (24.7)	0.017
Ovary removed, n (%)	756 (25.3)	123 (25.9)	0.014
Fracture before 50, n (%)	997 (33.4)	145 (30.6)	0.060
Fell in past year, n (%)	798 (26.7)	128 (27.0)	0.006
Injury from fall in past year, n (%)	562 (18.8)	80 (16.9)	0.051
Fear of falling, n (%)	1,323 (44.3)	131 (27.6)	0.353
Time/year of moderate physical activity at 50	16.58 (45.32)	50.24 (77.91)	0.528
Time/year of high physical activity at 50	1.93 (16.05)	7.89 (29.92)	0.248
Time/year of moderate physical activity at 30	16.81 (46.74)	38.70 (74.39)	0.352
Time/year of high physical activity at 30	3.28 (19.89)	7.86 (30.47)	0.178
Time/year of moderate physical activity when teenager	46.61 (80.66)	69.87 (94.15)	0.265
Time/year of high physical activity when teenager	16.20 (48.09)	20.68 (51.64)	0.090
Bed ridden for 7 days in past year, n (%)	126 (4.2)	11 (2.3)	0.107
Smoking, n (%)			0.080
Never	1,887 (63.2)	291 (61.4)
Past	803 (26.9)	143 (30.2)
Current	294 (9.9)	40 (8.4)
Caffeine intake (mg/day)	134.06 (137.32)	138.69 (136.24)	0.034
Any alcoholic beverage in past year, n (%)	2,046 (68.6)	376 (79.3)	0.247
At least 1 alcoholic beverage in past 30 days, n (%)	1,576 (52.8)	314 (66.2)	0.276
How often > 3 drinks/day in past 30 days	0.18 (0.76)	0.22 (0.73)	0.050
Osteoporosis, n (%)	426 (14.3)	39 (8.2)	0.192
Diabetes, n (%)	174 (5.8)	19 (4.0)	0.084
Ever had a stroke, n (%)	84 (2.8)	4 (0.8)	0.147
Ever had hypertension, n (%)	1,114 (37.3)	142 (30.0)	0.157
Parkinson’s disease, n (%)	12 (0.4)	1 (0.2)	0.035
Arthritis, n (%)	1,772 (59.4)	253 (53.4)	0.121
Stayed in hospital overnight in past year, n (%)	311 (10.4)	39 (8.2)	0.076
Pain around hip for most days in a month in past year, n (%)	1,015 (34.0)	146 (30.8)	0.069
Thiazide use, n (%)			0.116
Never	1,971 (66.1)	335 (70.7)
Past	244 (8.2)	40 (8.4)
Current	769 (25.8)	99 (20.9)
Non-thiazide diuretice use, n (%)			0.167
Never	2,750 (92.2)	455 (96.0)
Past	79 (2.6)	8 (1.7)
Current	155 (5.2)	11 (2.3)
Benzodiazapine use in past year, n (%)	414 (13.9)	63 (13.3)	0.017
Sedative hypnotic use in past year, n (%)	37 (1.2)	13 (2.7)	0.108
Antidepressants use in past year, n (%)	101 (3.4)	19 (4.0)	0.033
Oral estrogen use, n (%)			0.286
Never	1,913 (64.1)	238 (50.2)
Past	743 (24.9)	158 (33.3)
Current	328 (11.0)	78 (16.5)
Progestin use, n (%)			0.171
Never	2,819 (94.5)	429 (90.5)
Past	76 (2.5)	14 (3.0)
Current	89 (3.0)	31 (6.5)
Difficulty walking 2 or 3 blocks, n (%)	356 (11.9)	18 (3.8)	0.306
Back pain in past year, n (%)	1,922 (64.4)	274 (57.8)	0.136
Use walking aids, n (%)	114 (3.8)	11 (2.3)	0.087
Problems that prevent getting up or walking up stairs – n (%)	283 (9.5)	27 (5.7)	0.143
Comparative self-ratted health (1 = Excellent, 5 = Very poor)	1.84 (0.70)	1.53 (0.63)	0.461
Not married, n (%)	1,500 (50.3)	182 (38.4)	0.241
Years since menopause	23.76 (8.03)	21.04 (7.29)	0.354
Mother ever had a fracture, n (%)	974 (32.6)	158 (33.3)	0.015
Father ever had a fracture, n (%)	626 (21.0)	106 (22.4)	0.034

Numbers are mean (standard deviation) unless otherwise stated.

References

[1] Walter S, Tiemeier H. Variable selection: current practice in epidemiological studies. European J Epidemiol. 2009;24(12):733–6. 10.1007/s10654-009-9411-2Search in Google Scholar PubMed PubMed Central

[2] Talbot D, Massamba VK. A descriptive review of variable selection methods in four epidemiologic journals: there is still room for improvement. European J Epidemiol. 2019;34(8):725–30. 10.1007/s10654-019-00529-ySearch in Google Scholar PubMed

[3] Ertefaie A, Asgharian M, Stephens DA. Variable selection in causal inference using a simultaneous penalization method. J Causal Inference. 2018;6(1):1–16. 10.1515/jci-2017-0010Search in Google Scholar

[4] Shortreed SM, Ertefaie A. Outcome-adaptive lasso: variable selection for causal inference. Biometrics. 2017;73(4):1111–22. 10.1111/biom.12679Search in Google Scholar PubMed PubMed Central

[5] Talbot D, Lefebvre G, Atherton J. The Bayesian causal effect estimation algorithm. J Causal Inference. 2015;30(2):207–36. 10.1515/jci-2014-0035Search in Google Scholar

[6] Wang C, Parmigiani G, Dominici F. Bayesian effect estimation accounting for adjustment uncertainty. Biometrics. 2012;68(3):661–71. 10.1111/j.1541-0420.2011.01731.xSearch in Google Scholar PubMed

[7] Fithian W, Sun D, Taylor J. Optimal inference after model selection. 2014. Available from: https://arxiv.org/abs/1410.2597. Search in Google Scholar

[8] Berk R, Brown L, Buja A, Zhang K, Zhao L. Valid post-selection inference. Ann Statist. 2013;41(2):802–37. 10.1214/12-AOS1077Search in Google Scholar

[9] Leeb H, Pötscher BM. Model selection and inference: facts and fiction. Econometric Theory. 2005;21(1):21–59. 10.1017/S0266466605050036Search in Google Scholar

[10] Leeb H, Pötscher BM, Can one estimate the unconditional distribution of post-model-selection estimators?. Econometric Theory. 2008;24(2):338–76. 10.1017/S0266466608080158Search in Google Scholar

[11] Panigrahi S, Taylor J, Asaf W. Bayesian post-selection inference in the linear model. 2016. Available from: https://arxiv.org/abs/1605.08824. Search in Google Scholar

[12] Crainiceanu CM, Dominici F, Parmigiani G. Adjustment uncertainty in effect estimation. Biometrika. 2008;73(2):635–51. 10.1093/biomet/asn015Search in Google Scholar

[13] Belloni A, Chernozhukov V, Hansen C. Inference on treatment effects after selection among high-dimensional controls. Rev Econ Stud. 2014;81(2):608–50. 10.1093/restud/rdt044Search in Google Scholar

[14] van der Laan MJ, Gruber S. Collaborative double robust targeted maximum likelihood estimation. Int J Biostatist. 2010;6(17):1–61. 10.2202/1557-4679.1181Search in Google Scholar PubMed PubMed Central

[15] Wang C, Dominici F, Parmigiani G, Zigler CM. Accounting for uncertainty in confounder and effect modifier selection when estimating average causal effects in generalized linear models. Biometrics. 2015;71(3):654–65. 10.1111/biom.12315Search in Google Scholar PubMed PubMed Central

[16] Wilson A, Reich BJ. Confounder selection via penalized credible regions. Biometrics. 2014;70(4):852–61. 10.1111/biom.12203Search in Google Scholar PubMed

[17] Cefalu M, Dominici F, Arvold N, Parmigiani G. Model averaged double robust estimation. Biometrics. 2017;73(2):410–21. 10.1111/biom.12622Search in Google Scholar PubMed PubMed Central

[18] Koch B, Vock DM, Wolfson J. Covariate selection with group lasso and double robust estimation of causal effects. Biometrics. 2018;74(1):8–17. 10.1111/biom.12736Search in Google Scholar PubMed PubMed Central

[19] Ju C, Benkeser D, van der Laan MJ. Robust inference on the average treatment effect using the outcome highly adaptive lasso. Biometrics. 2019;76(1):109–18. 10.1111/biom.13121Search in Google Scholar PubMed

[20] Antonelli J, Parmigiani G, Dominici F. High-dimensional confounding adjustment using continuous spike and slab priors. Bayesian Anal. 2019;14(3):805–28. 10.1214/18-BA1131Search in Google Scholar PubMed PubMed Central

[21] Pearl J. Causality: Models, reasoning, and inference. 2nd ed. New York: Cambridge University Press; 2009. 10.1017/CBO9780511803161Search in Google Scholar

[22] Brookhart MA, Schneeweiss S, Rothman KJ, Glynn RJ, Avorn J, Stürmer T. Variable selection for propensity score models. Am J Epidemiol. 2006;163(12):1149–56. 10.1093/aje/kwj149Search in Google Scholar PubMed PubMed Central

[23] de Luna X, Waernbaum I, Richardson TS. Covariate selection for the nonparametric estimation of an average treatment effect. Biometrika. 2011;98(4):861–75. 10.1093/biomet/asr041Search in Google Scholar

[24] Lefebvre G, Atherton J, Talbot D. The effect of the prior distribution in the Bayesian adjustment for confounding algorithm. Comput Statist Data Anal. 2014;70:227–40. 10.1016/j.csda.2013.09.011Search in Google Scholar

[25] Rotnizky A, Smucler E. Efficient adjustment sets for population average causal treatment effect estimation in graphical models. J Mach Learn Res. 2020;21(188):1–86. Search in Google Scholar

[26] Tang D, Kong D, Pan W, Wang L. Ultra-high dimensional variable selection for doubly robust causal inference. 2020. Available from: https://arxiv.org/abs/2007.14190. Search in Google Scholar

[27] Saarela O, Belzile LR, Stephens DA. A Bayesian view of doubly robust causal inference. Biometrika. 2016;103(3):667–81. 10.1093/biomet/asw025Search in Google Scholar

[28] Antonelli J, Papadogeorgou G, Dominici F. Causal inference in high dimensions: a marriage between Bayesian modeling and good frequentist properties. Biometrics. 2022;78(1):100–14. 10.1111/biom.13417Search in Google Scholar PubMed PubMed Central

[29] Pearl J. Invited commentary: understanding bias amplification. Am J Epidemiol. 2011;174(11):1223–7. 10.1093/aje/kwr352Search in Google Scholar PubMed PubMed Central

[30] Raftery A. Bayesian model selection in structural equation models. In: Bollen K, Long J, editor, Testing structural equation models. Newbury Park, CA: Sage; 1993. p. 163–80. Search in Google Scholar

[31] Raftery AE, Hoeting J, Volinsky C, Painter I, Yeung KY. BMA: Bayesian model averaging. 2018. R package version 3.18.9. Available from: https://CRAN.R-project.org/package=BMA. Search in Google Scholar

[32] Haughton DMA. On the choice of a model to fit data from an exponential family. Ann Statist. 1988;14(1):342–65. 10.1214/aos/1176350709Search in Google Scholar

[33] Wasserman L. Bayesian model selection and model averaging. J Math Psych. 2000;44(1):92–10. 10.1006/jmps.1999.1278Search in Google Scholar PubMed

[34] van der Laan MJ, Rubin D. Targeted maximum likelihood learning. Int J Biostatist. 2006;2(1):1–38.10.2202/1557-4679.1043Search in Google Scholar

[35] Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for non-ignorable drop-out using semi-parametric nonresponse models (with discussion and rejoinder). J Amer Statist Assoc. 1999;94(448):1096–120. 10.1080/01621459.1999.10473862Search in Google Scholar

[36] Porter KE, Gruber S, van der Laan MJ, Sekhon JS. The relative performance of targeted maximum likelihood estimators. Int J Biostatist. 2011;7(1):1–34. 10.2202/1557-4679.1308Search in Google Scholar PubMed PubMed Central

[37] van der Laan MJ, Rose S. Targeted learning: causal inference for observational and experimental data. New York: Springer Series in Statistics; 2011. 10.1007/978-1-4419-9782-1Search in Google Scholar

[38] Tsiatis A. Semiparametric theory and missing data. New York: Springer Science & Business Media; 2007. Search in Google Scholar

[39] Luque-Fernandez MA, Schomaker M, Rachet B, Schnitzer ME. Targeted maximum likelihood estimation for a binary treatment: a tutorial. Stat Med. 2018;37(16):2530–46. 10.1002/sim.7628Search in Google Scholar PubMed PubMed Central

[40] Siddique AA, Schnitzer ME, Bahamyirou A, Wang G, Holtz TH, Migliori GB, et al. Causal inference with multiple concurrent medications: a comparison of methods and an application in multidrug-resistant tuberculosis. Stat Methods Med Res. 2019;28(12):3534–49. 10.1177/0962280218808817Search in Google Scholar PubMed PubMed Central

[41] Rosenblum M, van der Laan MJ. Targeted maximum likelihood estimation of the parameter of a marginal structural model. Biostatistics. 2010;6(2):1–27. 10.2202/1557-4679.1238Search in Google Scholar PubMed PubMed Central

[42] Neugebauer R, van der Laan MJ. Why prefer double robust estimators in causal inference? J Statist Plann Inference. 2005;129:405–26. 10.1016/j.jspi.2004.06.060Search in Google Scholar

[43] Madigan D, York J, Allard D. Bayesian graphical models for discrete data. Int Stat Rev. 1995;63(2):215–32. 10.2307/1403615Search in Google Scholar

[44] Looker AC, Frenk SM. Percentage of adults aged 65 and over with osteoporosis or low bone mass at the femur neck or lumbar spine: United States, 2005-2010. National Center for Health Statistics. 2020. Available from: https://www.cdc.gov/nchs/data/hestat/osteoporsis/osteoporosis2005_2010.htm. Search in Google Scholar

[45] Bonaiuti D, Shea B, Iovine R, Negrini S, Welch V, Kemper HH, et al. Exercise for preventing and treating osteoporosis in postmenopausal women. Cochrane Database of Systematic Reviews. 2002;(2):1–37. 10.1002/14651858.CD000333Search in Google Scholar PubMed

[46] Thibaud M, Bloch F, Tournoux-Facon C, Brèque C, Rigaud S, Dugué B, et al. Impact of physical activity and sedentary behaviour on fall risks in older people: a systematic review and meta-analysis of observational studies. European Rev Aging Phys Activity. 2011;9:5–15. 10.1007/s11556-011-0081-1Search in Google Scholar

[47] World Health Organization. Global recommandations on physical activity for health. 2010. Available from: https://www.who.int/dietphysicalactivity/publications/9789241599979/en/. Search in Google Scholar

[48] Benasseur I, Talbot D, Durand M, Holbrook A, Matteau A, Potter BJ, et al. A comparison of confounder selection and adjustment methods for estimating causal effects using large healthcare databases. Pharmacoepidemiol Drug Safety. 2022;31(4):424–33. 10.1002/pds.5403Search in Google Scholar PubMed PubMed Central

[49] Ramamoorthi RV, Siriam K, Martin R. On the posterior concentration in misspecified models. Bayesian Anal. 2015;10(4):759–89. 10.1214/15-BA941Search in Google Scholar

[50] Lv J, Liu JS. Model selection principles in misspecified models. J R Stat Soc Ser B Stat Methodol. 2014;76(1):141–67. 10.1111/rssb.12023Search in Google Scholar

[51] Efron B. Estimation and accuracy after model selection. J Amer Statist Assoc. 2013;109(507):991–1007. 10.1080/01621459.2013.823775Search in Google Scholar PubMed PubMed Central

[52] Walker AM. On the asymptotic behavior of posterior distributions. J R Stat Soc Ser B Stat Methodol. 1969;31(1):80–8. Search in Google Scholar

[53] Dawid AP. On the limiting normality of posterior distributions. Math Proc Cambridge Philos Soc. 1970;67(6):25–33. 10.1017/S0305004100045953Search in Google Scholar

Received: 2021-04-29

Revised: 2022-09-01

Accepted: 2022-10-03

Published Online: 2022-11-17

This work is licensed under the Creative Commons Attribution 4.0 International License.

A generalized double robust Bayesian model averaging approach to causal effect estimation with application to the study of osteoporotic fractures

Abstract

1 Introduction

2 Notation and motivation

3 The GBCEE algorithm

3.1 Overview

3.2 Prior distribution construction

3.3 DR estimators

3.3.1 Continuous or binary outcome, binary exposure

3.3.2 Continuous outcome, continuous exposure

3.3.3 Binary outcome, continuous exposure

3.4 Implementation

4 Simulation study

4.1 Scenarios

4.2 Analysis

4.3 Results

5 Application for estimating the effect of physical activity on the risk of fractures

6 Discussion

Appendix A Asymptotic properties of the maximum likelihood estimation of the prior’s parameters

B Detailed presentation of GBCEE’s implementation

B.1 Simulation study comparing oracle GBCEE to MLE GBCEE

C Additional simulation results

D Covariates inclusion probabilities

E Additional information on the application

References

Journal and Issue

Articles in the same Issue