Skip to content
BY 4.0 license Open Access Published by De Gruyter February 15, 2023

Matched design for marginal causal effect on restricted mean survival time in observational studies

  • Zihan Lin , Ai Ni and Bo Lu EMAIL logo

Abstract

Investigating the causal relationship between exposure and time-to-event outcome is an important topic in biomedical research. Previous literature has discussed the potential issues of using hazard ratio (HR) as the marginal causal effect measure due to noncollapsibility. In this article, we advocate using restricted mean survival time (RMST) difference as a marginal causal effect measure, which is collapsible and has a simple interpretation as the difference of area under survival curves over a certain time horizon. To address both measured and unmeasured confounding, a matched design with sensitivity analysis is proposed. Matching is used to pair similar treated and untreated subjects together, which is generally more robust than outcome modeling due to potential misspecifications. Our propensity score matched RMST difference estimator is shown to be asymptotically unbiased, and the corresponding variance estimator is calculated by accounting for the correlation due to matching. Simulation studies also demonstrate that our method has adequate empirical performance and outperforms several competing methods used in practice. To assess the impact of unmeasured confounding, we develop a sensitivity analysis strategy by adapting the E-value approach to matched data. We apply the proposed method to the Atherosclerosis Risk in Communities Study (ARIC) to examine the causal effect of smoking on stroke-free survival.

MSC 2010: 62N02

1 Introduction

1.1 Causal inference for observational survival data

In biomedical studies, time-to-event is a commonly used outcome measure, and the statistical analyses for such data are usually referred to as survival analysis. Investigating the causal relationship between exposure and the time-to-event outcome is an important topic, with either randomized trials or observational studies. Causal inference for observational survival data has several challenges. First, since not all subjects can be observed for the full duration of time to event, the survival data suffer from censoring, which is a type of missing data problem. Therefore, standard statistical methods are usually not sufficient to handle both censoring and the missingness of potential outcomes. Second, the hazard ratio is a popular choice for measuring the association of survival outcomes between two groups, for convenience and easy interpretation. However, the hazard ratio is generally not an appropriate marginal causal effect measure due to its noncollapsibility property [13]. Other effect measures need to be considered to warrant valid marginal causal interpretation for survival data. Third, confounding is a major challenge in observational studies, which includes both measured and unmeasured confounders. Propensity score adjustments are popular tools for controlling the observed confounding [4]. But even with successful adjustment of observed confounding, observational data are still vulnerable to unmeasured confounding. Thus, appropriate sensitivity analysis needs to be developed to assess the impact of hidden bias [5].

The issues of using the hazard ratio as a marginal causal effect measure have been discussed extensively in the literature. Greenland et al. [1] pointed out that the hazard ratio has the noncollapsibility property when the treatment effect is nonzero. Hernán [6] argued that using the hazard ratio as a treatment effect measure may not have valid causal interpretation even in randomized studies, since the hazard ratio has a built-in selection bias and may change over time. Martinussen and Vansteelandt [2] studied the estimation of the treatment effect in the presence of confounders and found that the amount of confounding due to noncollapsibility in the Cox proportional hazards (PH) model would be very difficult to quantify. Aalen et al. [7] offered a more theoretical perspective on the conditions under which the hazard has a valid causal interpretation. They suggested that the hazard function h ( t , x , z ) must satisfy an additive assumption h ( t , x , z ) = a ( t , z ) + b ( t , x ) to yield a causal interpretation, where a ( t , z ) is a function of survival time t and treatment assignment z and b ( t , x ) is a function of survival time t and covariates x . Ni et al. [8] further illustrated that even under a PH model, the marginal hazard ratio is not a constant, after integrating out covariates. Thus, a valid and simple-to-use causal effect measure for survival outcomes is highly desirable.

1.2 RMST difference as a marginal causal effect measure

The restricted mean survival time (RMST) has been used in randomized clinical studies to evaluate treatment effects [9,10]. The RMST difference is more advantageous than the hazard ratio as a marginal effect measure. First, the RMST has an intuitive interpretation as the area under the survival curve over a certain time horizon. Second, the RMST difference is the difference of truncated mean survival time between two groups, which is essentially a mean difference. So it is collapsible, meaning that the marginal and conditional effects are compatible. Third, the treatment effect measured by the RMST difference can be asymptotically unbiasedly estimated without PH assumption, while the conventional Cox model heavily relies on such assumption.

To take advantage of the collapsibility of RMST difference, we can construct RMST regression by including covariates to control for confounding or increase estimation efficiency. Several methods of regressing RMST on multiple covariates have been developed. Karrison [11] examined the RMST as an index for comparing survival in two groups and proposed to model the hazard with piece-wise exponential models assuming covariates have a multiplicative effect on the hazard. Zucker [12] further simplified the implementation procedure for Karrison’s method and provided an extended version to achieve robustness against model misspecification. Andersen et al. [13] compared several regression analysis methods of mean survival time and RMST, and they proposed a regression method based on pseudo-observations. Tian et al. [14] developed an RMST regression model with adjustment for baseline covariates. They constructed an estimating equation with the inverse probability of censoring weighting (IPCW) to obtain consistent estimates. Wang and Schaubel [15] modeled the RMST using generalized estimating equation methods, which allows censoring to depend on both baseline covariates and time-dependent factors.

Although RMST differences have been reported in many randomized clinical studies, there is only limited discussion of using RMST in observational studies, probably due to the challenge of confounding adjustment. Propensity score weighting and stratification methods have been explored in the literature. Zhang and Schaubel [16] derived a double-robust estimator for RMST difference based on the inverse probability of treatment weighted (IPTW) estimating equation with augmentation term. To adjust for confounding factors, they built three working models for survival time, treatment assignment, and censoring, then incorporated them into the augmentation term. They assumed the PH assumption in outcome modeling, which might be violated in practice. Conner et al. [17] proposed a weighted method to compare the adjusted RMST difference directly. Unlike Zhang and Schaubel’s work, Conner et al. estimated the RMST based on the Kaplan–Meier (KM) estimator rather than the Nelson–Aalen estimator. They adjusted the KM estimator with IPTW and derived the adjusted RMST by integrating the IPTW-adjusted KM estimator. Ni et al. [8] proposed a propensity score stratified RMST difference estimation strategy to examine the marginal causal effect with observational survival data, which can combine stratification with further regression adjustment. Some existing methods still rely on modeling assumptions, and no matching based method has been proposed. More importantly, none of them touches on unmeasured confounding assessment, which is a big issue in observational studies. The matched design provides more flexibility in confounding adjustment. Observed confounding can be controlled via matching and additional regression modeling. Unobserved confounding can be explored via sensitivity analysis in matched datasets. To address the void of literature, we propose a matching-based RMST difference estimation strategy, which also facilitates the implementation of sensitivity analysis for hidden bias.

1.3 A motivating example: atherosclerosis risk in communities (ARIC)

In the United States, stroke is a severe disease that causes serious disability for adults and is a leading cause of death [18,19]. Several previous studies have shown that smoking is an important risk factor for stroke [20,21], and even passive smoking could increase the risk of stroke [22]. Although the causal pathway between smoking and stroke is unclear, Shah and Cole [23] found that the more people smoke, the more likely they were to have a stroke, and people who quit smoking showed a significantly lower risk of stroke, which provides some evidence for the causal relationship between smoking and stroke.

The ARIC study [24] is a prospective cohort study conducted in four U.S. communities. Four thousand adults aged 45–64 years were randomly sampled from each of four U.S. communities, and the final dataset contains information of 15,792 individuals. After a baseline examination during 1987–1989, subjects were followed up for the development of incident ischemic stroke, and the first definite or probable hospitalized stroke. Due to the length of follow-up, not all event times were observed, so the data were subjected to censoring. One primary outcome is the time to first stroke or death (whichever comes first), and a subject is censored if the incidence of stroke or death is not observed by the end of the study. We try to answer a causal question, using this ARIC dataset: how smokers’ stroke-free survival would change had they not smoked at baseline. Matched design is a natural choice to address this as it is about the causal effect of those being exposed to smoking, rather than for the entire population.

Existing literature mostly used the Cox PH model to analyze the ARIC data. Kwon et al. [25] and Ding et al. [26] studied the association between smoking status and risk of stroke, using the Cox PH model to estimate the hazard ratio (HR) of smoking status on the risk of stroke. Thus, the estimated effects were interpreted as conditional rather than marginal. Moreover, it is possible that some prognostic factors were not included in the confounder adjusted regression model, which would lead to biased estimates of conditional effects. An analysis with RMST as the effect measure may provide new insight into this research.

In this article, we propose a propensity score matching-based RMST difference estimator and develop a corresponding sensitivity analysis strategy for assessing the impact due to unmeasured confounding. We apply this method to the ARIC study to examine the causal effect of smoking on stroke-free survival. The rest of this article is organized as follows: In Section 2, we set up the notation and assumptions and describe the proposed RMST estimator with its theoretical properties. In Section 3, we conduct a simulation study to examine the empirical performance of our proposed method under different scenarios and also compare it with several commonly used methods in practice. In Section 4, we develop a sensitivity analysis strategy by adapting the E -value approach to matched data. In Section 5, we present the analysis results of the ARIC data. Section 6 concludes this article with some discussions.

2 Method: matched RMST difference estimation

2.1 Notation and assumptions

We follow the potential outcomes framework proposed by Rubin [27] to define the causal effects. In a two-arm survival analysis study, let A be the treatment assignment indicator (or more generally, the exposure status), such that A = 1 indicates being exposed to the treatment and A = 0 indicates being exposed to the control. Let T a denote the potential event time and S a ( t ) denote the corresponding survival function for a subject if under treatment value a . The following two assumptions are extensions of commonly used assumptions for causal inference in observational studies [28].

Assumption 1

Stable unit treatment value assumption (SUTVA). The potential survival times for one individual in the population do not vary with the treatment assigned to others. There are no different versions of the specified treatment level.

Assumption 2

Treatment assignment is strongly ignorable given covariates X , that is, ( T 0 , T 1 ) A X and 0 < p r ( A = 1 X ) < 1 .

The potential restricted event time is defined as Z a = min ( T a , τ ) , where τ is the truncation time point, which is usually prespecified at the design stage based on clinical relevance and study feasibility. Both T a and Z a are subject to censoring by a random variable C . We introduce two additional assumptions for survival data.

Assumption 3

Censoring is independent of potential survival times and baseline covariates within each treatment group, that is, C ( T 0 , T 1 ) A and C X A .

This assumption ensures that we can asymptotically unbiasedly estimate the survival function via the KM approach within the matched sample. This also implies the conditional independence between censoring and the truncated survival times, Z a .

Assumption 4

The truncation time point is smaller than the largest follow-up time, τ < t max , where t max is the largest follow-up time (event or censored).

Assumption 4 is a technical one to ensure that the prespecified τ is clinically meaningful, and RMST can be asymptotically unbiasedly estimated.

Let δ a = I ( Z a < C ) denote the censoring indicator, and then the observed restricted time is defined as Y a = min ( Z a , C ) = ( Z a ) δ a C ( 1 δ a ) . For a subject under treatment value a , the potential outcome of RMST is defined as μ a ( τ ) = E ( Z a ) = 0 τ S a ( t ) d t , then the average treatment effect (ATE) on RMST, denoted by Δ A T E , can be defined as follows:

Δ A T E = μ 1 ( τ ) μ 0 ( τ ) = E ( Z 1 ) E ( Z 0 ) = 0 τ [ S 1 ( t ) S 0 ( t ) ] d t .

Then the average treatment effect for the treated (ATT) on RMST, denoted by Δ ATT , can be defined as follows:

Δ ATT = μ A = 1 1 ( τ ) μ A = 1 0 ( τ ) = E ( Z 1 A = 1 ) E ( Z 0 A = 1 ) = 0 τ [ S A = 1 1 ( t ) S A = 1 0 ( t ) ] d t ,

which is more meaningful for many observational study applications, where only a portion of the population, not everyone, could have been exposed to the treatment. Since Z a = min ( T a , τ ) and τ is a fixed constant, ( T 0 , T 1 ) A X implies ( Z 0 , Z 1 ) A X , and similar conclusions could be made about assumption 3. Following Theorem 3 in Rosenbaum and Rubin [4], we can establish the strong ignorability based on propensity score e ( X ) = P ( A = 1 X ) for survival outcomes in proposition 1 (proof provided in Appendix A).

Proposition 1

Given Assumptions 1 and 2, we have ( T 0 , T 1 ) A e ( X ) , which further implies ( Z 0 , Z 1 ) A e ( X ) .

2.2 Matched RMST difference estimator

In randomized trials, the marginal causal effect of the treatment on RMST can be asymptotically unbiasedly estimated [29] by direct contrast of group-specific RMST estimates since confounding effects are eliminated by design. In observational studies, however, additional adjustments are needed for confounding control. Propensity score-based approaches are popular for this purpose, which may take the form of matching, stratification, or weighting [4,30]. Among different propensity score adjustment strategies, matching is a design tool that selects comparable control units to match with treated units, and it often results in more robust causal effect estimates as it does not rely on outcome model specification. Usually, matching uses all treated and a subset of control units, so it estimates the ATT [28].

Our proposed propensity score matched RMST estimation includes the following steps:

  1. Propensity score estimation. The propensity score is defined as the conditional probability of treatment given a vector of observed covariates [4]. We estimate the propensity score by fitting a logistic regression on A with X , though other estimation options, either parametric or nonparametric, are also available [31, 32].

  2. Propensity score matching. We use the optimal matching algorithm by Hansen and Klopfer [33] to create pair matches without replacement based on the estimated propensity score, and the unmatched controls will be removed from the matched sample. Matching quality is assessed by checking the postmatching covariate balance. Any substantial covariate imbalance would lead to a recalibration of the propensity score model. We will proceed to the next step only after a satisfactory balance is achieved. Note that matching is used as a design procedure, and the specific propensity score values are not used in the subsequent estimation process (e.g., not a part of any estimating equations). So the uncertainty of propensity score estimation is not considered in subsequent analysis and variance calculation.

  3. Treatment effect estimation. Suppose we obtain n pairs of data through matching, where each pair contains exactly one treated and one control subject. We estimate the RMST, μ ( τ ) , by μ ˆ ( τ ) = 0 τ S ˆ ( t ) d ( t ) , where S ˆ ( t ) is estimated by the nonparametric KM method. Let S ˆ 0 ( t ) and S ˆ 1 ( t ) denote the KM estimates of survival function for control and treated groups in the matched sample, respectively. On the basis of the matched sample, our estimator for the averaged treatment effect on the treated (ATT) is

    Δ ˆ ATT = μ ˆ 1 ( τ ) μ ˆ 0 ( τ ) = 0 τ [ S ˆ 1 ( t ) S ˆ 0 ( t ) ] d t .

The following two propositions show that the matched RMST difference estimator is asymptotically unbiased (both proofs are provided in Appendix A).

Proposition 2

Given Assumptions 14, the RMST estimator based on the KM method given propensity score e ( X ) and treatment group A , denoted as μ ˆ e ( X ) , A , is an asymptotically unbiased estimator for μ e ( X ) , A given τ < t max .

Proposition 3

Given Assumptions 14, Δ ˆ ATT is asymptotically unbiased.

2.3 Variance estimation

The matching process may introduce correlations between the two subjects in the same pair, as they are matched on similar propensity scores. Therefore, the variance calculation of Δ ˆ ATT needs to account for such correlation:

var ( Δ ˆ ATT ) = var 0 τ S ˆ 0 ( t 0 ) d t 0 + var 0 τ S ˆ 1 ( t 1 ) d t 1 2 cov 0 τ S ˆ 0 ( t 0 ) d t 0 , 0 τ S ˆ 1 ( t 1 ) d t 1 .

The overall variance has two components, the marginal variance of RMST estimates and their covariance. For two dependent event times with independent censoring and no competing risk, Murray and Cole [34] provided closed-form asymptotic covariance formulas for KM survival estimates and corresponding RMST estimates. To address the dependence structure introduced in the matching process, we adapt their formulas to compute the covariance between the control and treated group RMST estimates in the matched sample.

Specifically, let T 0 be the event time for a subject from the control group with marginal hazard function h 0 ( ) , and T 1 be the event time for a subject from the treatment group with marginal hazard function h 1 ( ) , then the event times for a matched pair of control and treated subject can be denoted as ( T 0 , T 1 ) . Let C 0 and C 1 be the censoring variables for the control and treated subject, respectively. Then the observed time can be denoted as T ˜ 0 = min ( T 0 , C 0 ) for control group with censoring indicator δ 0 = I ( T 0 < C 0 ) and T ˜ 1 = min ( T 1 , C 1 ) for treated group with censoring indicator δ 1 = I ( T 1 < C 1 ) . Then, the joint hazard function is h i j ( u , v ) = lim Δ u , Δ v 0 1 Δ u Δ v P ( u T ˜ i < u + Δ u , v T ˜ j < v + Δ v , δ i = 1 , δ j = 1 T ˜ i u , T ˜ j v ) where i , j { 0 , 1 } , and the conditional hazard function is h i j ( u v ) = lim Δ u 0 1 Δ u P ( u T ˜ i < u + Δ u , δ i = 1 T ˜ i u , T ˜ j v ) , where i , j { 0 , 1 } . Then, the covariance between two RMSTs can be computed as follows:

cov 0 τ S ˆ 0 ( t 0 ) d t 0 , 0 τ S ˆ 1 ( t 1 ) d t 1 = 1 n 0 τ 0 τ S ˆ 0 ( t 0 ) S ˆ 1 ( t 1 ) 0 t 0 0 t 1 G 01 ( u , v ) d v d u d t 0 d t 1 = 1 n 0 τ 0 τ v τ S ˆ 0 ( t ) d t u τ S ˆ 1 ( t ) d t G 01 ( u , v ) d v d u ,

where G 01 ( u , v ) = P ( T ˜ 0 u , T ˜ 1 v ) P ( T ˜ 0 u ) P ( T ˜ 1 v ) [ h 01 ( u , v ) h 0 1 ( u v ) h 1 ( v ) h 1 0 ( v u ) h 0 ( u ) + h 0 ( u ) h 1 ( v ) ] . Details about the computation of function G 01 ( u , v ) are included in Appendix C.

For the marginal variances, two methods may be considered:

  1. Murray’s method: The aforementioned covariance formulas can be used to compute the marginal variance, since the marginal variance of RMST could be written as the covariance with itself, that is, var 0 τ S ˆ ( t ) d t = cov 0 τ S ˆ ( t ) d t , 0 τ S ˆ ( t ) d t .

  2. Hosmer’s method: we may also consider the computation method introduced in the study by Hosmer et al. [35]. Let t 1 < t 2 < < t D represent distinct event times. For each k = 1 , , D , let Y k be the number of surviving units just prior to event time t k , and let d k be the number of events at t k . Let S ˆ ( t k ) = l = 1 k 1 d l Y l denotes the KM estimate of the survival function at event time t k , and let N τ be the number of t k values that are less than truncation time point τ , and then the RMST is estimated by

    0 τ S ˆ ( t ) d t = k = 1 N τ S ˆ ( t k 1 ) ( t k t k 1 ) + S ˆ ( t N τ ) ( τ t N τ ) ,

    and the marginal variance of RMST can be estimated as follows:

    var 0 τ S ˆ ( t ) d t = m m 1 k = 1 N τ d k A k 2 Y k ( Y k d k ) ,

    where A k = t k τ S ˆ ( t ) d t = l = k N τ S ˆ ( t l ) ( t l + 1 t l ) + S ˆ ( t N τ ) ( τ t N τ ) and m = l = 1 N τ d l .

In our simulation studies, we present the variance estimates under Murray’s method since the results from these two methods turn out to be very close.

3 Simulation studies

3.1 Data generation

To assess the empirical performance of the proposed method, we simulate an observational dataset with known confounders. Several existing methods for causal inference with survival outcomes are compared.

We generate ten independent baseline covariates denoted by X 1 to X 10 . Among them, X 1 , X 3 , , X 9 are five binary covariates following Bernoulli distribution with parameters 0.2, 0.4, 0.6, 0.8, and 0.5, respectively, and X 2 , X 4 , , X 10 are five continuous covariates following the standard normal distribution. We then generate potential survival time T 1 as the outcome under treatment and potential survival time T 0 as the outcome under control from Weibull distribution [36]. Specifically, we simulate a uniform random variable Q on [0,1] and then generate the potential survival time as follows ( j = 1 , 0 ):

T j = log ( Q ) λ 0 j exp ( β A j + X 1 + 1.2 X 4 + 1.4 X 6 + 1.6 X 7 + 1.6 X 8 + 1.4 X 9 + 1.2 X 10 ) 1 ν j ,

where A is the treatment indicator and β A is the conditional multiplicative treatment effect on the hazard function given covariates, and ν j and λ 0 j are the shape and baseline scale parameters of Weibull distribution for treatment group j , respectively. When ν 0 = ν 1 , we have the PH model, otherwise the model is nonproportional hazards (NP). The treatment indicator A is generated from Bernoulli distribution with P ( A = 1 X ) defined by the logistic model logit ( P ( A = 1 X ) ) = 1.95 + log ( 1.2 ) X 1 + log ( 1.1 ) X 2 + log ( 1.4 ) X 3 + log ( 1.2 ) X 4 + log ( 1.6 ) X 5 + log ( 1.3 ) X 6 + log ( 1.8 ) X 7 . Thus, X 1 , X 4 , X 6 , and X 7 are true confounders. This setup allows about 20% of the population to be exposed to treatment.

In the simulation, we assume censoring variable C is marginally independent of T a and X over treatment indicator A for simplicity, and C is generated from an exponential distribution with rate parameter γ , which is chosen to create four different levels of censoring. For simplicity, we use the same censoring variable for both arms in the simulation.

Let τ be the prespecified truncation time point, and the observed event time is T = T 0 ( 1 A ) + T 1 A . We generate the restricted event time Z = min ( T , τ ) and the observed restricted time Y = min ( Z , C ) = min ( T , C , τ ) . The restricted event time Z is censored if the observed time C < Z with censoring status δ Z = I ( Z < C ) , otherwise it is noncensored.

We simulate 500 datasets of sample size 2,500 for each scenario and set the truncation time point τ to 100. The true RMST difference is determined by calculating the empirical difference between the potential RMSTs under treated and control conditions, and we compute both ATT and ATE versions of true RMST difference to serve as benchmarks for different methods as appropriate. In the j th simulated dataset, we calculate Δ j = i = 1 n Z i 1 Z i 0 n , where Z i A = min ( T i A , τ ) is the potential restricted event time for the i th individual and n is the sample size of the treated group (for ATT) or the entire sample (for ATE). Then, the true marginal effect on RMST is calculated as Δ 0 = j = 1 500 Δ j 500 .

Both PH and NP settings are examined. Under both settings, we set β A to five different values: 0 , 0.4 , 0.8 , 1.2 , and 2 . For each treatment effect value, we also consider four different levels of censoring rates (CRs), which are 0, 20, 40, and 60%. Detailed parameter setup for PH and NP scenarios in observational studies are summarized in Table 1.

Table 1

Simulation studies: parameter setup for observational studies scenarios with independent censoring

( ν 0 , λ 0 ) ( ν 1 , λ 1 ) β A Rate parameter gamma
0% 20% 40% 60%
Nonrandomized PH
(1, exp( 6 )) (1, exp( 6 )) 0 1.00 × 1 0 8 0.0051 0.0142 0.0467
(1, exp( 6 )) (1, exp( 6 )) 0.4 1.00 × 1 0 8 0.004616 0.0124 0.0345
(1, exp( 6 )) (1, exp( 6 )) 0.8 1.00 × 1 0 8 0.00421 0.011 0.0272
(1, exp( 6 )) (1, exp( 6 )) 1.2 1.00 × 1 0 8 0.003872 0.00992 0.0226
(1, exp( 6 )) (1, exp( 6 )) 2 1.00 × 1 0 8 0.0034 0.0084 0.01731
Nonrandomized nonPH
(1, exp( 6 )) (1, exp( 6 )) 0 1.00 × 1 0 8 0.0051 0.0142 0.0467
(1, exp( 6 )) (1.5, 1.23 × 1 0 4 ) 0.4 1.00 × 1 0 8 0.003644 0.00904 0.0189
(1, exp( 6 )) (1.5, 1.23 × 1 0 4 ) 0.8 1.00 × 1 0 8 0.00343 0.0084 0.01692
(1, exp( 6 )) (1.5, 1.23 × 1 0 4 ) 1.2 1.00 × 1 0 8 0.00324 0.00787 0.01542
(1, exp( 6 )) (1.5, 1.23 × 1 0 4 ) 2 1.00 × 1 0 8 0.00295 0.00702 0.01335

3.2 Estimation strategies

The proposed method is compared with three existing estimation strategies:

  1. Propensity score matched RMST estimation. This is our proposed method as described in the previous section, and the propensity score is estimated using the correct model specification. The estimated treatment effect is compared to the ATT version of the true RMST difference in our simulation.

  2. Conner’s IPTW RMST estimation. This method is proposed by Conner et al. [17], and they estimated the RMST based on the inverse probability treatment weighting (IPTW) adjusted KM estimator. In our simulation, we use the ATT version of weight to adjust for observed confounding, so it is compared to the true ATT RMST difference. The propensity score is estimated using the correct model specification.

  3. Tian’s RMST regression. This method is proposed by Tian et al. [14], which uses the IPCW estimating equation with identity link function to estimate the treatment effect on RMST with adjustment for covariates. The estimated treatment effect is compared to ATE version of the true RMST difference. We consider four different outcome models in Tian’s RMST regression: (1) outcome model using the treatment indicator only; (2) outcome model using the true covariate set; (3) outcome model using all covariates; and (4) outcome model using a wrong covariate set. Due to space limitation, only the results of RMST regression with true covariates are summarized in the following section, which has the best performance among the four models. An important caveat is that the RMST regression model with the true covariate set does not represent the true outcome model since the data are generated based on a hazard model.

  4. Inverse probability treatment weighting (IPTW) Cox regression. This method estimates β A . The propensity score is estimated using the correct model specification. We use the ATT weight to fit a weighted Cox regression model and regard β A as the truth to calculate the bias and coverage probabilities since there is no single value true marginal hazard ratio. We consider four different outcome models in the IPTW Cox regression : (1) outcome model using the treatment indicator only; (2) outcome model using the true covariate set; (3) outcome model using all covariates; (4) outcome model using a wrong covariate set. Due to space limitation, only the results of IPTW Cox regression with the true covariate set are summarized in the following section, which has the best performance among the four models. We understand that the results here are not directly comparable to the first three methods, as they are based on different effect measures. Due to the high popularity of the IPTW Cox model in practice, however, we think there is some value in presenting the results as a reference.

3.3 Performance assessment

We summarize treatment effect estimates from 500 Monte Carlo iterations into four measures: (1) percentage bias (Bias %), which is the bias divided by the true value for nonzero treatment effect scenarios. For the zero treatment effect scenario, we just report the bias. The bias is computed as the average of 500 treatment effect estimates minus the truth; (2) coverage probability (CP), which is the proportion of 500 95% confidence (CIs) intervals that cover the truth; (3) model-based standard error (SEM), which is the average of the 500 estimated standard errors from the model-based formula; and (4) empirical standard error (SEE), which is the standard error of the 500 point estimates of the treatment effect.

3.4 Results

Simulation results under the PH setting are summarized in Table 2. The proposed matched RMST method generates unbiased estimates of the target parameters under most scenarios, and the coverage probabilities are around 95%. For a small effect size ( β A = 0.4 ), the bias is a bit large for a high CR. Conner’s method has a similar performance, with moderately larger biases. Averaging across all scenarios, bias from Conner’s method is 65% higher than our method. The results of the IPTW Cox model are mostly good since we use the correct outcome model. The CP may be a bit lower than the nominal level, sometimes, which may be due to the underestimated standard error. Tian’s RMST regression method shows a relatively large percentage bias and lower CP, especially under scenarios with large treatment effects. This is likely due to the incorrect covariate functional form specification in the model even though we include the right covariate set.

Table 2

Simulation studies: results for proportional hazards scenarios with independent censoring

Scenario Bias (%) CP SEM SEE Bias (%) CP SEM SEE Bias (%) CP SEM SEE Bias (%) CP SEM SEE
β A CR Matched RMST (Murray) Tian’s RMST regression Conner’s IPTW RMST IPTW Cox (HR)
0 0 0.032 0.964 2.795 2.611 0.020 0.932 1.322 1.402 0.076 0.962 2.284 2.127 0.005 0.938 0.052 0.052
0.2 0.137 0.958 2.881 2.666 0.016 0.938 1.481 1.538 0.154 0.960 2.353 2.197 0.005 0.938 0.067 0.069
0.4 0.190 0.956 3.066 2.903 0.007 0.928 1.924 2.030 0.211 0.962 2.501 2.424 0.009 0.928 0.074 0.078
0.6 0.343 0.961 4.185 4.115 0.476 0.800 4.735 7.362 0.360 0.954 3.509 3.448 0.011 0.937 0.085 0.087
0.4 0 0.724% 0.970 2.795 2.588 2.746% 0.944 1.326 1.369 1.581% 0.958 2.284 2.101 1.062 % 0.944 0.053 0.052
0.2 2.740% 0.964 2.873 2.643 3.399% 0.940 1.479 1.510 3.184% 0.956 2.347 2.173 1.122 % 0.938 0.069 0.072
0.4 4.695% 0.966 3.027 2.807 3.944% 0.936 1.862 1.965 4.833% 0.960 2.471 2.327 1.533 % 0.946 0.076 0.077
0.6 5.925% 0.960 3.686 3.572 7.094% 0.874 4.089 4.811 9.209% 0.964 3.022 2.940 2.022 % 0.936 0.086 0.087
0.8 0 0.364% 0.970 2.785 2.570 3.819% 0.938 1.329 1.330 0.794% 0.962 2.271 2.078 0.448 % 0.944 0.055 0.053
0.2 1.255% 0.964 2.857 2.635 4.190% 0.938 1.478 1.482 1.479% 0.962 2.328 2.151 0.475 % 0.938 0.072 0.074
0.4 2.193% 0.962 2.987 2.734 4.577% 0.938 1.817 1.830 2.237% 0.968 2.434 2.262 0.528 % 0.944 0.080 0.080
0.6 1.638% 0.964 3.416 3.260 5.107% 0.918 3.316 3.609 3.175% 0.958 2.788 2.672 0.755 % 0.936 0.089 0.089
1.2 0 0.144% 0.966 2.765 2.550 4.838% 0.934 1.333 1.304 0.481% 0.958 2.245 2.072 0.260 % 0.948 0.058 0.057
0.2 0.706% 0.960 2.831 2.607 4.991% 0.944 1.479 1.431 0.901% 0.964 2.298 2.139 0.227 % 0.932 0.077 0.078
0.4 1.476% 0.966 2.944 2.649 5.634% 0.936 1.788 1.762 1.438% 0.968 2.390 2.197 0.260 % 0.946 0.084 0.083
0.6 1.140% 0.964 3.256 3.054 4.877% 0.944 2.895 2.940 1.878% 0.956 2.644 2.539 0.391 % 0.942 0.093 0.093
2 0 0.077% 0.964 2.698 2.531 6.604% 0.854 1.338 1.308 0.296% 0.960 2.160 2.042 0.094 % 0.958 0.067 0.065
0.2 0.409% 0.958 2.755 2.545 6.654% 0.876 1.481 1.474 0.571% 0.954 2.206 2.087 0.066% 0.956 0.089 0.087
0.4 0.598% 0.968 2.848 2.635 6.796% 0.890 1.754 1.785 0.684% 0.952 2.280 2.153 0.058 % 0.954 0.097 0.096
0.6 0.551% 0.968 3.044 2.828 6.250% 0.908 2.484 2.575 0.758% 0.970 2.440 2.295 0.228 % 0.958 0.106 0.104

Under zero treatment effect scenarios, bias is reported instead of percentage bias.

Simulation results under the NP setting are summarized in Table 3. Both our matched RMST method and Conner’s method have similar performance (with the latter having more bias) as under the PH setting since these methods do not rely on the PH assumption. Tian’s RMST regression method performs somewhat worse, with a bigger bias and much lower than ideal coverage probabilities. Because the PH assumption does not hold here, the IPTW Cox model completely misses the target with large bias and very small coverage probabilities.

Table 3

Simulation studies: results for nonproportional hazards scenarios with independent censoring

Scenario Bias (%) CP SEM SEE Bias (%) CP SEM SEE Bias (%) CP SEM SEE Bias (%) CP SEM SEE
β A CR Matched RMST (Murray) Tian’s RMST regression Conner’s IPTW RMST IPTW Cox (HR)
0 0 0.032 0.964 2.795 2.611 0.020 0.932 1.322 1.402 0.076 0.962 2.284 2.127 0.005 0.938 0.052 0.052
0.2 0.137 0.958 2.881 2.666 0.016 0.938 1.481 1.538 0.154 0.960 2.353 2.197 0.005 0.938 0.067 0.069
0.4 0.190 0.956 3.066 2.903 0.007 0.928 1.924 2.030 0.211 0.962 2.501 2.424 0.009 0.928 0.074 0.078
0.6 0.343 0.961 4.185 4.115 0.476 0.800 4.735 7.362 0.360 0.954 3.509 3.448 0.011 0.937 0.085 0.087
0.4 0 0.196% 0.964 2.674 2.473 6.560% 0.888 1.244 1.215 0.428% 0.964 2.131 1.966 102.646% 0.000 0.056 0.061
0.2 0.684% 0.966 2.745 2.502 6.777% 0.894 1.386 1.347 0.835% 0.964 2.192 2.017 299.268% 0.000 0.079 0.081
0.4 1.150% 0.964 2.860 2.583 7.132% 0.902 1.659 1.643 1.133% 0.964 2.292 2.106 363.043% 0.000 0.091 0.094
0.6 1.168% 0.972 3.122 2.917 6.584% 0.920 2.427 2.479 1.389% 0.966 2.521 2.361 420.641% 0.000 0.105 0.109
0.8 0 0.158% 0.966 2.644 2.461 7.174% 0.820 1.253 1.212 0.345% 0.958 2.091 1.954 33.373% 0.004 0.057 0.062
0.2 0.425% 0.968 2.710 2.468 7.298% 0.850 1.394 1.351 0.561% 0.966 2.148 1.991 139.830% 0.000 0.083 0.085
0.4 0.801% 0.968 2.815 2.569 7.541% 0.876 1.656 1.660 0.845% 0.964 2.240 2.072 172.413% 0.000 0.096 0.099
0.6 0.906% 0.966 3.032 2.831 7.119% 0.890 2.316 2.411 1.014% 0.968 2.429 2.308 200.416% 0.000 0.110 0.114
1.2 0 0.079% 0.966 2.605 2.434 7.806% 0.724 1.261 1.224 0.263% 0.960 2.042 1.925 10.204% 0.446 0.059 0.063
0.2 0.321% 0.962 2.668 2.444 7.912% 0.772 1.402 1.394 0.463% 0.966 2.095 1.960 87.060% 0.000 0.088 0.091
0.4 0.565% 0.970 2.764 2.524 7.969% 0.828 1.652 1.661 0.626% 0.956 2.179 2.027 109.224% 0.000 0.103 0.106
0.6 0.599% 0.980 2.950 2.755 7.634% 0.884 2.243 2.280 0.652% 0.958 2.340 2.242 127.609% 0.000 0.117 0.121
2 0 0.053% 0.968 2.514 2.362 9.169% 0.512 1.279 1.238 0.204% 0.958 1.923 1.827 -8.272% 0.274 0.063 0.067
0.2 0.143% 0.962 2.569 2.374 9.204% 0.590 1.419 1.426 0.287% 0.952 1.969 1.864 45.545% 0.000 0.101 0.103
0.4 0.253% 0.966 2.650 2.453 9.131% 0.680 1.654 1.711 0.341% 0.952 2.039 1.940 59.243% 0.000 0.117 0.122
0.6 0.295% 0.966 2.795 2.640 8.792% 0.804 2.158 2.106 0.334% 0.946 2.164 2.110 70.309% 0.000 0.133 0.138

Under zero treatment effect scenarios, bias is reported instead of percentage bias.

4 Sensitivity analysis based on matched design

4.1 An overview of E -value

Propensity score adjustment can only control for observed confounding. Unmeasured confounding is likely to be present in observational studies since researchers have no control over the treatment assignment. Thus, sensitivity analysis is important to assess the impact of hidden bias.

Ding and VanderWeele [37,38] developed a new sensitivity analysis strategy, known as the E -value method. It assumes a hypothetical unmeasured confounder, U , and provides a lower bound of the strength of association on the risk ratio scale that U would have both the exposure and the outcome, to explain away the observed association. Below is a brief review of the conventional E -value method to set the stage for our sensitivity analysis of RMST difference.

Let A denote a binary exposure and D denote a binary outcome, X is a vector of measured confounders, and U is a binary unmeasured confounder with levels k = 0 , 1 . The observed relative risk of exposure A on the outcome D within stratum of X = x is expressed as follows:

RR A D x obs = P ( D = 1 A = 1 , X = x ) P ( D = 1 A = 0 , X = x ) .

Then the relative risk of exposure on level k of the unmeasured confounder U within the stratum of X = x is expressed as follows:

RR A U , k x = P ( U = k A = 1 , X = x ) P ( U = k A = 0 , X = x ) .

Since U is not observed, to facilitate the analysis, we take the maximal relative risk of A on U within stratum X = x , denoted as RR A U x = max k RR A U , k x . Similarly, we can define an upper bound for the relative risk between U and D as RR U D x = max ( RR U D A = 0 , x , RR U D A = 1 , x ) , where RR U D A , x is an upper bound of the relative risk between U and D in exposed or unexposed group, within stratum X = x . If X and U are sufficient to control for all confounding effects, the true causal relative risk is

RR A D x true = k = 0 1 P ( D = 1 A = 1 , X = x , U = k ) P ( U = k X = x ) k = 0 1 P ( D = 1 A = 0 , X = x , U = k ) P ( U = k X = x ) .

The relative risk pair ( RR A U x , RR U D x ) are used to measure the strength of confounding between the exposure A and the outcome D induced by the confounder U within the stratum of X = x . Even though we cannot estimate the true relative risk, its ratio with the observed relative risk is bounded by the following quantity, which is a function of the sensitivity parameters RR A U x and RR U D x :

RR A D x obs RR A D x true RR A U x × RR U D x RR A U x + RR U D x 1 .

For given values of RR A U x and RR U D x , we can identify a range of possible values for the true relative risk. If the range covers one, the observed significant association would be explained away by the presence of unmeasured confounding at the given magnitude.

4.2 Sensitivity analysis on RMST difference with matched data

This E -value method can be adapted to conduct sensitivity analysis for our RMST difference estimator in matched design. There are a series of propositions to justify the theoretical validity of using the E -value for the RMST difference estimator. In the interest of space, we just illustrate the main idea in this subsection and present the propositions and their detailed proofs in Appendix B.

Let A be the treatment indicator and Z = min ( T , τ ) be the RMST outcome, where T is the event time and τ is the truncation time point. Let e ( X ) be the propensity score and U be a binary unmeasured confounder with levels k = 0 , 1 . The relative risk of treatment A on level k of the unmeasured confounder U with a given propensity score value e ( X ) = e ( x ) is defined as follows:

RR A U , k e ( x ) = P ( U = k A = 1 , e ( X ) = e ( x ) ) P ( U = k A = 0 , e ( X ) = e ( x ) ) .

The maximal relative risk of A on U with e ( X ) = e ( x ) is RR A U e ( x ) = max k RR A U , k e ( x ) . We define the expectations of the RMST outcome Z given U = u and e ( X ) = e ( x ) with and without treatment as follows:

r 1 ( u ) = E ( Z A = 1 , U = u , e ( X ) = e ( x ) ) , r 0 ( u ) = E ( Z A = 0 , U = u , e ( X ) = e ( x ) ) .

Then, the mean ratios of U on Z with and without treatment with e ( X ) = e ( x ) are defined as follows:

MR U Z A = 1 , e ( X ) = e ( x ) = max u r 1 ( u ) min u r 1 ( u ) , MR U Z A = 0 , e ( X ) = e ( x ) = max u r 0 ( u ) min u r 0 ( u ) ,

MR U Z e ( X ) = e ( x ) = max ( MR U Z A = 1 , e ( X ) = e ( x ) , MR U Z A = 0 , e ( X ) = e ( x ) ) .

As shown in proposition 4 in Appendix B, since both unmeasured confounder parameters RR A U e ( X ) = e ( x ) and MR U Z e ( X ) = e ( x ) are no less than 1, we can identify the bounding factor as follows:

(1) B F U e ( X ) = e ( x ) = RR A U e ( X ) = e ( x ) × MR U Z e ( X ) = e ( x ) RR A U e ( X ) = e ( x ) + MR U Z e ( X ) = e ( x ) 1 ,

where ( RR A U e ( X ) = e ( x ) , MR U Z e ( X ) = e ( x ) ) are prespecified sensitivity analysis parameters. In theory, one can identify a separate bounding factor for each matched pair and choose the maximal value to facilitate the calculation. But this could be quite cumbersome in practice. Instead, we follow the idea of using Γ in the conventional Rosenbaum’s sensitivity analysis, which is a prespecified upper bound of the association. Denote RR A U = max e ( x ) ( RR A U e ( X ) = e ( x ) ) and MR U Z = max e ( x ) ( MR U Z e ( X ) = e ( x ) ) , then the maximal bounding factor can be calculated as follows:

(2) B F U = RR A U × MR U Z RR A U + MR U Z 1 .

Because B F U is an increasing function of both RR A U and MR U Z , taking prespecified upper bounds of both associations leads to an upper bound of bounding factors. In practice, one can identify a range of possible values for ( RR A U , MR U Z ) and calculate the upper bound of treatment effects for each combination using the following formulas:

Let ACE A Z true denote the true average causal effect. When the treatment effect is positive, we have

(3) ACE A Z true 1 2 1 + 1 B F U E ( Z A = 1 ) 1 2 ( 1 + B F U ) E ( Z A = 0 ) .

When the treatment effect is negative, we have

(4) ACE A Z true 1 2 ( 1 + B F U ) E ( Z A = 1 ) 1 2 1 + 1 B F U E ( Z A = 0 ) .

The sensitivity analysis is generally done by checking the behavior of the 95% CI bounds. Since our real data analysis shows a negative treatment effect, we focus on equation (4) and use a one-sided 95% CI for illustration purposes. Equation (4) implies that the treatment effect estimate should be bounded by a function of the bounding factor and the mean survival times from each treatment group. Denote the right-hand side quantity as right-hand side (RHS). We can calculate the 95% CI for RHS using normal approximation and denote the upper confidence bound as RHS u b . Because ACE A Z true RHS , we have P ( ACE A Z true RHS u b ) 0.95 . Therefore, we can regard RHS u b as an upper bound for the upper 95% confidence bound of the treatment effect estimate. If this value is less than zero, we would reject the null hypothesis. Otherwise, we would retain the null.

4.3 Interpreting the sensitivity analysis

For an unmeasured confounding with a prespecified magnitude of associations ( RR A U , MR U Z ) , the bounding factor B F U can be computed by equation (2). For a positive treatment effect based on the observed data, we can compute a lower bound for the lower 95% confidence bound of the treatment effect estimate by equation (3). A positive lower bound indicates that there is still a positive treatment effect with an unmeasured confounding effect of magnitude ( RR A U , MR U Z ) . A nonpositive lower bound indicates that the positive treatment effect could be explained away by the unmeasured confounding of magnitude ( RR A U , MR U Z ) . For a negative treatment effect based on the observed data, we can compute an upper bound for the upper 95% confidence bound of the treatment effect estimate by equation (4) as described earlier, and similar interpretations can be made. A negative upper bound indicates that there is still a negative treatment effect with an unmeasured confounding effect of magnitude ( RR A U , MR U Z ) . A nonnegative upper bound indicates that the negative treatment effect could be explained away by the unmeasured confounding of magnitude ( RR A U , MR U Z ) . A detailed numerical example is presented in Section 5.

5 Real data example

In this section, we apply our proposed method to the ARIC data [24] to examine the causal effect of baseline smoking on stroke-free survival. Incident ischemic stroke events or death, the primary outcome, are identified through December 31, 2011. After excluding a small portion of subjects with missing values in the variables of interest, the total sample size used in the analysis is 14,549. The event time is defined as the follow-up time (in months) for the first incident stroke or death, whichever comes first, and a subject is censored if neither incident stroke nor death is observed during the study. There are 5,345 events, corresponding to a 63.3% CR. Given the length of follow-up, we choose 240 months as the truncation time τ for the RMST calculation. Exposure is defined as the smoking status at baseline. There are 3,832 (26.3%) current smokers at baseline. Eight important baseline covariates are included in the propensity score model: race (black, white), gender (male, female), age (44–66 years), body mass index (BMI) (14.2–65.9), diabetes (1 = yes, 0 = no), high-density lipoprotein (HDL) (10–163  mg/dL), low-density lipoprotein (LDL) (0–504.6 mg/dL), and hypertension (1 = yes, 0 = no). Table 4 summarizes these variables by baseline smoking status.

Table 4

Real data example: summary statistics of covariates by baseline smoking status in ARIC study

Non-current smoker (10,717) Current smoker (3,832)
Race, n (%) of white 8,161 (76.2%) 2,730 (71.2%)
Gender, n (%) of female 5,957 (55.6%) 2,003 (52.3%)
Age, mean (SD) 54.4 (5.8) 53.7 (5.7)
BMI, mean (SD) 28.1 (5.4) 26.3 (5.0)
Diabetes, n (%) 1,044 (9.7%) 333 (8.7%)
HDL (mmol/L), mean (SD) 52.6 (16.8) 49.6 (17.3)
LDL (mmol/L), mean (SD) 137.6 (38.9) 138.6 (40.4)
Hypertension, n (%) 3,783 (35.3%) 1,225 (32.0%)

We first fit a logistic regression model on baseline smoking status using the eight covariates to estimate the propensity score. Then, we conduct a 1–1 optimal pair matching without replacement for all subjects, which results in 3,832 pairs and unmatched nonsmokers being removed from the matched sample. The covariates balance is measured by the standardized mean difference, and Figure 1 shows the covariates balance of ARIC data before and after propensity score matching, which indicates our matching achieves very good covariates balance.

Figure 1 
               Real data example: covariates balance checking.
Figure 1

Real data example: covariates balance checking.

For comparison purposes, the analysis results of the proposed method, Tian’s RMST regression, and the IPTW Cox regression are all reported in Table 5. All methods show significant evidence of a harmful effect of smoking on the risk of incident ischemic stroke or death. This conclusion agrees with previous findings in the literature. The matched RMST analysis suggests an average reduction of 22.3 stroke-free survival months for baseline smokers had they not smoked at the baseline. Tian’s RMST regression method provides similar results as our proposed method in this dataset. The IPTW Cox regression measures the treatment effect on the hazard ratio scale, which is not directly comparable to RMST differences. The estimated HR of 2.2 implies that smoking increases the hazard of incident ischemic stroke or death.

Table 5

Real data example: ARIC data analysis results

Estimate SE 95% CI (one-sided)
Matched RMST 22.266 1.380 ( , 19.996 ]
Tian’s RMST regression 22.666 1.129 ( , 20.809 ]
IPTW cox (HR) 2.173 0.066 [ 2.068 , )

For the matched RMST method and Tian’s RMST regression method, the 95% CI bound is the upper bound. For the IPTW Cox method, the 95% CI bound is the lower bound.

All the aforementioned analyses assume the ignorable treatment assignment. However, for such a large observational study, unmeasured confounding is likely to be present, especially given that we are only able to control a small number of factors. Therefore, it is important to assess how the observed causal effect may change in the presence of hidden bias. A sensitivity analysis, as described in Section 4.2, is carried out for different possible impacts of U on the exposure and the outcome.

We calculate the upper bound of the upper 95% confidence bound of estimated treatment effects by exploring different combinations of ( RR A U , MR U Z ) , where both values are larger than 1. For illustrative purposes, we focus on a range of values between 1.15 and 1.75 to create a contour plot in Figure 2. Different gray scales reflect different upper bound values of the upper 95% confidence bound of the treatment effect, with the lighter color indicating smaller values and the darker color indicating larger values. The solid curve in the middle of the plot is when the upper bound of the upper 95% confidence bound of the treatment effect is zero.

Figure 2 
               Sensitivity analysis: contour plot of 95% CI upper bounds of the treatment effect upper bound. The solid curve represents value 0.
Figure 2

Sensitivity analysis: contour plot of 95% CI upper bounds of the treatment effect upper bound. The solid curve represents value 0.

For the area below this threshold, the upper bound of the upper 95% confidence bound of the treatment effect is still negative, implying a true negative causal effect even in the presence of unmeasured confounding. For the area above this threshold, the upper bound of the upper 95% confidence bound of the treatment effect becomes positive, implying the initial negative treatment effect can be explained away by the presence of unmeasured confounding.

In the real data analysis, we observe a negative treatment effect of 22.266 months with 95% confidence interval (CI) as ( , 19.996 ] based on our proposed matched RMST method. We also estimate that E ( Z A = 1 ) = 195.379 with SD = 1.087 and E ( Z A = 0 ) = 217.646 with SD = 0.834 , and the covariance between these two RMSTs is 0.014 . For example, with RR A U = 1.4 and MR U Z = 1.45 , we can calculate the bounding factor as follows: B F U = 1.4 × 1.45 1.4 + 1.45 1 = 1.097 , then following equation (4), RHS can be calculated as ACE A Z true 1 2 ( 1 + 1.097 ) 195.379 1 2 1 + 1 1.097 217.646 = 3.169 = RHS . The variance of RHS is expressed as follows:

Var(RHS) = 1 2 ( 1 + 1.097 ) 2 1.08 7 2 + 1 2 ( 1 + 1 1.097 ) 2 0.83 4 2 2 1 2 ( 1 + 1.097 ) 1 2 ( 1 + 1 1.097 ) ( 0.014 ) = 1.962 .

The 95% one-sided CI is ( , 3.169 + 1.64 1.962 ] = ( , 0.872 ] . Since the upper 95% confidence bound is still less than zero, we would reject the null hypothesis of no causal effect. This is robust to unmeasured confounding with the magnitude of impact up to RR A U = 1.4 and MR U Z = 1.45 .

For small-to-moderate deviations from the ignorability assumption ( RR A U < 1.43 and MR U Z < 1.43 ), a harmful effect still holds, as the upper bound of the 95% confidence bound of the treatment effect is below the solid zero-curve. For moderate-to-large deviations, the 95% CI upper bound of the treatment effect upper bound may exceed zero, indicating a possibility of a null effect. For example, at (1.5, 1.5), the upper bound of the estimated treatment effect is 2.036 and the upper bound of the 95% confidence bound of the treatment effect is 4.345, which indicates that the harmful treatment effect could be totally explained away by the unmeasured confounding of magnitude ( RR A U , MR U Z ) = ( 1.5 , 1.5 ) . Overall, our sensitivity analysis indicates that the observed significant causal effect is moderately robust to hidden bias.

6 Discussion

In this article, we adopt the RMST difference as a marginal causal effect measure for survival data, since it is collapsible and has an easy interpretation. We develop a matching-based RMST difference estimator that is asymptotically unbiased and does not rely on the PH assumption. But this does not rule out the use of hazard in causal analysis with survival data. As pointed out by Aalen et al. [7], the hazard function h ( t , x , z ) may have a valid causal interpretation, if it satisfies some additive structural constraint.

An interesting practical issue with RMST is the choice of τ . The common practice is to prespecify the truncation time at the study design stage or to make the decision independent of the observed outcomes. It is usually determined based on content expertise, for example, an important clinical time point for the disease under study. Kim et al. [39] picked a truncation time of 5 years for the Placement of Aortic Transcatheter Valves (PARTNER) trial as they were interested in the effect of transcatheter aortic valve replacement procedure versus routine medical treatment on preventing death in 5 years. Recently, Tian et al. [40] provided a more thorough discussion on the empirical choice of time window in RMST. They also showed that under a mild condition on the censoring distribution, one could make inferences about the RMST up to τ , where τ could be equal to the largest follow-up time (either observed or censored) in the study. With such choices, RMST incorporates all available information.

One limitation of our work is that the proposed nonparametric estimator may not be easily extended to more complex matching designs, such as 1- k or full matching designs. This is because we need to compute the covariance to account for the correlation in matched sets. But the covariance calculation relies on the assumption of equal sample sizes in both groups [34]. Therefore, the covariance formula can not be applied directly to other matching designs. One strategy to relax this limitation is to consider fitting a parametric RMST regression model after matching. This could be more advantageous if we have a good idea about the outcome model specification, as it may correct residual confounding bias not captured by matching. This adds more flexibility to postmatching inference, as it can lead to more robust or efficient semiparametric strategies by combining matching with regression models [41]. It also makes our method more attractive in practice than Conner’s method as the latter solely relies on KM estimation of survival functions and cannot include regression models.

Acknowledgments

This work was partially supported by grant DMS-2015552 from National Science Foundation. The Atherosclerosis Risk in Communities study has been funded in whole or in part with Federal funds from the National Heart, Lung, and Blood Institute, National Institutes of Health, Department of Health and Human Services, under Contract nos. HHSN268201700001I, HHSN268201700002I, HHSN268201700003I, HHSN268201700005I, and HHSN268201700004I. The authors thank the staff and participants of the ARIC study for their important contributions. The authors also thank the associate editor and two anonymous reviewers for their insightful comments, which lead to substantial improvement of the manuscript.

  1. Funding information: This work was partially supported by grant DMS-2015552 from National Science Foundation.

  2. Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

  3. Conflict of interest: The authors state no conflict of interest.

  4. Informed consent: Not applicable.

  5. Ethical approval: The conducted research uses a de-identified secondary dataset in the analysis, hence not related to either human or animals use.

  6. Data availability statement: The ARIC dataset analyzed during the current study is available in the NHLBI Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC) [https://biolincc.nhlbi.nih.gov/studies/aric/].

Appendix A Theoretical results in Section 2

We will prove the propositions and related lemmas in Section 2.

Proposition A1

Given Assumptions 12, we have ( T 0 , T 1 ) A e ( X ) , which further implies ( Z 0 , Z 1 ) A e ( X ) .

Proof

It is equivalent to show P { A = 1 T 1 , T 0 , e ( X ) } = P { A = 1 e ( X ) } .

By Theorem 2 in Rosenbaum and Rubin [4], we have P ( A = 1 e ( X ) ) = E { e ( X ) e ( X ) } = e ( X ) , then it is equivalent to show P { A = 1 T 1 , T 0 , e ( X ) } = e ( X ) . We have

P { A = 1 T 1 , T 0 , e ( X ) } = E { P ( A = 1 T 1 , T 0 , X ) T 1 , T 0 , e ( X ) } = E { P ( A = 1 X ) T 1 , T 0 , e ( X ) } (by strongly ignorability) = E { e ( X ) T 1 , T 0 , e ( X ) } = e ( X ) = P { A = 1 e ( X ) } .

Thus, we have ( T 0 , T 1 ) A e ( X ) for 0 < pr ( A = 1 e ( X ) ) < 1 . Since Z A = min ( T A , τ ) and τ is a fixed constant, the aforementioned conditional independence also implies ( Z 0 , Z 1 ) A e ( X ) .□

Lemma A1

Given Assumptions 13, ( T 1 , T 0 ) A holds marginally in the matched sample under the propensity score matching design.

Proof

By Assumption 2, we have ( T 1 , T 0 ) A e ( X ) , where 0 < P ( A = 1 e ( X ) ) < 1 . Let M denotes the matching structure, and ε M denotes the set of propensity scores in the matched sample. Then, we have the following equation by matching on propensity score e ( X ) with a constant treatment to control allocation ratio 1 : k ( k = 1 for pair matching),

P ( A = 1 e ( X ) ) = 1 k + 1 , for all  e ( X ) ε M .

Thus, e ( X ) A holds in the matched sample, i.e., f M ( e ( X ) A ) = f M ( e ( X ) ) . Consider the joint density of T 1 and T 0 conditional on A in the matched sample, which is denoted as f M ( T 1 , T 0 A ) , we have

f M ( T 1 , T 0 A ) = ε M f ( T 1 , T 0 A , e ( X ) ) f M ( e ( X ) A ) d e ( X ) = ε M f ( T 1 , T 0 A , e ( X ) ) f M ( e ( X ) ) d e ( X ) [matched by constant allocation ratio] = ε M f ( T 1 , T 0 e ( X ) ) f M ( e ( X ) ) d e ( X ) [by Assumption 2] = f M ( T 1 , T 0 ) .

Since f M ( T 1 , T 0 A ) = f M ( T 1 , T 0 ) implies ( T 1 , T 0 ) A in the matched sample, ( T 1 , T 0 ) A holds marginally in the matched sample.□

Lemma A2

Let S ˆ e ( X ) , A ( t ) denotes the KM survival function estimator given propensity score e ( X ) and treatment indicator A. For a fixed truncation time τ ,

lim n 0 τ E T [ S ˆ e ( X ) , A ( t ) S e ( X ) , A ( t ) ] d t = 0 ,

Proof

Define T ˜ = min ( T , C ) and π e ( X ) , A ( t ) = P ( T ˜ t ) ( 0 , 1 ) , then [ 1 π e ( X ) , A ( t ) ] n is a nonnegative function that increases as t increases. By Lemma 3.2.1 in the study by Fleming and Harrington [29], we know:

0 τ E T [ S ˆ e ( X ) , A ( t ) S e ( X ) , A ( t ) ] d t 0 τ [ 1 S e ( X ) , A ( t ) ] [ 1 π e ( X ) , A ( t ) ] n d t 0 τ [ 1 π e ( X ) , A ( t ) ] n d t τ [ 1 π e ( X ) , A ( τ ) ] n .

Since τ > 0 is a fixed constant and 1 π e ( X ) , A ( τ ) ( 0 , 1 ) , we have

lim n 0 τ E T [ S ˆ e ( X ) , A ( t ) S e ( X ) , A ( t ) ] d t lim n τ [ 1 π e ( X ) , A ( τ ) ] n = 0 .

Therefore, we have lim n 0 τ E T [ S ˆ e ( X ) , A ( t ) S e ( X ) , A ( t ) ] d t = 0 .□

Proposition A2

Given Assumptions 14, the RMST estimator based on the KM method given propensity score e ( X ) and treatment group A , denoted as μ ˆ e ( X ) , A , is an asymptotically unbiased estimator for μ e ( X ) , A given τ < t max .

Proof

First, we will show that S ˆ e ( X ) , A ( t ) is asymptotically unbiased for any time T < t max . Let t i ’s be i.i.d event times ranking from small to large, and Y i is the number of people at risk at event time t i . Let d i be the number of event at event time t i , then we have the following definition.

S ˆ e ( X ) , A ( t ) = 1 , if t t 1 given e ( X ) and A t i t Y i d i Y i , if t 1 t given e ( X ) and A .

Let Λ ˆ e ( X ) , A ( u ) = t i < t d i Y i be the Nelson–Aalen estimator for the cumulative hazard function Λ e ( X ) , A ( u ) given e ( X ) and A . According to Theorem 3.2.3 in the study by Fleming and Harrington [29], we have the following equation if S e ( X ) , A ( t ) > 0 :

S ˆ e ( X ) , A ( t ) S e ( X ) , A ( t ) 1 = 0 t S ˆ e ( X ) , A ( u ) S e ( X ) , A ( u ) d { Λ ˆ e ( X ) , A ( u ) Λ e ( X ) , A ( u ) } , E [ S ˆ e ( X ) , A ( t ) S e ( X ) , A ( t ) ] = E I { T < t } S ˆ e ( X ) , A ( T ) { S e ( X ) , A ( T ) S e ( X ) , A ( t ) } S e ( X ) , A ( T ) .

Based on Lemma 3.2.1 in the study by Fleming and Harrington [29], the bias E [ S ˆ e ( X ) , A ( t ) S e ( X ) , A ( t ) ] will converge to zero as sample size n . Thus, S ˆ e ( X ) , A ( t ) is asymptotically unbiased given t < t max . Similarly, S ˆ e ( X ) , A = 0 ( t ) is also asymptotically unbiased given t < t max .

Second, we will show that μ ˆ e ( X ) , A is an asymptotically unbiased estimator given τ < t max . Since E T ( μ ˆ e ( X ) , A ) = E T 0 τ S ˆ e ( X ) , A ( t ) d t and S ˆ e ( X ) , A ( t ) is a positive bounded function between 0 and 1 when t [ 0 , τ ] , then we have

E T 0 τ S ˆ e ( X ) , A ( t ) d t = E T 0 τ S ˆ e ( X ) , A ( t ) d t τ < .

By Fubini’s theorem,

E T 0 τ S ˆ e ( X ) , A ( t ) d t = 0 τ E T [ S ˆ e ( X ) , A ( t ) ] d t .

By Proposition A2 and Lemma A2, we have the following for fixed truncated time τ .

lim n E T ( μ ˆ e ( X ) , A ) μ e ( X ) , A = lim n E T 0 τ S ˆ e ( X ) , A ( t ) d t 0 τ S e ( X ) , A ( t ) d t = lim n 0 τ E T [ S ˆ e ( X ) , A ( t ) ] d t 0 τ S e ( X ) , A ( t ) d t (by Fubini’s theorem) = lim n 0 τ E T { S ˆ e ( X ) , A ( t ) S e ( X ) , A ( t ) } d t lim n τ [ 1 π e ( X ) , A ( τ ) ] n = 0 .

Therefore, μ ˆ e ( X ) , A is an asymptotically unbiased estimator for μ e ( X ) , A when τ < t max .□

Lemma A3

For a fixed truncation time point τ < t max ,

lim n 0 τ E T [ S ˆ A A ( t ) S A ( t ) ] d t = 0 .

Proof

Let T ˜ = min ( T , C ) and π ( t ) = P ( T ˜ t ) ( 0 , 1 ) , then [ 1 π ( t ) ] n is a nonnegative function, which increases as t increases. Let π A ( t ) denotes the function π ( t ) for treatment indicator A. By Lemma 3.2.1 in the study by Fleming and Harrington [29], we have

0 τ E T [ S ˆ A ( t ) S A ( t ) ] d t 0 τ [ 1 S A ( t ) ] [ 1 π A ( t ) ] n d t 0 τ [ 1 π A ( t ) ] n d t τ [ 1 π A ( t ) ] n .

Since τ > 0 is a fixed constant and 1 π A ( τ ) ( 0 , 1 ) , we have

lim n 0 τ E T [ S ˆ A ( t ) S A ( t ) ] d t lim n τ [ 1 π A ( t ) ] n = 0 .

Proposition A3

Given Assumptions 14, Δ ˆ ATT = 0 τ [ S ˆ 1 ( t ) S ˆ 0 ( t ) ] d t is asymptotically unbiased.

Proof

Since the estimated survival function S ˆ ( t ) ( 0 , 1 ) and 0 τ S ˆ ( t ) d t ( 0 , τ ) , we also satisfy the following conditions to use Fubini’s theorem:

  1. E e ( X ) E T 0 τ S ˆ e ( X ) , A ( t ) d t E e ( X ) { E T ( τ ) } = τ <

  2. E e ( X ) 0 τ S ˆ e ( X ) , A ( t ) d t τ <

  3. E T 0 τ S ˆ A ( t ) d t τ <

Thus, we can apply Fubini’s theorem three times to interchange the expectation of e ( X ) as follows:

E e ( X ) E T 0 τ S ˆ e ( X ) , A = 1 ( t ) d t E T 0 τ S ˆ e ( X ) , A = 0 ( t ) d t = E T E e ( X ) 0 τ S ˆ e ( X ) , A = 1 ( t ) d t E e ( X ) 0 τ S ˆ e ( X ) , A = 0 ( t ) d t = E T 0 τ E e ( X ) [ S ˆ e ( X ) , A = 1 ( t ) ] d t 0 τ E e ( X ) [ S ˆ e ( X ) , A = 0 ( t ) ] d t = E T 0 τ S ˆ 1 ( t ) d t 0 τ S ˆ 0 ( t ) d t = 0 τ E T [ S ˆ 1 ( t ) ] d t 0 τ E T [ S ˆ 0 ( t ) ] d t .

By Lemma A3, we have

lim n E e ( X ) E T 0 τ S ˆ e ( X ) , A = 1 ( t ) d t E T 0 τ S ˆ e ( X ) , A = 0 ( t ) d t = lim n 0 τ E T [ S ˆ 1 ( t ) ] d t lim n 0 τ E T [ S ˆ 0 ( t ) ] d t = 0 τ E T [ S 1 ( t ) ] d t 0 τ E T [ S 0 ( t ) ] d t = μ 1 μ 0 .

Therefore, our proposed propensity score matched RMST estimator is asymptotically unbiased when truncation time point τ < t max .□

B Theoretical results in Section 4

B.1 Proofs of propositions about conditional effect

We define the expectations of the RMST outcome Z = min ( T , τ ) . The following propositions are proved within each propensity score value e ( X ) = e ( x ) .

Proposition A4

For binary unmeasured confounder U = 0 , 1 , we have RR A U e ( X ) = e ( x ) 1 and MR U Z e ( X ) = e ( x ) 1 .

Proof

By definition, we have MR U Z A = 1 , e ( X ) = e ( x ) 1 and MR U Z A = 0 , e ( X ) = e ( x ) 1 , then MR U Z e ( X ) = e ( x ) = max ( MR U Z A = 1 , e ( X ) = e ( x ) , MR U Z A = 0 , e ( X ) = e ( x ) ) 1 . Assume RR A U e ( x ) = max k = 0 , 1 RR A U , k e ( x ) < 1 , then it implies that

P ( U = 0 A = 1 , e ( X ) = e ( x ) ) < P ( U = 0 A = 0 , e ( X ) = e ( x ) ) , P ( U = 1 A = 1 , e ( X ) = e ( x ) ) < P ( U = 1 A = 0 , e ( X ) = e ( x ) ) .

This further implies that 1 = P ( U = 0 A = 1 , e ( X ) = e ( x ) ) + P ( U = 1 A = 1 , e ( X ) = e ( x ) ) < P ( U = 0 A = 0 , e ( X ) = e ( x ) ) + P ( U = 1 A = 0 , e ( X ) = e ( x ) ) = 1 , which is not true. Thus, we have proved by contradiction that RR A U e ( X ) = e ( x ) 1 .□

Proposition A5

CMR A Z + = MR A Z MR A Z + true B F U , CMR A Z = MR A Z MR A Z true B F U , CMR A Z = MR A Z MR A Z true B F U .

Proof

First, let f = P ( A = 1 ) , then we have

(A1) MR A Z true = r 1 ( u ) F ( d u ) r 0 ( u ) F ( d u ) = f r 1 ( u ) F 1 ( d u ) + ( 1 f ) r 1 ( u ) F 0 ( d u ) f r 0 ( u ) F 1 ( d u ) + ( 1 f ) r 0 ( u ) F 0 ( d u )

(A2) = f r 0 ( u ) F 1 ( d u ) f r 0 ( u ) F 1 ( d u ) + ( 1 f ) r 0 ( u ) F 0 ( d u ) × r 1 ( u ) F 1 ( d u ) r 0 ( u ) F 1 ( d u )

(A3) + ( 1 f ) r 0 ( u ) F 0 ( d u ) f r 0 ( u ) F 1 ( d u ) + ( 1 f ) r 0 ( u ) F 0 ( d u ) × r 1 ( u ) F 0 ( d u ) r 0 ( u ) F 0 ( d u ) .

Let w = f r 0 ( u ) F 1 ( d u ) f r 0 ( u ) F 1 ( d u ) + ( 1 f ) r 0 ( u ) F 0 ( d u ) [ 0 , 1 ] , then we have

MR A Z true = w MR A Z + true + ( 1 w ) MR A Z true ; 1 CMR A Z = w CMR A Z + + 1 w CMR A Z .

Second, we have

CMR A Z + = MR A Z obs MR A Z + true = r 1 ( u ) F ( d u ) r 0 ( u ) F ( d u ) / r 1 ( u ) F 1 ( d u ) r 0 ( u ) F 1 ( d u ) = w 1 max u r 0 ( u ) + ( 1 w 1 ) min u r 0 ( u ) w 0 max u r 0 ( u ) + ( 1 w 0 ) min u r 0 ( u ) ,

where w 1 = [ r 0 ( u ) min u r 0 ( u ) ] F 1 ( d u ) max u r 0 ( u ) min u r 0 ( u ) and w 0 = [ r 0 ( u ) min u r 0 ( u ) ] F 0 ( d u ) max u r 0 ( u ) min u r 0 ( u ) .

Define Γ = w 1 w 0 , then

(A4) Γ = w 1 w 0 = [ r 0 ( u ) min u r 0 ( u ) ] F 1 ( d u ) [ r 0 ( u ) min u r 0 ( u ) ] F 0 ( d u ) = [ r 0 ( u ) min u r 0 ( u ) ] RR A U ( u ) F 0 ( d u ) [ r 0 ( u ) min u r 0 ( u ) ] F 0 ( d u )

(A5) max x RR A U ( u ) [ r 0 ( u ) min u r 0 ( u ) ] F 0 ( d u ) [ r 0 ( u ) min u r 0 ( u ) ] F 0 ( d u ) = RR A U .

Write w 0 = w 1 Γ , then

CMR A Z + = [ max u r 0 ( u ) min u r 0 ( u ) ] w 1 + min u r 0 ( u ) [ max x r 0 ( u ) min u r 0 ( u ) ] w 1 Γ + min u r 0 ( u ) .

If Γ > 1 , CMR A Z + is increasing in w 1 according to Lemma A.1 in the Appendix of Ding and VanderWeele [37], then the maximum attains at w 1 = 1 , and we have

CMR A Z + Γ × MR U Z A = 0 Γ + MR U Z A = 0 1 RR A U × MR U Z A = 0 RR A U + MR U Z A = 0 1 .

If Γ 1 , CMR A Z + is nonincreasing in w 1 according to Lemma A.1 in the Appendix of Ding and VanderWeele [37], then the maximum attains at w 1 = 0 , and we have

CMR A Z + 1 RR A U × MR U Z A = 0 RR A U + MR U Z A = 0 1 .

Similarly, by 1 CMR A Z = w CMR A Z + + 1 w CMR A Z , we have

1 CMR A Z 1 B F U , CMR A Z B F U .

To study the average causal effect of the exposure on the difference scale, we need the following definitions:

  1. Define m 0 = E ( Z A = 0 ) and m 1 = E ( Z A = 1 ) , then the observed mean difference of exposure on the outcome is m 1 m 0 .

  2. The average causal effect of the exposure on the outcome for exposed is

    ACE A Z + true = E ( Z A = 1 , U = u ) F 1 ( d u ) E ( Z A = 0 , U = u ) F 1 ( d u ) = m 1 r 0 ( u ) F 1 ( d u ) .

  3. The average causal effect of the exposure on the outcome for unexposed is expressed as follows:

    ACE A Z true = E ( Z A = 1 , U = u ) F 0 ( d u ) E ( Z A = 0 , U = u ) F 0 ( d u ) = r 1 ( u ) F 0 ( d u ) m 0 .

  4. The average causal effect of the exposure on the outcome for whole population is

    ACE A Z true = E ( Z A = 1 , U = u ) F ( d u ) E ( Z A = 0 , U = u ) F ( d u ) = f ACE A Z + true + ( 1 f ) ACE A Z true .

Proposition A6

For nonnegative outcomes and ACE A Z obs 0 , the lower bounds for the average causal effects are expressed as follows:

ACE A Z + true m 1 m 0 × B F U ; ACE A Z true m 1 B F U m 0 ; ACE A Z true ( m 1 m 0 × B F U ) [ f + ( 1 f ) B F U ] = m 1 B F U m 0 [ f × B F U + ( 1 f ) ] .

Proof

From the data, we can identify

m 1 = E ( Z A = 1 , U = u ) F 1 ( d u ) = r 1 ( u ) F 1 ( d u ) = E ( Z A = 1 ) ; m 0 = E ( Z A = 0 , U = u ) F 0 ( d u ) = r 0 ( u ) F 0 ( d u ) = E ( Z A = 0 ) .

The counterfactual probabilities are not identifiable:

E ( Z ( 1 ) = 1 A = 0 ) = E ( Z = 1 A = 1 , U = u ) F 0 ( d u ) = r 1 ( u ) F 0 ( d u ) ; E ( Z ( 0 ) = 1 A = 1 ) = E ( Z = 1 A = 0 , U = u ) F 1 ( d u ) = r 0 ( u ) F 1 ( d u ) .

First, by Proposition A5, we have

m 1 E ( Z ( 1 ) = 1 A = 0 ) = r 1 ( u ) F 1 ( d u ) r 1 ( u ) F 0 ( d u ) = r 1 ( u ) F 1 ( d u ) r 0 ( u ) F 0 ( d u ) / r 1 ( u ) F 0 ( d u ) r 0 ( u ) F 0 ( d u ) = MR A Z MR A Z true = CMR A Z B F U .

Thus, we have E ( Z ( 1 = 1 ) A = 0 ) m 1 B F U .

Second, by Proposition A5 again, we have

E ( Z ( 0 ) = 1 A = 1 ) m 0 = r 0 ( u ) F 1 ( d u ) r 0 ( u ) F 0 ( d u ) = CMR A Z + B F U .

Thus, we have E ( Z ( 0 ) = 1 A = 1 ) m 0 B F U .

By definition of ACE and the inequalities derived earlier, we have

ACE A Z + true = m 1 r 0 ( u ) F 1 ( d u ) m 1 m 0 × B F U ; ACE A Z true = r 1 ( u ) F 0 ( d u ) m 0 m 1 B F U m 0 ;

ACE A Z true = f ACE A Z + true + ( 1 f ) ACE A Z true f ( m 1 m 0 B F U ) + ( 1 f ) m 1 B F U m 0 = ( m 1 m 0 × B F U ) [ f + ( 1 f ) B F U ] = m 1 B F U m 0 [ f × B F U + ( 1 f ) ] .

Proposition A7

For nonnegative outcomes with ACE A Z obs < 0 , we have

ACE A Z + true m 1 B F U m 0 ; ACE A Z true m 1 m 0 B F U ; ACE A Z true ( m 1 B F U m 0 ) f + 1 f B F U = m 1 m 0 B F U ( f B F U + 1 f ) .

Proof

Define A ¯ = 1 A . By applying Proposition A6 we have

ACE A ¯ Z + true E ( Z A ¯ = 1 ) E ( Z A ¯ = 0 ) × B F U ; ACE A ¯ Z true E ( Z A ¯ = 1 ) B F U E ( Z A ¯ = 0 ) ; ACE A ¯ Z true ( E ( Z A ¯ = 1 ) E ( Z A ¯ = 0 ) × B F U ) [ f + ( 1 f ) B F U ] = E ( Z A ¯ = 1 ) B F U E ( Z A ¯ = 0 ) [ f × B F U + ( 1 f ) ] .

Because ACE A ¯ Z + true = ACE A Z + true , ACE A ¯ Z true = ACE A Z true , and ACE A ¯ Z true = ACE A Z true , and we also have E ( Z A ¯ = 0 ) = E ( Z A = 1 ) = m 1 and E ( Z A ¯ = 1 ) = E ( Z A = 0 ) = m 0 . Then we have

ACE A Z + true m 1 B F U m 0 ; ACE A Z true m 1 m 0 B F U ; ACE A Z true ( m 1 B F U m 0 ) f + 1 f B F U = m 1 m 0 B F U ( f B F U + 1 f ) .

B.2 Proofs of propositions about the marginal effect

To make the bounding factor hold for all propensity score values, we consider the maximum value of B F U across all values of propensity score e ( X ) , which is defined as B F U = max e ( x ) ( B F U e ( X ) = e ( x ) ) .

Proposition A8

For nonnegative outcomes and ACE A Z obs 0 , we have

ACE A Z + true m 1 m 0 × B F U ; ACE A Z true m 1 B F U m 0 ; ACE A Z true ( m 1 m 0 × B F U ) [ f + ( 1 f ) B F U ] = m 1 B F U m 0 [ f × B F U + ( 1 f ) ] .

For nonnegative outcomes and ACE A Z obs < 0 , we have

ACE A Z + true m 1 B F U m 0 ; ACE A Z true m 1 m 0 B F U ; ACE A Z true ( m 1 B F U m 0 ) f + 1 f B F U = m 1 m 0 B F U ( f B F U + 1 f ) .

Proof

We will start from showing the results for nonnegative outcomes and ACE A Z obs 0 . First, we have

m 1 E ( Z ( 1 ) = 1 A = 0 ) = r 1 ( u ) F 1 ( d u ) r 1 ( u ) F 0 ( d u ) = r 1 ( u ) F 1 ( d u ) r 0 ( u ) F 0 ( d u ) r 1 ( u ) F 0 ( d u ) r 0 ( u ) F 0 ( d u ) = MR A Z MR A Z true = CMR A Z B F U B F U .

Thus, we have E ( Z ( 1 = 1 ) A = 0 ) m 1 B F U m 1 B F U .

Second, we know that E ( Z ( 0 ) = 1 A = 1 ) m 0 = r 0 ( u ) F 1 ( d u ) r 0 ( u ) F 0 ( d u ) = CMR A Z + B F U B F U , then we have E ( Z ( 0 ) = 1 A = 1 ) m 0 B F U m 0 B F U .

By definition of ACE and the inequalities derived earlier, we have

ACE A Z + true = m 1 r 0 ( u ) F 1 ( d u ) m 1 m 0 × B F U ; ACE A Z true = r 1 ( u ) F 0 ( d u ) m 0 m 1 B F U m 0 ; ACE A Z true = f ACE A Z + true + ( 1 f ) ACE A Z true f ( m 1 m 0 B F U ) + ( 1 f ) m 1 B F U m 0 = ( m 1 m 0 × B F U ) [ f + ( 1 f ) B F U ] = m 1 B F U m 0 [ f × B F U + ( 1 f ) ] .

Similarly, we can prove the inequalities hold for nonnegative outcomes and ACE A Z obs < 0 .□

Proposition A9

In the matched sample, we have the following inequality for nonnegative outcomes and ACE A Z obs 0 :

ACE A Z true 1 2 1 + 1 B F U E ( Z A = 1 ) 1 2 ( 1 + B F U ) E ( Z A = 0 ) .

In the matched sample, we have the following inequality for nonnegative outcomes and ACE A Z obs < 0 :

ACE A Z true 1 2 ( 1 + B F U ) E ( Z A = 1 ) 1 2 1 + 1 B F U E ( Z A = 0 ) .

Proof

In the matched sample, we have f = P ( A = 1 e ( X ) = e ( x ) ) = 0.5 . For nonnegative outcomes and ACE A Z obs 0 , we have LHS = ACE A Z true = e ( x ) ACE A Z e ( X ) = e ( x ) true P ( e ( X ) = e ( x ) ) and

RHS = e ( x ) ( m 1 m 0 B F U ) f + 1 f B F U P ( e ( X ) = e ( x ) ) = 1 2 1 2 B F U e ( x ) m 1 P ( e ( X ) = e ( x ) ) + 1 B F U e ( x ) m 1 P ( e ( X ) = e ( x ) ) + 1 2 1 2 B F U e ( x ) M 0 P ( e ( X ) = e ( x ) ) e ( x ) m 0 P ( e ( X ) = e ( x ) ) = 1 2 1 + 1 B F U E ( Z A = 1 ) 1 2 ( 1 + B F U ) E ( Z A = 0 ) .

Thus, we have ACE A Z true 1 2 1 + 1 B F U E ( Z A = 1 ) 1 2 ( 1 + B F U ) E ( Z A = 0 ) .

For ACE A Z obs < 0 , we have LHS = ACE A Z true = e ( x ) ACE A Z e ( X ) = e ( x ) true P ( e ( X ) = e ( x ) ) and

RHS = e ( x ) ( m 1 B F U m 0 ) f + 1 f B F U P ( e ( X ) = e ( x ) ) = e ( x ) ( m 1 B F U m 0 ) 1 2 + 1 2 B F U P ( e ( X ) = e ( x ) ) = 1 2 1 + 1 B F U [ B F U e ( x ) m 1 P ( e ( X ) = e ( x ) ) e ( x ) m 0 P ( e ( X ) = e ( x ) ) ] = 1 2 ( 1 + B F U ) E ( Z A = 1 ) 1 2 1 + 1 B F U E ( Z A = 0 ) .

Thus, we have ACE A Z true 1 2 ( 1 + B F U ) E ( Z A = 1 ) 1 2 1 + 1 B F U E ( Z A = 0 ) .□

C Estimation of G i j ( u , v ) in Section 2

To compute the variance of RMSTs, one difficulty is to estimate the function G i j ( u , v ) based on data. Follow the notations in the study by Murray and Cole [34], we need to transform the function G i j ( u , v ) into the counting process notation system. Suppose we have n matched pairs, then let i , j denote the groups and k = 1 , , n denotes the k th pair. Let U i k be the censoring random variable corresponding to survival time T i k , and the censored survival time is X i k = min ( T i k , U i k ) with censoring status Δ i k = I ( T i k < U i k ) . Then we have the following definitions:

  1. Y i ( u ) = k = 1 n I ( x i k u ) and Y j ( v ) = k = 1 n I ( x j k v ) ;

  2. Y i j ( u , v ) = k = 1 n I ( x i k u , x j k v ) ;

  3. d N i ( u ) = k = 1 n I ( u x i k < u + Δ u , Δ i k = 1 ) , where Δ u 0 ;

  4. d N j ( v ) = k = 1 n I ( v x i k < v + Δ v , Δ i k = 1 ) , where Δ v 0 ;

  5. d N i j ( u , v ) = k = 1 n I ( u x i k < u + Δ u , v x j k < v + Δ v , Δ i k = 1 , Δ j k = 1 ) , where Δ u 0 and Δ v 0 ;

  6. d N i j ( u v ) = k = 1 n I ( u x i k < u + Δ u , x j k v , Δ i k = 1 ) , where Δ u 0 ;

  7. d N j i ( v u ) = k = 1 n I ( v x j k < v + Δ v , x i k u , Δ j k = 1 ) , where Δ v 0 ;

  8. The G ˆ i j ( u , v ) could be estimated by the following formula, and we set Δ u = 0 and Δ v = 0 in real computation. The corresponding R code could be found in our supplementary materials.

    G ˆ i j ( u , v ) = n Y i j ( u , v ) Y i ( u ) Y j ( v ) d N i j ( u , v ) Y i j ( u , v ) d N i j ( u v ) d N j ( v ) Y i j ( u , v ) Y j ( v ) d N j i ( v u ) d N i ( u ) Y i j ( u , v ) Y i ( u ) + d N i ( u ) d N j ( v ) Y i ( u ) Y j ( v ) .

References

[1] Greenland S, Pearl J, Robins JM. Confounding and collapsibility in causal inference. Stat Sci. 1999;14(1):29–46. 10.1214/ss/1009211805. Search in Google Scholar

[2] Martinussen T, Vansteelandt S. On collapsibility and confounding bias in cox and aalen regression models. Lifetime Data Analysis. 2013;19(3):279–96. 10.1007/s10985-013-9242-z. Search in Google Scholar PubMed

[3] Sjooolander A, Dahlqwist E, Zetterqvist J. A note on the noncollapsibility of rate differences and rate ratios. Epidemiology. 2016;27(3):356–9. 10.1097/EDE.0000000000000433. Search in Google Scholar PubMed

[4] Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41–55. 10.1093/biomet/70.1.41. Search in Google Scholar

[5] Rosenbaum PR. Design of observational studies. 2nd ed. Cham: Springer; 2020. 10.1007/978-3-030-46405-9. Search in Google Scholar

[6] Hernán MA. The hazards of hazard ratios. Epidemiology. 2010;21(1):13. 10.1097/EDE.0b013e3181c1ea43. Search in Google Scholar PubMed PubMed Central

[7] Aalen OO, Cook RJ, Røysland K. Does cox analysis of a randomized survival study yield a causal treatment effect? Lifetime Data Analysis. 2015 June;21(4):579–93. 10.1007/s10985-015-9335-y. Search in Google Scholar PubMed

[8] Ni A, Lin Z, Lu B. Stratified restricted mean survival time model for marginal causal effect in observational survival data. Ann Epidemiol. 2021;64:149–54. https://www.sciencedirect.com/science/article/pii/S1047279721003082. 10.1016/j.annepidem.2021.09.016Search in Google Scholar PubMed PubMed Central

[9] Royston P, Parmar MK. Restricted mean survival time: an alternative to the hazard ratio for the design and analysis of randomized trials with a time-to-event outcome. BMC Med Res Methodol. 2013;13(1):1–15. 10.1186/1471-2288-13-152. Search in Google Scholar PubMed PubMed Central

[10] Trinquart L, Jacot J, Conner SC, Porcher R. Comparison of treatment effects measured by the hazard ratio and by the ratio of restricted mean survival times in oncology randomized controlled trials. J Clin Oncol. 2016;34(15):1813–9. 10.1200/JCO.2015.64.2488. Search in Google Scholar PubMed

[11] Karrison T. Restricted mean life with adjustment for covariates. J Am Stat Assoc. 1987;82(400):1169–76. https://www.tandfonline.com/doi/abs/10.1080/01621459.1987.10478555. 10.1080/01621459.1987.10478555Search in Google Scholar

[12] Zucker DM. Restricted mean life with covariates: modification and extension of a useful survival analysis method. J Am Stat Assoc. 1998;93(442):702–9. https://www.tandfonline.com/doi/abs/10.1080/01621459.1998.10473722. 10.1080/01621459.1998.10473722Search in Google Scholar

[13] Andersen PK, Hansen MG, Klein JP. Regression analysis of restricted mean survival time based on pseudo-observations. Lifetime Data Analysis. 2004;10(4):335–50. 10.1007/s10985-004-4771-0. Search in Google Scholar PubMed

[14] Tian L, Zhao L, Wei LJ. Predicting the restricted mean event time with the subjectas baseline covariates in survival analysis. Biostatistics. 2013;15(2):222–33. 10.1093/biostatistics/kxt050. Search in Google Scholar PubMed PubMed Central

[15] Wang X, Schaubel DE. Modeling restricted mean survival time under general censoring mechanisms. Lifetime Data Analysis. 2018;24(1):176–99. 10.1007/s10985-017-9391-6. Search in Google Scholar PubMed PubMed Central

[16] Zhang M, Schaubel DE. Double-Robust semiparametric estimator for differences in restricted mean lifetimes in observational studies. Biometrics. 2012;68(4):999–1009. https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1541-0420.2012.01759.x. 10.1111/j.1541-0420.2012.01759.xSearch in Google Scholar PubMed PubMed Central

[17] Conner SC, Sullivan LM, Benjamin EJ, LaValley MP, Galea S, Trinquart L. Adjusted restricted mean survival times in observational studies. Stat Med. 2019;38(20):3832–60. https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.8206. 10.1002/sim.8206Search in Google Scholar PubMed PubMed Central

[18] Kochanek KD, Murphy SL, Xu J, Arias E. Mortality in the United States, 2013. NCHS Data Brief. 2014;(178):1–8. Search in Google Scholar

[19] Mozaffarian D, Benjamin EJ, Go AS, Arnett DK, Blaha MJ, Cushman M, et al.Heart disease and stroke statistics-2016 update. Circulation. 2016;133(4):e38–360. https://www.ahajournals.org/doi/abs/10.1161/CIR.0000000000000350. 10.1161/CIR.0000000000000350Search in Google Scholar PubMed

[20] Wolf PA, D’Agostino RB, Kannel WB, Bonita R, Belanger AJ. Cigarette smoking as a risk factor for stroke: the framingham study. JAMA. 1988;259(7):1025–9. 10.1001/jama.1988.03720070025028. Search in Google Scholar

[21] Shinton R, Beevers G. Meta-analysis of relation between cigarette smoking and stroke. British Med J. 1989;298(6676):789–94. https://www.bmj.com/content/298/6676/789. 10.1136/bmj.298.6676.789Search in Google Scholar PubMed PubMed Central

[22] Bonita R, Duncan J, Truelsen T, Jackson RT, Beaglehole R. Passive smoking as well as active smoking increases the risk of acute stroke. Tobacco Control. 1999;8(2):156–60. https://tobaccocontrol.bmj.com/content/8/2/156. 10.1136/tc.8.2.156Search in Google Scholar PubMed PubMed Central

[23] Shah RS, Cole JW. Smoking and stroke: the more you smoke the more you stroke. Expert Rev Cardiovasc Ther. 2010;8(7):917–32. 10.1586/erc.10.56. Search in Google Scholar PubMed PubMed Central

[24] Aric investigators. The atherosclerosis risk in communities (ARIC) study: design and objectives. Am J Epidemiol. 1989;129(4):687–702. 10.1093/oxfordjournals.aje.a115184. Search in Google Scholar

[25] Kwon Y, Norby FL, Jensen PN, Agarwal SK, Soliman EZ, Lip GYH, et al. Association of smoking, alcohol, and obesity with cardiovascular death and ischemic stroke in atrial fibrillation: the atherosclerosis risk in communities (ARIC) study and cardiovascular health study (CHS). Plos One. 2016 Jan;11(1):1–13. 10.1371/journal.pone.0147065. Search in Google Scholar PubMed PubMed Central

[26] Ding N, Sang Y, Chen J, Ballew SH, Kalbaugh CA, Salameh MJ, et al. Cigarette smoking, smoking cessation, and long-term risk of 3 major atherosclerotic diseases. J Am College Cardiol. 2019;74(4):498–507. 10.1016/j.jacc.2019.05.049. Search in Google Scholar PubMed PubMed Central

[27] Rubin DB. Estimating Causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol. 1974;66(5):688–701. 10.1037/h0037350. Search in Google Scholar

[28] Imbens GW, Rubin DB. Causal inference for statistics, social, and biomedical sciences: an introduction. New York: Cambridge University Press; 2015. 10.1017/CBO9781139025751. Search in Google Scholar

[29] Fleming TR, Harrington DP. Counting processes and survival analysis. Hoboken, New Jersey: John Wiley & Sons; 2011. 10.1002/9781118150672. Search in Google Scholar

[30] Bang H, Robins JM. Doubly robust estimation in missing data and causal inference models. Biometrics. 2005;61(4):962–73. https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1541-0420.2005.00377.x. 10.1111/j.1541-0420.2005.00377.xSearch in Google Scholar PubMed

[31] McCaffrey DF, Ridgeway G, Morral AR. Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychol Meth. 2004;9(4):403–25. 10.1037/1082-989X.9.4.403. Search in Google Scholar PubMed

[32] Westreich D, Lessler J, Funk MJ. Propensity score estimation: machine learning and classification methods as alternatives to logistic regression. J Clin Epidemiol. 2010;63(8):826–33. 10.1016/j.jclinepi.2009.11.020. Search in Google Scholar PubMed PubMed Central

[33] Hansen BB, Klopfer SO. Optimal full matching and related designs via network flows. J Comput Graph Stat. 2006;15(3):609–27. 10.1198/106186006X137047. Search in Google Scholar

[34] Murray S, Cole B. Variance and sample size calculations in quality-of-life-adjusted survival analysis (Q-TWiST). Biometrics. 2000;56(1):173–82. https://onlinelibrary.wiley.com/doi/abs/10.1111/j.0006-341X.2000.00173.x. 10.1111/j.0006-341X.2000.00173.xSearch in Google Scholar

[35] Hosmer DW, Lemeshow S, May S. Applied survival analysis: regression modeling of time-to-event data. 2nd ed. Hoboken, New Jersey: John Wiley & Sons; 2011. 10.1002/9780470258019. Search in Google Scholar

[36] Bender R, Augustin T, Blettner M. Generating survival times to simulate cox proportional hazards models. Stat Med. 2005;24(11):1713–23. https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.2059. 10.1002/sim.2059Search in Google Scholar PubMed

[37] Ding P, VanderWeele TJ. Sensitivity analysis without assumptions. Epidemiology. 2016;27(3):368–77. 10.1097/EDE.0000000000000457. Search in Google Scholar PubMed PubMed Central

[38] VanderWeele TJ, Ding P. Sensitivity analysis in observational research: introducing the e-value. Ann Internal Med. 2017;167(4):268–74. https://www.acpjournals.org/doi/abs/10.7326/M16-2607. 10.7326/M16-2607Search in Google Scholar PubMed

[39] Kim DH, Uno H, Wei LJ. Restricted mean survival time as a measure to interpret clinical trial results. JAMA Cardiol. 2017;2(11):1179–80. 10.1001/jamacardio.2017.2922Search in Google Scholar PubMed PubMed Central

[40] Tian L, Jin H, Uno H, Lu Y, Huang B, Anderson KM, et al. On the empirical choice of the time window for restricted mean survival time. Biometrics. 2020;76(4):1157–66. 10.1111/biom.13237Search in Google Scholar PubMed PubMed Central

[41] Rubin DB. The use of matched sampling and regression adjustment to remove bias in observational studies. Biometrics. 1973;29(1):185–203. http://www.jstor.org/stable/2529685. 10.1017/CBO9780511810725.008Search in Google Scholar

Received: 2022-05-10
Revised: 2022-11-21
Accepted: 2022-12-31
Published Online: 2023-02-15

© 2023 the author(s), published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.

Downloaded on 10.6.2024 from https://www.degruyter.com/document/doi/10.1515/jci-2022-0035/html
Scroll to top button