
Inferential Pluralism in Causal Reasoning from Randomized Experiments

  • Regular Article · Published in Acta Biotheoretica

Abstract

Causal pluralism can be defended not only with respect to causal concepts and methodological guidelines, but also at the finer-grained level of causal inference from a particular source of evidence for causation. An argument for this last variety of pluralism is made based on an analysis of causal inference from randomized controlled trials (RCTs). Here, the causal interpretation of a statistically significant association can be established via multiple paths of reasoning, each relying on different assumptions and providing distinct elements of information in favour of such an interpretation.


Notes

  1. The proposed reconstruction concerns explanatory, non-adaptive RCTs conducted under the assumption that no prior knowledge relevant to the causal claim under investigation is available. This type of experiment is common in clinical and preclinical research, psychology, education, agriculture and ecology. Since the experimental design behind traditional RCTs is meant to accommodate statistical inference under a frequentist approach, the frequentist statistics standardly employed in the analysis of experimental results is assumed in this paper.

  2. The above inference, known as the randomization inference, does not rely on theoretical approximations based on assumptions about the shape of the sampling distribution. In practice, researchers are often interested in demonstrating that there is an average between-group exposure association, as well as estimating the size of that association. Thus, “the first step in assessing the intervention’s effect involves testing for a statistical association between intervention group membership (intervention vs. control) and the identified outcome […]. This is accomplished by using an appropriate inferential statistical procedure (e.g., an independent-samples t test) coupled with an effect size estimate (e.g., Cohen’s d), to provide pertinent information regarding both the statistical significance and strength (i.e., the amount of benefit) of the intervention–outcome association” (DeLucia and Pitts 2010, p. 631). Unlike the randomization inference, t tests presuppose that the outcomes are normally distributed or that the population from which individuals are drawn is large enough that the test statistics follow a posited sampling distribution (Gerber and Green 2012, p. 65).
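
     The contrast drawn in this note can be sketched in a few lines of code. The numbers below are purely illustrative, not data from any actual trial: the randomization inference repeatedly re-shuffles group labels under the sharp null of no effect and reports the share of re-randomizations producing a between-group difference at least as extreme as the one observed, with no assumption about the shape of a sampling distribution; Cohen's d is then computed from the same data as an effect size estimate.

     ```python
     import numpy as np

     rng = np.random.default_rng(0)

     # Hypothetical outcomes for a small two-arm trial (illustrative values only).
     treated = np.array([5.2, 6.1, 5.8, 6.5, 5.9, 6.3])
     control = np.array([4.9, 5.1, 5.4, 5.0, 5.6, 5.2])
     observed_diff = treated.mean() - control.mean()

     # Randomization inference: re-randomize the group labels and recompute the
     # difference in means under the sharp null hypothesis of no effect.
     pooled = np.concatenate([treated, control])
     n_treated = len(treated)
     n_perm = 10_000
     perm_diffs = np.empty(n_perm)
     for i in range(n_perm):
         shuffled = rng.permutation(pooled)
         perm_diffs[i] = shuffled[:n_treated].mean() - shuffled[n_treated:].mean()

     # Two-sided p-value: share of re-randomizations at least as extreme as observed.
     p_value = np.mean(np.abs(perm_diffs) >= abs(observed_diff))

     # Effect size (Cohen's d with pooled standard deviation).
     pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
     cohens_d = observed_diff / pooled_sd
     ```

     Unlike a t test, nothing here presupposes normally distributed outcomes; the inference rests only on the physical act of random assignment.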

  3. “When a variable systematically varies with the independent variable, the confounding variable provides an explanation other than the independent variable for changes in the dependent variable” (Kovera 2010b, p. 220). For instance, in a linear regression model, the “[c]orrelation of the random effects εj [the error term in µj = α + βxj + εj] with the treatments xj [allocation] leads to bias in estimating β [the magnitude of the contribution of x to µ]. This bias may be attributed to or interpreted as confounding for β in the regression analysis. Confounders are now covariates that ‘explain’ the correlation between εj and xj. In particular, confounders reduce the correlation of xj and εj when entered in the model and so reduce the bias in estimating β” (Greenland et al. 1999, p. 34). Note that confounding is defined here in strictly statistical terms with no explicit reference to causation. In practice, researchers assess the potential for confounding by checking if allocation alone (before exposure to the tested treatment) covaries with potential confounders; in other words, researchers check if usual suspects, such as gender and age, are unbalanced between test and control trial participants. If imbalances exist, then it is possible that the allocation intervention sets both the value of the independent variable X to x and that of covariate W to w, such that the observed differences in outcome can be explained not only by differences in X, but also by differences in W.
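
     A small simulation (with an invented data-generating process, not an example from the quoted sources) can make the point concrete: when the error term correlates with allocation, regressing the outcome on allocation alone biases the estimate of β, and entering the confounder as a covariate removes the bias.

     ```python
     import numpy as np

     rng = np.random.default_rng(1)
     n = 50_000

     # Hypothetical data-generating process: covariate w influences both the
     # chance of receiving treatment x and the outcome y, so w confounds.
     w = rng.normal(size=n)
     x = (rng.random(n) < 1 / (1 + np.exp(-2 * w))).astype(float)  # allocation tracks w
     beta = 1.0                                                    # true effect of x
     y = 0.5 + beta * x + 1.5 * w + rng.normal(size=n)

     # Naive regression of y on x alone: the error term (which absorbs w)
     # correlates with x, biasing the estimate of beta upward.
     X1 = np.column_stack([np.ones(n), x])
     beta_naive = np.linalg.lstsq(X1, y, rcond=None)[0][1]

     # Entering the confounder w into the model removes the correlation
     # between allocation and the residual, and with it the bias.
     X2 = np.column_stack([np.ones(n), x, w])
     beta_adjusted = np.linalg.lstsq(X2, y, rcond=None)[0][1]
     ```

     With this setup the naive estimate lands far above the true β = 1, while the adjusted estimate recovers it, mirroring Greenland et al.'s strictly statistical characterization of a confounder.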

  4. Randomization cannot and is not meant to guarantee that variables will not happen to be associated with groups by chance. That this is the case is most vividly illustrated by a trial in which only two individuals are randomly assigned to test and control conditions: in this case, individual characteristics and allocation condition are indistinguishable. It can certainly be argued that random assignment “equates groups on expectation at pretest,” that is, “in the long run over many randomized experiments” (Shadish et al. 2002, p. 250). However, this “does not mean that random assignment equates units on observed pretest scores” (Shadish et al. 2002, p. 250). For any given round of randomization, gains in precision are driven by group size. If “the original sample is large enough then the two groups should be more or less identical in the important characteristics […]. The two groups will differ only by chance” (Bowers 2014, p. 120). Since random error decreases as sample size increases, it follows that as “the sample size grows, observed and unobserved confounders are balanced across treatment and control groups with arbitrarily high probability” (Sekhon 2008, p. 273). Nevertheless, even in the case of large groups, a single round of randomization cannot guarantee perfect balancing, but only a high probability that known and unknown confounders are balanced across treatment and control groups.
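
     That balance is driven by group size can be illustrated with a brief simulation (a hypothetical standardized covariate such as age, invented numbers): the average imbalance produced by a single round of randomization shrinks as the groups grow, but never reaches exactly zero.

     ```python
     import numpy as np

     rng = np.random.default_rng(2)

     def mean_imbalance(n, n_trials=2000):
         """Average absolute difference in a covariate's group means after
         randomly splitting n individuals into two equal arms."""
         diffs = []
         for _ in range(n_trials):
             covariate = rng.normal(size=n)           # e.g. standardized age
             assignment = rng.permutation(n) < n // 2  # random equal split
             diffs.append(abs(covariate[assignment].mean() -
                              covariate[~assignment].mean()))
         return float(np.mean(diffs))

     # Imbalance from a single round of randomization shrinks with sample
     # size but remains strictly positive: balance holds in expectation,
     # not in any particular allocation.
     small, large = mean_imbalance(10), mean_imbalance(1000)
     ```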

  5. Greenland et al. (1999, p. 29) take this to be the oldest usage of confounding, namely as “a type of bias in estimating causal effects. This bias is sometimes informally described as a mixing of effects of extraneous factors (called confounders) with the effect of interest.” According to this usage, a “variable cannot be a confounder (in this sense) unless (1) it can causally affect the outcome parameter μ within treatment groups, and (2) it is distributed differently among the compared populations […]. The two necessary conditions (1) and (2) are sometimes offered together as a definition of a confounder” (Greenland et al. 1999, pp. 33–34). Greenland et al. (1999, p. 33) further point out that confounding occurs only if different distributions (unbalances) result in net effect differences in the outcome of interest: “a covariate difference […] is a necessary but not sufficient condition for confounding, because the effects of the various covariate differences may balance out in such a way that no confounding is present.”

  6. On the other hand, if comparability is understood as a homogenization removing all unbalances, including those due to random error, then something is amiss. The fact that a chance explanation is considered and needs to be ruled out by a statistical test indicates that the causal inference deployed in the context of an RCT works explicitly on the expectation that there is a non-negligible risk of generating incomparable groups and that, therefore, comparability cannot be guaranteed. Senn points out that not only “[i]t is not necessary for the groups to be balanced,” but “the probability calculation applied to a clinical trial automatically makes an allowance for the fact that groups will almost certainly be unbalanced, and if one knew that they were balanced, then the calculation that is usually performed would not be correct” (Senn 2013, p. 1442). Conversely, prior assurance of comparability removes the need to consider the possibility of a chance explanation and, therefore, the need to perform a statistical test: “If treatment groups could be equated before treatment, and if they were different after treatment, then pretest selection differences could not be a cause of observed posttest differences” (Shadish et al. 2002, p. 249). Likewise, if we knew that all patients and their circumstances are identical, the recovery of every single patient in the test group could only be attributed to the efficacy of the treatment (Hill 1955, Ch. VIII). Sekhon (2008) further argues that, under these circumstances, causal inference collapses into a version of Mill’s method of difference. Given a zero probability of generating incomparable groups, if differences in outcomes are observed, these differences must have been caused by something other than the process by which the groups were generated (random allocation) irrespective of the magnitude of the differences in outcomes and the size of the groups. 
Presumably, this is the strategy favoured in most controlled experiments in basic science, where it is common practice to engineer homogeneous populations of comparable individuals, for instance, by generating immortalized cell lines consisting of clones or systematically inbreeding mice in order to produce isogenic/homozygous strains (Baetu 2020; Müller-Wille 2007).

  7. Some authors prefer the term ‘internal validity’ in order to emphasize that the results of the test concern solely the particular experimental setup employed in the study (e.g., trial participants, the specific version of the intervention employed, the circumstances in which the intervention was conducted, the specific outcomes or endpoints assessed in the experiment, etc.) (Manski 2007; Shadish et al. 2002).

  8. In principle, the prospective design of experiments may also help rule out non-causal explanations of systematic association, such as supervenience, coreference and nomic dependence scenarios, in which the independent and dependent variables covary simultaneously.

  9. A common example is gene duplication. If, upon knocking out a gene, no differences in phenotype are observed, background knowledge dictates that this result is more likely to be due to the compensating effect of homologous sequences than to the fact that the gene plays no causal role vis-à-vis phenotype.

  10. Observational data can discriminate between two classes of Markov-equivalent structures, with chains, reverse chains and forks in one class and colliders (A→B←C) in the other, thus allowing for causal inference from observational data assuming minimality, Markov and positivity (Pearl 2000; Spirtes et al. 1993). Despite its importance for causal discovery algorithms, this result is not of significant utility for causal inference from experiments which probe solely the input and output of a system, thus generating data about the conditional probability of the output variable given the probability of the input variable.
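
      The observational signature that separates colliders from chains and forks can be shown with a minimal linear-Gaussian toy model (invented for illustration): in the collider A→B←C, A and C are marginally independent, but conditioning on B induces a dependence.

      ```python
      import numpy as np

      rng = np.random.default_rng(3)
      n = 100_000

      # Toy collider structure A -> B <- C (linear-Gaussian version).
      a = rng.normal(size=n)
      c = rng.normal(size=n)
      b = a + c + 0.5 * rng.normal(size=n)

      # Marginally, A and C are independent (correlation near zero)...
      r_marginal = np.corrcoef(a, c)[0, 1]

      # ...but conditioning on the collider B induces a (negative) dependence,
      # computed here as the partial correlation of A and C given B.
      ra_b = np.corrcoef(a, b)[0, 1]
      rc_b = np.corrcoef(c, b)[0, 1]
      r_partial = (r_marginal - ra_b * rc_b) / np.sqrt((1 - ra_b**2) * (1 - rc_b**2))
      ```

      In a chain or fork the pattern reverses: the marginal correlation is nonzero and vanishes upon conditioning on the middle variable, which is why the two classes are distinguishable from observational data alone.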

References

  • Altman DG, Bland JM (1999) Treatment allocation in controlled trials: why randomise. BMJ 318:1209
  • Baetu TM (2020) Causal inference in biomedical research. Biol Philos 35:43
  • Baumgartner M (2009) Interventionist causal exclusion and non-reductive physicalism. Int Stud Philos Sci 23(2):161–178
  • Ben-Menahem Y (2018) Causation in science. Princeton University Press, Princeton
  • Bowers D (2014) Medical statistics from scratch: an introduction for health professionals. Wiley, Hoboken, NJ
  • Broadbent A (2013) Philosophy of epidemiology. Palgrave Macmillan, Houndmills
  • Cartwright N (1989) Nature’s capacities and their measurement. Oxford University Press, Oxford
  • Cartwright N (2010) What are randomised controlled trials good for? Philos Stud 147:59–70
  • Cox DR, Wermuth N (2004) Causality: a statistical view. Int Stat Rev 72(3):285–305
  • DeLucia C, Pitts SC (2010) Intervention. In: Salkind NJ (ed) Encyclopedia of research design. SAGE, Thousand Oaks
  • Donner A, Klar N (2004) Pitfalls of and controversies in cluster randomization trials. Am J Public Health 94(3):416–422
  • Eberhardt F, Glymour C et al (2012) On the number of experiments sufficient and in the worst case necessary to identify all causal relations among N variables. In: Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI'05). AUAI Press, Arlington, pp 178–184
  • Evidence Based Medicine Working Group (1992) Evidence-based medicine: a new approach to teaching the practice of medicine. J Am Med Assoc 268:2420–2425
  • Fisher RA (1947) The design of experiments. Oliver and Boyd, Edinburgh
  • Franklin A (2007) The role of experiments in the natural sciences: examples from physics and biology. In: Kuipers T (ed) General philosophy of science: focal issues. Elsevier, Amsterdam
  • Fuller J (2019) The confounding question of confounding causes in randomized trials. Br J Philos Sci 70:901–926
  • Gerber AS, Green DP (2012) Field experiments: design, analysis, and interpretation. Norton, New York, NY
  • Godfrey-Smith P (2009) Causal pluralism. In: Beebee H, Hitchcock C, Menzies P (eds) Oxford handbook of causation. Oxford University Press, New York
  • Gori GB (1989) Epidemiology and the concept of causation in multifactorial diseases. Regul Toxicol Pharmacol 9(3):263–272
  • Greenland S, Robins JM et al (1999) Confounding and collapsibility in causal inference. Stat Sci 14(1):29–46
  • Guyatt G, Djulbegovic B (2019) Evidence-based medicine and the theory of knowledge. In: Guyatt G, Rennie D, Meade MO, Cook DJ (eds) Users’ guides to the medical literature: a manual for evidence-based clinical practice. JAMA/McGraw-Hill Education, New York
  • Hall N (2004) Two concepts of causation. In: Collins J, Hall N, Paul L (eds) Causation and counterfactuals. MIT Press, Cambridge, pp 225–276
  • Hernán MA, Robins JM (2020) Causal inference: what if. Chapman & Hall/CRC, Boca Raton
  • Hill AB (1952) The clinical trial. N Engl J Med 247:113–119
  • Hill AB (1955) Principles of medical statistics, 6th edn. Oxford University Press, New York
  • Howick J (2011) The philosophy of evidence-based medicine. BMJ Books, Oxford
  • ISIS-2 (Second International Study of Infarct Survival) Collaborative Group (1988) Randomised trial of intravenous streptokinase, oral aspirin, both, or neither among 17,187 cases of suspected acute myocardial infarction: ISIS-2. Lancet 2:349–360
  • Jaynes ET (2003) Probability theory: the logic of science. Cambridge University Press, Cambridge
  • Koepke D, Flay R (1989) Levels of analysis. In: Braverman MT (ed) New directions for program evaluation: evaluating health promotion programs. Jossey-Bass, San Francisco
  • Kovera MB (2010a) Bias. In: Salkind NJ (ed) Encyclopedia of research design. SAGE, Thousand Oaks
  • Kovera MB (2010b) Confounding. In: Salkind NJ (ed) Encyclopedia of research design. SAGE, Thousand Oaks
  • Leighton JP (2010) Internal validity. In: Salkind NJ (ed) Encyclopedia of research design. SAGE, Thousand Oaks, pp 619–622
  • Lewis DK (1979) Counterfactual dependence and time’s arrow. Noûs 13:455–476
  • Manski C (2007) Identification for prediction and decision. Harvard University Press, Cambridge
  • Merlo J, Lynch K (2010) Association, measures of. In: Salkind NJ (ed) Encyclopedia of research design. SAGE, Thousand Oaks
  • Miettinen O (1974) Confounding and effect-modification. Am J Epidemiol 100(5):350–353
  • Mill JS (1843) A system of logic, ratiocinative and inductive. John W. Parker, London
  • Müller-Wille S (2007) Hybrids, pure cultures, and pure lines: from nineteenth-century biology to twentieth-century genetics. Stud Hist Philos Biol Biomed Sci 38(4):796–806
  • Neyman J, Pearson ES (1928) On the use and interpretation of certain test criteria for purposes of statistical inference, part I. Biometrika 20A(1–2):175–240
  • Papineau D (1994) The virtues of randomization. Br J Philos Sci 45:437–450
  • Parkkinen V-P, Wallmann C et al (2018) Evaluating evidence of mechanisms in medicine: principles and procedures. Springer, Cham
  • Pearl J (2000) Causality: models, reasoning, and inference. Cambridge University Press, Cambridge
  • Pearl J, Glymour M et al (2016) Causal inference in statistics: a primer. Wiley & Sons, Chichester
  • Psillos S (2004) A glimpse of the secret connexion: harmonizing mechanisms with counterfactuals. Perspect Sci 12:288–391
  • Rosenbaum PR (1995) Observational studies. Springer, New York
  • Rothman KJ (1974) Synergy and antagonism in cause-effect relationships. Am J Epidemiol 99(6):385–388
  • Rubin D (1974) Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol 66(5):688–701
  • Russo F, Williamson J (2007) Interpreting causality in the health sciences. Int Stud Philos Sci 21(2):157–170
  • Sekhon JS (2008) The Neyman-Rubin model of causal inference and estimation via matching methods. In: Box-Steffensmeier JM, Brady HE, Collier D (eds) The Oxford handbook of political methodology. Oxford University Press, New York, pp 271–299
  • Senn S (2013) Seven myths of randomisation in clinical trials. Stat Med 32:1439–1450
  • Shadish WR, Cook TD et al (2002) Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin, Boston
  • Sharabiani M, Aylin P et al (2012) Systematic review of comorbidity indices for administrative data. Med Care 50(12):1109–1118
  • Skyrms B (1984) EPR: lessons for metaphysics. Midwest Stud Philos 9:245–255
  • Spirtes P, Glymour C et al (1993) Causation, prediction and search. Springer-Verlag, New York
  • Stigler SM (1986) The history of statistics. Harvard University Press, Cambridge, MA
  • Verma T, Pearl J (1990) Equivalence and synthesis of causal models. In: Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence. Cambridge, MA, pp 220–227
  • Weber M (2009) The crux of crucial experiments: Duhem’s problems and inference to the best explanation. Br J Philos Sci 60:19–49
  • Williamson J (2019) Establishing causal claims in medicine. Int Stud Philos Sci 32(1):33–61
  • Winch RF, Campbell DT (1969) Proof? No. Evidence? Yes. The significance of tests of significance. Am Sociol 4(2):140–143
  • Woodward J (2003) Making things happen: a theory of causal explanation. Oxford University Press, Oxford
  • Worrall J (2002) What evidence in evidence-based medicine? Philos Sci 69:S316–S330
  • Worrall J (2007a) Evidence in medicine and evidence-based medicine. Philos Compass 2(6):981–1022
  • Worrall J (2007b) Why there’s no cause to randomize. Br J Philos Sci 58:451–488


Acknowledgements

This research was supported by SSHRC Grant # 430-2020-0654.

Author information


Correspondence to Tudor M. Baetu.


Cite this article

Baetu, T.M. Inferential Pluralism in Causal Reasoning from Randomized Experiments. Acta Biotheor 70, 22 (2022). https://doi.org/10.1007/s10441-022-09446-2
