
Inferential Pluralism in Causal Reasoning from Randomized Experiments

  • Regular Article · Published in Acta Biotheoretica

Abstract

Causal pluralism can be defended not only with respect to causal concepts and methodological guidelines, but also at the finer-grained level of causal inference from a particular source of evidence for causation. An argument for this last variety of pluralism is made based on an analysis of causal inference from randomized controlled trials (RCTs). Here, the causal interpretation of a statistically significant association can be established via multiple paths of reasoning, each relying on different assumptions and providing distinct elements of information in favour of such an interpretation.


Notes

  1. The proposed reconstruction concerns explanatory, non-adaptive RCTs conducted under the assumption that no prior knowledge relevant to the causal claim under investigation is available. This type of experiment is common in clinical and preclinical research, psychology, education, agriculture and ecology. Since the experimental design behind traditional RCTs is meant to accommodate statistical inference under a frequentist approach, the frequentist statistics standardly employed in the analysis of experimental results is assumed in this paper.

  2. The above inference, known as the randomization inference, does not rely on theoretical approximations based on assumptions about the shape of the sampling distribution. In practice, researchers are often interested in demonstrating that there is an average between-group exposure association, as well as estimating the size of that association. Thus, “the first step in assessing the intervention’s effect involves testing for a statistical association between intervention group membership (intervention vs. control) and the identified outcome […]. This is accomplished by using an appropriate inferential statistical procedure (e.g., an independent-samples t test) coupled with an effect size estimate (e.g., Cohen’s d), to provide pertinent information regarding both the statistical significance and strength (i.e., the amount of benefit) of the intervention–outcome association” (DeLucia and Pitts 2010, p. 631). Unlike the randomization inference, t tests presuppose that the outcomes are normally distributed or that the population from which individuals are drawn is large enough that the test statistics follow a posited sampling distribution (Gerber and Green 2012, p. 65).
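
     The contrast drawn in this note can be sketched in a few lines of code. The numbers below are purely illustrative, not data from any actual trial: the randomization inference repeatedly re-shuffles group labels under the sharp null of no effect and reports the share of re-randomizations producing a between-group difference at least as extreme as the one observed, with no assumption about the shape of a sampling distribution; Cohen's d is then computed from the same data as an effect size estimate.

     ```python
     import numpy as np

     rng = np.random.default_rng(0)

     # Hypothetical outcomes for a small two-arm trial (illustrative values only).
     treated = np.array([5.2, 6.1, 5.8, 6.5, 5.9, 6.3])
     control = np.array([4.9, 5.1, 5.4, 5.0, 5.6, 5.2])
     observed_diff = treated.mean() - control.mean()

     # Randomization inference: re-randomize the group labels and recompute the
     # difference in means under the sharp null hypothesis of no effect.
     pooled = np.concatenate([treated, control])
     n_treated = len(treated)
     n_perm = 10_000
     perm_diffs = np.empty(n_perm)
     for i in range(n_perm):
         shuffled = rng.permutation(pooled)
         perm_diffs[i] = shuffled[:n_treated].mean() - shuffled[n_treated:].mean()

     # Two-sided p-value: share of re-randomizations at least as extreme as observed.
     p_value = np.mean(np.abs(perm_diffs) >= abs(observed_diff))

     # Effect size (Cohen's d with pooled standard deviation).
     pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
     cohens_d = observed_diff / pooled_sd
     ```

     Unlike a t test, nothing here presupposes normally distributed outcomes; the inference rests only on the physical act of random assignment.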

  3. “When a variable systematically varies with the independent variable, the confounding variable provides an explanation other than the independent variable for changes in the dependent variable” (Kovera 2010b, p. 220). For instance, in a linear regression model, the “[c]orrelation of the random effects εj [the error term in µj = α + βxj + εj] with the treatments xj [allocation] leads to bias in estimating β [the magnitude of the contribution of x to µ]. This bias may be attributed to or interpreted as confounding for β in the regression analysis. Confounders are now covariates that ‘explain’ the correlation between εj and xj. In particular, confounders reduce the correlation of xj and εj when entered in the model and so reduce the bias in estimating β” (Greenland et al. 1999, p. 34). Note that confounding is defined here in strictly statistical terms with no explicit reference to causation. In practice, researchers assess the potential for confounding by checking if allocation alone (before exposure to the tested treatment) covaries with potential confounders; in other words, researchers check if usual suspects, such as gender and age, are unbalanced between test and control trial participants. If imbalances exist, then it is possible that the allocation intervention sets both the value of the independent variable X to x and that of covariate W to w, such that the observed differences in outcome can be explained not only by differences in X, but also by differences in W.
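
     A small simulation (with an invented data-generating process, not an example from the quoted sources) can make the point concrete: when the error term correlates with allocation, regressing the outcome on allocation alone biases the estimate of β, and entering the confounder as a covariate removes the bias.

     ```python
     import numpy as np

     rng = np.random.default_rng(1)
     n = 50_000

     # Hypothetical data-generating process: covariate w influences both the
     # chance of receiving treatment x and the outcome y, so w confounds.
     w = rng.normal(size=n)
     x = (rng.random(n) < 1 / (1 + np.exp(-2 * w))).astype(float)  # allocation tracks w
     beta = 1.0                                                    # true effect of x
     y = 0.5 + beta * x + 1.5 * w + rng.normal(size=n)

     # Naive regression of y on x alone: the error term (which absorbs w)
     # correlates with x, biasing the estimate of beta upward.
     X1 = np.column_stack([np.ones(n), x])
     beta_naive = np.linalg.lstsq(X1, y, rcond=None)[0][1]

     # Entering the confounder w into the model removes the correlation
     # between allocation and the residual, and with it the bias.
     X2 = np.column_stack([np.ones(n), x, w])
     beta_adjusted = np.linalg.lstsq(X2, y, rcond=None)[0][1]
     ```

     With this setup the naive estimate lands far above the true β = 1, while the adjusted estimate recovers it, mirroring Greenland et al.'s strictly statistical characterization of a confounder.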

  4. Randomization cannot and is not meant to guarantee that variables will not happen to be associated with groups by chance. That this is the case is most vividly illustrated by a trial in which only two individuals are randomly assigned to test and control conditions: in this case, individual characteristics and allocation condition are indistinguishable. It can certainly be argued that random assignment “equates groups on expectation at pretest,” that is, “in the long run over many randomized experiments” (Shadish et al. 2002, p. 250). However, this “does not mean that random assignment equates units on observed pretest scores” (Shadish et al. 2002, p. 250). For any given round of randomization, gains in precision are driven by group size. If “the original sample is large enough then the two groups should be more or less identical in the important characteristics […]. The two groups will differ only by chance” (Bowers 2014, p. 120). Since random error decreases as sample size increases, it follows that as “the sample size grows, observed and unobserved confounders are balanced across treatment and control groups with arbitrarily high probability” (Sekhon 2008, p. 273). Nevertheless, even in the case of large groups, a single round of randomization cannot guarantee perfect balancing, but only a high probability that known and unknown confounders are balanced across treatment and control groups.
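
     That balance is driven by group size can be illustrated with a brief simulation (a hypothetical standardized covariate such as age, invented numbers): the average imbalance produced by a single round of randomization shrinks as the groups grow, but never reaches exactly zero.

     ```python
     import numpy as np

     rng = np.random.default_rng(2)

     def mean_imbalance(n, n_trials=2000):
         """Average absolute difference in a covariate's group means after
         randomly splitting n individuals into two equal arms."""
         diffs = []
         for _ in range(n_trials):
             covariate = rng.normal(size=n)           # e.g. standardized age
             assignment = rng.permutation(n) < n // 2  # random equal split
             diffs.append(abs(covariate[assignment].mean() -
                              covariate[~assignment].mean()))
         return float(np.mean(diffs))

     # Imbalance from a single round of randomization shrinks with sample
     # size but remains strictly positive: balance holds in expectation,
     # not in any particular allocation.
     small, large = mean_imbalance(10), mean_imbalance(1000)
     ```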

  5. Greenland et al. (1999, p. 29) take this to be the oldest usage of confounding, namely as “a type of bias in estimating causal effects. This bias is sometimes informally described as a mixing of effects of extraneous factors (called confounders) with the effect of interest.” According to this usage, a “variable cannot be a confounder (in this sense) unless (1) it can causally affect the outcome parameter μ within treatment groups, and (2) it is distributed differently among the compared populations […]. The two necessary conditions (1) and (2) are sometimes offered together as a definition of a confounder” (Greenland et al. 1999, pp. 33–34). Greenland et al. (1999, p. 33) further point out that confounding occurs only if different distributions (unbalances) result in net effect differences in the outcome of interest: “a covariate difference […] is a necessary but not sufficient condition for confounding, because the effects of the various covariate differences may balance out in such a way that no confounding is present.”

  6. On the other hand, if comparability is understood as a homogenization removing all unbalances, including those due to random error, then something is amiss. The fact that a chance explanation is considered and needs to be ruled out by a statistical test indicates that the causal inference deployed in the context of an RCT works explicitly on the expectation that there is a non-negligible risk of generating incomparable groups and that, therefore, comparability cannot be guaranteed. Senn points out that not only “[i]t is not necessary for the groups to be balanced,” but “the probability calculation applied to a clinical trial automatically makes an allowance for the fact that groups will almost certainly be unbalanced, and if one knew that they were balanced, then the calculation that is usually performed would not be correct” (Senn 2013, p. 1442). Conversely, prior assurance of comparability removes the need to consider the possibility of a chance explanation and, therefore, the need to perform a statistical test: “If treatment groups could be equated before treatment, and if they were different after treatment, then pretest selection differences could not be a cause of observed posttest differences” (Shadish et al. 2002, p. 249). Likewise, if we knew that all patients and their circumstances are identical, the recovery of every single patient in the test group could only be attributed to the efficacy of the treatment (Hill 1955, Ch. VIII). Sekhon (2008) further argues that, under these circumstances, causal inference collapses into a version of Mill’s method of difference. Given a zero probability of generating incomparable groups, if differences in outcomes are observed, these differences must have been caused by something other than the process by which the groups were generated (random allocation) irrespective of the magnitude of the differences in outcomes and the size of the groups. 
Presumably, this is the strategy favoured in most controlled experiments in basic science, where it is common practice to engineer homogeneous populations of comparable individuals, for instance, by generating immortalized cell lines consisting of clones or systematically inbreeding mice in order to produce isogenic/homozygous strains (Baetu 2020; Müller-Wille 2007).

  7. Some authors prefer the term ‘internal validity’ in order to emphasize that the results of the test concern solely the particular experimental setup employed in the study (e.g., trial participants, the specific version of the intervention employed, the circumstances in which the intervention was conducted, the specific outcomes or endpoints assessed in the experiment, etc.) (Manski 2007; Shadish et al. 2002).

  8. In principle, the prospective design of experiments may also help rule out non-causal explanations of systematic association, such as supervenience, coreference and nomic dependence scenarios, in which the independent and dependent variables covary simultaneously.

  9. A common example is gene duplication. If, upon knocking out a gene, no differences in phenotype are observed, background knowledge dictates that this result is more likely to be due to the compensating effect of homologous sequences than to the fact that the gene plays no causal role vis-à-vis phenotype.

  10. Observational data can discriminate between two classes of Markov-equivalent structures, with chains, reverse chains and forks in one class and colliders (A→B←C) in the other, thus allowing for causal inference from observational data assuming minimality, Markov and positivity (Pearl 2000; Spirtes et al. 1993). Despite its importance for causal discovery algorithms, this result is not of significant utility for causal inference from experiments which probe solely the input and output of a system, thus generating data about the conditional probability of the output variable given the probability of the input variable.
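
      The observational signature that separates colliders from chains and forks can be shown with a minimal linear-Gaussian toy model (invented for illustration): in the collider A→B←C, A and C are marginally independent, but conditioning on B induces a dependence.

      ```python
      import numpy as np

      rng = np.random.default_rng(3)
      n = 100_000

      # Toy collider structure A -> B <- C (linear-Gaussian version).
      a = rng.normal(size=n)
      c = rng.normal(size=n)
      b = a + c + 0.5 * rng.normal(size=n)

      # Marginally, A and C are independent (correlation near zero)...
      r_marginal = np.corrcoef(a, c)[0, 1]

      # ...but conditioning on the collider B induces a (negative) dependence,
      # computed here as the partial correlation of A and C given B.
      ra_b = np.corrcoef(a, b)[0, 1]
      rc_b = np.corrcoef(c, b)[0, 1]
      r_partial = (r_marginal - ra_b * rc_b) / np.sqrt((1 - ra_b**2) * (1 - rc_b**2))
      ```

      In a chain or fork the pattern reverses: the marginal correlation is nonzero and vanishes upon conditioning on the middle variable, which is why the two classes are distinguishable from observational data alone.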

References

  • Altman DG, Bland JM (1999) Treatment allocation in controlled trials: why randomise. BMJ 318:1209
  • Baetu TM (2020) Causal inference in biomedical research. Biol Philos 35:43
  • Baumgartner M (2009) Interventionist causal exclusion and non-reductive physicalism. Int Stud Philos Sci 23(2):161–178
  • Ben-Menahem Y (2018) Causation in science. Princeton University Press, Princeton
  • Bowers D (2014) Medical statistics from scratch: an introduction for health professionals. Wiley, Hoboken, NJ
  • Broadbent A (2013) Philosophy of epidemiology. Palgrave Macmillan, Houndmills
  • Cartwright N (1989) Nature’s capacities and their measurement. Oxford University Press, Oxford
  • Cartwright N (2010) What are randomised controlled trials good for? Philos Stud 147:59–70
  • Cox DR, Wermuth N (2004) Causality: a statistical view. Int Stat Rev 72(3):285–305
  • DeLucia C, Pitts SC (2010) Intervention. In: Salkind NJ (ed) Encyclopedia of research design. SAGE, Thousand Oaks
  • Donner A, Klar N (2004) Pitfalls of and controversies in cluster randomization trials. Am J Public Health 94(3):416–422
  • Eberhardt F, Glymour C et al (2012) On the number of experiments sufficient and in the worst case necessary to identify all causal relations among N variables. In: Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI'05). AUAI Press, Arlington, pp 178–184
  • Evidence Based Medicine Working Group (1992) Evidence-based medicine: a new approach to teaching the practice of medicine. J Am Med Assoc 268:2420–2425
  • Fisher RA (1947) The design of experiments. Oliver and Boyd, Edinburgh
  • Franklin A (2007) The role of experiments in the natural sciences: examples from physics and biology. In: Kuipers T (ed) General philosophy of science: focal issues. Elsevier, Amsterdam
  • Fuller J (2019) The confounding question of confounding causes in randomized trials. Br J Philos Sci 70:901–926
  • Gerber AS, Green DP (2012) Field experiments: design, analysis, and interpretation. Norton, New York, NY
  • Godfrey-Smith P (2009) Causal pluralism. In: Beebee H, Hitchcock C, Menzies P (eds) Oxford handbook of causation. Oxford University Press, New York
  • Gori GB (1989) Epidemiology and the concept of causation in multifactorial diseases. Regul Toxicol Pharmacol 9(3):263–272
  • Greenland S, Robins JM et al (1999) Confounding and collapsibility in causal inference. Stat Sci 14(1):29–46
  • Guyatt G, Djulbegovic B (2019) Evidence-based medicine and the theory of knowledge. In: Guyatt G, Rennie D, Meade MO, Cook DJ (eds) Users’ guides to the medical literature: a manual for evidence-based clinical practice. JAMA/McGraw-Hill Education, New York
  • Hall N (2004) Two concepts of causation. In: Collins J, Hall N, Paul L (eds) Causation and counterfactuals. MIT Press, Cambridge, pp 225–276
  • Hernán MA, Robins JM (2020) Causal inference: what if. Chapman & Hall/CRC, Boca Raton
  • Hill AB (1952) The clinical trial. N Engl J Med 247:113–119
  • Hill AB (1955) Principles of medical statistics, 6th edn. Oxford University Press, New York
  • Howick J (2011) The philosophy of evidence-based medicine. BMJ Books, Oxford
  • ISIS-2 (Second International Study of Infarct Survival) Collaborative Group (1988) Randomised trial of intravenous streptokinase, oral aspirin, both, or neither among 17,187 cases of suspected acute myocardial infarction: ISIS-2. Lancet 2:349–360
  • Jaynes ET (2003) Probability theory: the logic of science. Cambridge University Press, Cambridge
  • Koepke D, Flay R (1989) Levels of analysis. In: Braverman MT (ed) New directions for program evaluation: evaluating health promotion programs. Jossey-Bass, San Francisco
  • Kovera MB (2010a) Bias. In: Salkind NJ (ed) Encyclopedia of research design. SAGE, Thousand Oaks
  • Kovera MB (2010b) Confounding. In: Salkind NJ (ed) Encyclopedia of research design. SAGE, Thousand Oaks
  • Leighton JP (2010) Internal validity. In: Salkind NJ (ed) Encyclopedia of research design. SAGE, Thousand Oaks, pp 619–622
  • Lewis DK (1979) Counterfactual dependence and time’s arrow. Noûs 13:455–476
  • Manski C (2007) Identification for prediction and decision. Harvard University Press, Cambridge
  • Merlo J, Lynch K (2010) Association, measures of. In: Salkind NJ (ed) Encyclopedia of research design. SAGE, Thousand Oaks
  • Miettinen O (1974) Confounding and effect-modification. Am J Epidemiol 100(5):350–353
  • Mill JS (1843) A system of logic, ratiocinative and inductive. John W. Parker, London
  • Müller-Wille S (2007) Hybrids, pure cultures, and pure lines: from nineteenth-century biology to twentieth-century genetics. Stud Hist Philos Biol Biomed Sci 38(4):796–806
  • Neyman J, Pearson ES (1928) On the use and interpretation of certain test criteria for purposes of statistical inference, part I. Biometrika 20A(1–2):175–240
  • Papineau D (1994) The virtues of randomization. Br J Philos Sci 45:437–450
  • Parkkinen V-P, Wallmann C et al (2018) Evaluating evidence of mechanisms in medicine: principles and procedures. Springer, Cham
  • Pearl J (2000) Causality: models, reasoning, and inference. Cambridge University Press, Cambridge
  • Pearl J, Glymour M et al (2016) Causal inference in statistics: a primer. Wiley & Sons, Chichester
  • Psillos S (2004) A glimpse of the secret connexion: harmonizing mechanisms with counterfactuals. Perspect Sci 12:288–391
  • Rosenbaum PR (1995) Observational studies. Springer, New York
  • Rothman KJ (1974) Synergy and antagonism in cause-effect relationships. Am J Epidemiol 99(6):385–388
  • Rubin D (1974) Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol 66(5):688–701
  • Russo F, Williamson J (2007) Interpreting causality in the health sciences. Int Stud Philos Sci 21(2):157–170
  • Sekhon JS (2008) The Neyman-Rubin model of causal inference and estimation via matching methods. In: Box-Steffensmeier JM, Brady HE, Collier D (eds) The Oxford handbook of political methodology. Oxford University Press, New York, pp 271–299
  • Senn S (2013) Seven myths of randomisation in clinical trials. Stat Med 32:1439–1450
  • Shadish WR, Cook TD et al (2002) Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin, Boston
  • Sharabiani M, Aylin P et al (2012) Systematic review of comorbidity indices for administrative data. Med Care 50(12):1109–1118
  • Skyrms B (1984) EPR: lessons for metaphysics. Midwest Stud Philos 9:245–255
  • Spirtes P, Glymour C et al (1993) Causation, prediction and search. Springer-Verlag, New York
  • Stigler SM (1986) The history of statistics. Harvard University Press, Cambridge, MA
  • Verma T, Pearl J (1990) Equivalence and synthesis of causal models. In: Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence. Cambridge, MA, pp 220–227
  • Weber M (2009) The crux of crucial experiments: Duhem’s problems and inference to the best explanation. Br J Philos Sci 60:19–49
  • Williamson J (2019) Establishing causal claims in medicine. Int Stud Philos Sci 32(1):33–61
  • Winch RF, Campbell DT (1969) Proof? No. Evidence? Yes. The significance of tests of significance. Am Sociol 4(2):140–143
  • Woodward J (2003) Making things happen: a theory of causal explanation. Oxford University Press, Oxford
  • Worrall J (2002) What evidence in evidence-based medicine? Philos Sci 69:S316–S330
  • Worrall J (2007a) Evidence in medicine and evidence-based medicine. Philos Compass 2(6):981–1022
  • Worrall J (2007b) Why there’s no cause to randomize. Br J Philos Sci 58:451–488


Acknowledgements

This research was supported by SSHRC Grant # 430-2020-0654.

Author information


Correspondence to Tudor M. Baetu.


Cite this article

Baetu, T.M. Inferential Pluralism in Causal Reasoning from Randomized Experiments. Acta Biotheor 70, 22 (2022). https://doi.org/10.1007/s10441-022-09446-2
