1 Introduction

Are we in a position to fully automate causal reasoning? Given recent groundbreaking advances in causal epistemology, one might be tempted to reply in the affirmative. In this paper, it will be demonstrated how, notwithstanding a number of significant philosophical and computational developments, one still has good skeptical grounds for resisting any conclusions in favour of the automation of causal reasoning. Our critical examination of the central opening question (hereafter: the automation question) will rely on a delineation of these philosophical and computational advances in causal epistemology, a careful articulation of the automation question, Meno’s Paradox and Polanyi’s treatment of it, and an application of Polanyi’s insights about the nature of tacit knowledge to the automation question.

1.1 Main discussion: philosophical advances in causal epistemology

Before we commence our critical examination of the automation question, an acknowledgment of the state of the art in causal epistemology is first in order.[fn. 1] After all, one could argue that certain philosophical and computational developments have made it plausible (or at least more plausible than has hitherto been the case) to answer the automation question in the affirmative. In the first half of the twentieth century, regularity theory was the dominant philosophical approach to causal epistemology. According to the regularity camp, we infer the existence of type-level causal relationships from identified regularities in sequences of event types (Hume 1748; Mill 1843). For example, we infer that C causes E if every event of type C is routinely followed by an event of type E.[fn. 2] Naïve regularity theorists traditionally maintain that C causes E if certain Humean criteria (viz. temporal succession, spatiotemporal contiguity, necessary connection) are satisfied. However, these criteria may not be sufficient for relationships to count as causal in nature. Even though day regularly follows night, the two relata are contiguous, and there is the idea of a necessary connection between them, day does not cause night (Reid 1785). Likewise, even though umbrella vendors are always around before it starts raining, we do not conclude that the presence of umbrella vendors causes rain (Kleinberg 2013). In other instances, these criteria may be unnecessary.[fn. 3] Even more sophisticated regularity theorists face the problem of distinguishing between type-level causation (i.e. regular occurrences at population level) and token-level causation (i.e. the occurrence of an effect in a particular scenario).

Philosophical advances in the second half of the twentieth century led to the development of the counterfactual and probabilistic approaches to causation and extended our understanding of how we make causal inferences, learn about causality, and acquire causal knowledge. According to the counterfactual camp, we can infer the existence of token-level causal relationships from the use of counterfactual conditionals and possible world semantics (Stalnaker 1968; Lewis 1973). For example, one might learn from the regularity camp that C causes E, since a material implication relation exists at type-level between C and E. When reasoning about a possible world that is sufficiently similar to the actual world in which ‘C → E’ is true, one might conclude that if the event-token c had not occurred, then the event-token e would not have occurred.[fn. 4]

According to the probabilistic camp, both the regularity and counterfactual camps erroneously hold that causes produce their effects without fail at type- or token-level.[fn. 5] On this view, the causal relationship between variables is probabilistic rather than deterministic in nature: C is a positive cause of E if the probability of E occurring is still (positively) altered by C after common causes or confounders have been conditioned on (Reichenbach 1956; Suppes 1970; Eells 1991).[fn. 6] It should be noted that these philosophical advances have since been superseded by certain computational advances in the field of computer science and AI research (Pearl 2000; Kleinberg 2013). Nonetheless, while computational methods may or may not be philosophically informed, they tend to rely on certain philosophical assumptions as axioms for developing causal models of reality.
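Stated compactly (in our own formalization; the precise conditioning set varies from author to author, with Eells, for instance, relativizing probability-raising to background contexts), the probabilistic condition is:

P(E | C ∧ K) > P(E | ¬C ∧ K),

where K ranges over the background contexts that hold fixed the common causes of C and E.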

1.2 Main discussion: key philosophical assumptions

A key philosophical assumption that has emerged from these philosophical approaches to causation is the No Backwards Causation assumption. According to this assumption, causes cannot be temporally preceded by their effects.[fn. 7] Another key philosophical assumption is the Alteration of Probabilities assumption, according to which causes alter the probabilities of their effects: positive causes raise the probability of their effects, whereas negative causes lower the probability of their effects.[fn. 8] Without the Alteration of Probabilities assumption, we would not be able to rely on observed statistical correlations as a guide to type-level causal relationships. Last but not least, the Causal Sufficiency assumption is the assumption that there are no hidden common causes; alternatively put, the set of measured variables must be causally sufficient. By causal sufficiency is meant the following: for any set of variables V, every common cause (relative to V) of any pair of variables in V is also contained in V. Absent the Causal Sufficiency assumption, hidden common causes (or confounders) can lead to spurious correlations between causally unrelated variables (confounding bias) and thus to erroneous causal inferences.[fn. 9]

To illustrate the Causal Sufficiency assumption by means of a toy example, let us return to Reid’s (1785) day-and-night example (Fig. 1):

Fig. 1: Reid’s (1785) day-and-night example

We observe a marginal dependence relation between the relata X and Y: day regularly follows the night and vice versa. If we fail to observe the Causal Sufficiency assumption, we end up omitting the common cause or confounder Z from our set of variables. This will result in our incorrectly inferring, on the basis of the marginal dependence relation between X and Y, a causal dependence relation between X and Y. Conversely, if we observe the Causal Sufficiency assumption, no common causes (including Z) will be omitted. The joint effects of Z (viz. X and Y) will no longer erroneously appear to be dependent. In accordance with Reichenbach’s (1956) Common Cause Principle (described in fn. 6), X and Y will be conditionally independent, given their common cause Z (formally: X ⊥ Y | Z) (Fig. 1).
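A minimal simulation can make this screening-off pattern concrete. The sketch below is purely illustrative: the variable names, the binary coding of Z, and the noise levels are assumptions of ours rather than part of Reid’s example. Marginally, the joint effects X and Y appear strongly correlated; within each stratum of Z, the dependence vanishes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hidden common cause Z (illustratively binary, e.g. 'phase of rotation').
z = rng.integers(0, 2, size=n)

# X and Y are joint effects of Z plus independent noise;
# neither causes the other.
x = z + 0.3 * rng.standard_normal(n)
y = (1 - z) + 0.3 * rng.standard_normal(n)

# Marginal dependence: X and Y look strongly (negatively) correlated.
print("corr(X, Y):", round(np.corrcoef(x, y)[0, 1], 3))

# Conditioning on Z screens off the dependence (X independent of Y given Z).
for value in (0, 1):
    mask = z == value
    print(f"corr(X, Y | Z={value}):",
          round(np.corrcoef(x[mask], y[mask])[0, 1], 3))
```

Omitting Z from the variable set would leave only the first, spurious correlation visible, which is precisely the confounding bias that the Causal Sufficiency assumption is meant to rule out.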

1.3 Main discussion: computational advances in causal epistemology

The No Backwards Causation, Alteration of Probabilities, and Causal Sufficiency assumptions are central features of the probabilistic camp and undergird the most prominent computational approaches to causal reasoning. Nonetheless, it is important to bear in mind that philosophical and computational advances in causal epistemology are related though distinct. While they might be guided by the theoretical considerations of philosophical approaches to causation, computational approaches are also characterized by a distinct set of concerns: the ability to translate theory into methods, practices, and procedures; the ability to operationalize definitions of causation and causal relationships; the ability to infer causal relationships from data; and the ability to keep the computational complexity of the task feasible.[fn. 10]

Philosophical advances provide theoretical guidance about how causal relationships may be inferred. In turn, computational methods aim to translate that theory into computationally feasible methods, practices, and procedures, to operationalize general definitions of causation and causal relationships, and to infer causal relationships from data.

An early computational advance in the second half of the twentieth century was the Granger Causality approach, which takes two time series, determines whether one time series is useful for forecasting the other, and uses these determinations as a basis for making causal inferences (Wiener 1956; Granger 1969, 1980). This approach relies on the Temporal Priority assumption (a variant of the No Backwards Causation assumption), the Alteration of Probabilities assumption, and a more method-specific No Redundant Information assumption.[fn. 11] While the pragmatically oriented Granger Causality approach is recognized for operationalizing definitions of causal relationships, incorporating temporal information, handling continuous variables, and finding application in fields as diverse as finance, neuroscience, and physics, there remains much debate about whether Granger Causality is necessary or sufficient for causality (Cartwright 1989; Hoover 2001).
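For readers who want the mechanics, here is a deliberately simplified sketch of the underlying idea (our own minimal reconstruction, not Granger’s full procedure: lag selection, stationarity checks, and proper significance testing are all omitted). It compares an autoregression of y on its own past against one augmented with the past of x; a large F statistic for the added lags is the Granger-style signal.

```python
import numpy as np

def granger_f(x, y, lags=2):
    """F statistic for 'lagged x adds predictive value for y'
    beyond y's own lags (restricted vs. unrestricted regression)."""
    n = len(y)
    target = y[lags:]
    y_lags = [y[lags - k:n - k] for k in range(1, lags + 1)]
    x_lags = [x[lags - k:n - k] for k in range(1, lags + 1)]
    X_r = np.column_stack([np.ones(n - lags)] + y_lags)           # restricted
    X_u = np.column_stack([np.ones(n - lags)] + y_lags + x_lags)  # unrestricted

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        return np.sum((target - X @ beta) ** 2)

    rss_r, rss_u = rss(X_r), rss(X_u)
    df_den = len(target) - X_u.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / df_den)

# Toy data in which x drives y with a one-step delay.
rng = np.random.default_rng(1)
x = rng.standard_normal(500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.8 * x[t - 1] + 0.2 * rng.standard_normal()

print("F(x -> y):", round(granger_f(x, y), 1))  # large: past x helps predict y
print("F(y -> x):", round(granger_f(y, x), 1))  # near 1: past y does not help
```

Mature implementations exist (e.g. grangercausalitytests in statsmodels), but the toy version above makes explicit the approach’s reliance on temporal priority and predictive improvement.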

A more recent computational advance has been the Graphical Model approach, which generates a Directed Acyclic Graph (DAG) from a set of data (Pearl 2000; Spirtes et al. 2000).[fn. 12] These DAGs, also known as Bayesian networks, represent the causal structure of the system under investigation.[fn. 13] The Graphical Model approach is designed to produce one or more graphs representing the independence relations that are consistent with a given set of data and, unlike the Granger Causality approach, does not require the incorporation of temporal information. This approach relies on the Alteration of Probabilities assumption, the Causal Sufficiency assumption, and a couple of other more method-specific axioms.[fn. 14]
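The content of such a graph can be summarized by the Markov factorization it licenses: each variable depends directly only on its parents in the DAG. For the structure of Fig. 1 (Z → X, Z → Y), for instance:

P(X, Y, Z) = P(Z) · P(X | Z) · P(Y | Z),

from which the screening-off relation X ⊥ Y | Z follows; in general, P(V₁, …, Vₙ) = ∏ᵢ P(Vᵢ | pa(Vᵢ)), where pa(Vᵢ) denotes the parents of Vᵢ in the graph.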

While the Graphical Model approach has enjoyed much popularity, it is arguably less complete than its more recent computational rival: the Probabilistic Temporal Logic approach of Kleinberg (2013, 2016). Like the proponents of the Granger Causality approach, Kleinberg aims to develop a computationally feasible method for reliably inferring causal relationships from time series data. Kripke structures represent the causal structure of the system under investigation, and the DAGs of the Graphical Model approach can be subsumed under these Kripke structures.[fn. 15] Guided by the more computationally inclined philosophical considerations of Eells (1991), Kleinberg has developed an ADCS-inspired measure of causal significance to determine the average difference a cause makes to the probability of its effect.[fn. 16] Kleinberg’s approach relies on the No Backwards Causation assumption, the Alteration of Probabilities assumption, and the Causal Sufficiency assumption.
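In rough paraphrase (our notation; see Kleinberg 2013 for the exact temporal-logic formulation), the measure averages the difference c makes to e’s probability while holding each other prima facie cause x fixed:

ε_avg(c, e) = (1 / |X \ {c}|) · Σ_{x ∈ X \ {c}} [P(e | c ∧ x) − P(e | ¬c ∧ x)],

where X is the set of prima facie causes of e; causes whose |ε_avg| exceeds a chosen threshold are deemed causally significant.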

Nonetheless, certain issues remain. In the first instance, it has been argued that mechanistically mediated effects are hybrids of causal relationships (i.e. between causes and their effects) and constitutive relationships (i.e. between parts and the whole). Consider the causal claim that myocardial infarction is the cause of death. Death is a mechanistically mediated effect: when the heart stops beating, it stops transporting oxygen and nutrients to other tissues of the body, these tissues stop functioning, and the non-functioning constitutes the death of that individual (Craver and Bechtel 2007). Where causal reasoning in certain domains (e.g. neurobiology, molecular biology) typically necessitates an appeal to mechanisms, the details of scientific practice, and the multi-level character of explanations (both causal and constitutive), skepticism remains about whether our best-going computational models can capture and automate causal reasoning in these domains (Machamer et al. 2000). In the second instance, there is an ongoing debate about whether mental properties can in principle function as efficacious causes of both physical and mental properties. This is a debate about the validity and soundness of causal exclusion arguments, which typically conclude that mental properties supervening on physical properties cannot cause physical or other mental properties (Kim 2003). In addition, if mental properties do count as causally efficacious, can their causal efficacy be accounted for on empirical grounds? We still lack a complete understanding of whether and how mental states (e.g. beliefs, thoughts, intentions, etc.) causally interact with one another and with physical states. Where causal explanation in certain domains (e.g. law, the social sciences) appeals to reasons, motives, intentions, desires, and beliefs, skepticism likewise remains about whether causal reasoning can be fully automated in these domains (Knobe 2009). The mechanistic and mental aspects of causal reasoning have not yet been successfully automated, and it is unclear whether they can be subsumed under the best-going computational models.[fn. 17]

In the third instance, consider various dilemmas that arise in causal reasoning. Does the chicken cause the egg, or does the egg cause the chicken? This dilemma is at least as old as Aristotle and may be structured as an infinite regress in causal reasoning. The causal relationship between the chicken and the egg has been treated as a strange loop, in the spirit of Gödel’s incompleteness theorems (Hofstadter 1999). In the context of complete causal self-referencing, it is Turing-incomputable insofar as the actual cause is undecidable (Kampis 1995). This chicken-and-egg-type dilemma cannot be resolved by a definite procedure. One may perform a root-cause analysis (RCA), uncovering the underlying biological mechanisms, when determining whether it is the chicken or the egg that is causally prior. However, the complex and non-linear nature of relationships over an evolutionary time-scale creates enormous difficulties for an RCA approach.[fn. 18] Retrodiction of the initial causes of complex and non-linear systems is impossible due to the intrinsic chaos and complexity of the systems in question (Li and Yorke 1975).

In the fourth instance, the best-going computational models aim to tell us what it is for something to count as a cause or how we can learn of causal relationships through datasets. However, computational models and methods may not be appropriate for all cases. As has been conceded by Kleinberg (2013), all theories of and methods for inferring causality—including Kleinberg’s own method—fail to handle at least some scenarios, potential challenges, and counterexamples. The lack of a unified solution has lent weight to philosophical arguments in favour of causal pluralism (Anscombe 1971; Cartwright 2007). If causal pluralism is correct, then there may be several different ways to learn about causal relationships and several methods, each of which may be suitable for different situations. In these (viz. the multi-level character of explanations, the causal efficacy of mental properties, chicken-and-egg-type dilemmas of causal reasoning, the complexity of systems, causal pluralism) and other instances, we maintain that causal reasoning cannot be fully automated and remain skeptical about the existence of a general and universal program capable of causal reasoning in a fully automated sense.

1.4 Main discussion: the automation question

Given these philosophical and computational advances in the second half of the twentieth and first decades of the twenty-first century, we might have good reason to believe that it has become plausible for us to answer the automation question in the affirmative. After all, might we not have good reason to expect further philosophical and computational advances to be made in the remainder of the twenty-first century, advances that could well deliver the hoped-for outcome of fully automated causal reasoning?

One might raise the caveat that handling real-world data is challenging: there might be biases, errors, or noise in the data; there is the ever-growing computational complexity of analyzing multidimensional data to be reckoned with; there could be gaps in the recording of the relevant variables; certain variables might be missing; and our knowledge could still remain incomplete. Suppose for the sake of argument that we encounter no problems with our data. Suppose further that we have in place a sufficiently advanced computational method that is guided by the state of the art in philosophical theory and will allow us to infer causal relationships relative to this set of problem-free data. Are we therefore in a position to fully automate causal reasoning?[fn. 19]

1.5 Main discussion: Meno’s Paradox

To address the automation question, we must proceed by way of a paradox, first introduced in a dialogue by Plato, and a response offered to this paradox by Michael Polanyi. Consider the following argument, adapted from Plato’s Meno (n.d., 80d-e):

P1: For any p, either one knows that p or one does not know that p.

P2: If one knows that p, then inquiry about p is unnecessary.

P3: If one does not know that p, then inquiry about p is impossible.

C: Therefore, for any p, inquiry about p is either unnecessary or impossible.[fn. 20]

This argument is known as Meno’s Paradox. Plato’s resolution of the paradox invokes a theory of recollection. According to Plato’s theory of recollection, we begin neither in a state of complete knowledge about p nor in a state of complete ignorance about p; we come to know more about p by recollecting something that was learnt about p in a previous life.[fn. 21] There is much controversy surrounding Plato’s resolution of Meno’s Paradox via a theory of recollection (or the discovery of innate ideas). Since the problem posed by Meno’s Paradox is an acute one, potentially condemning all inquiry as absurd (i.e. either unnecessary or impossible), an alternative response is required if Plato’s theory of recollection proves unsatisfactory.

Polanyi’s (1966) response is that if all knowledge is explicit (i.e. capable of being clearly stated in the propositional form p), then either inquiry is unnecessary (you know what you are searching for) or impossible (you do not know what you are searching for). In other words, if all knowledge is explicit, then Meno’s Paradox stands. However, if at least some knowledge is tacit (i.e. non-explicit), then we can come to know more about something by relying on that tacit knowledge.[fn. 22] Tacit knowledge, in turn, consists in the ‘intimation of something hidden, that we may yet discover’ (Polanyi 1966, pp. 22–23). Given that problems exist and discoveries (causal, scientific, etc.) can be made by solving them, the search for knowledge is clearly not absurd. The implication, if one accepts Polanyi’s (1966) response to Meno’s Paradox, is therefore that tacit knowledge exists: we can know things that we cannot yet tell.

One could raise the objection that the connection between the automation question and Meno’s Paradox is tenuous at best, insofar as the trade-off between implicit and explicit knowledge has long been a topic of AI research. How might causal reasoning be uniquely hampered by this particular trade-off?[fn. 23] In response to this objection, we must first distinguish between the nature of causation (ontic) and the nature of our knowledge about causation (epistemic). What Polanyi’s resolution of Meno’s Paradox suggests is that inquiry into the nature of both causation and our knowledge about causation is neither unnecessary nor impossible, since the original dilemma is a false one. The nature of causation is neither fully known nor fully unknown. We have at least some knowledge about certain aspects of the ontology of causation (e.g. causal relata may be characterized in terms of events, objects, processes, states of affairs, etc.) and the formal properties of causal relationships (e.g. transitivity, asymmetry, etc.). Likewise, the nature of our knowledge about causation is neither fully known nor fully unknown. We have some tacit knowledge of the nature of our knowledge about causation, which could be made explicit in the process of rational inquiry. We may even come to formalize and axiomatize certain salient aspects of causal reasoning. However, the nature of both causation and our knowledge about causation remains contested. Where this contestation remains, we must conclude that we can know things that we cannot yet tell. Until the contested nature of both causation and our knowledge about it has been resolved, our response to the automation question must remain a skeptical one.

Last but not least, consider the paradox of self-reference that arises with respect to such statements as ‘This sentence is false’ and ‘All Cretans are liars’ (uttered by a Cretan). The ability to resolve paradoxes and decipher statements of this nature involves inter alia an ability to discern the implicit and explicit knowledge contained within these paradoxes and linguistic tricks. Before a response to the automation question can even be attempted, we must first understand the cognitive processes and causal reasoning procedures that bring about this discernment (Hofstadter 1999). While our understanding of these processes and procedures remains incomplete, we have further reason to maintain a healthy skepticism when responding to the automation question.

1.6 Main discussion: tacit knowledge

In the context of causal epistemology, any philosophical and computational advances (e.g. in assumptions, general or operational definitions for causal relationships, supporting formal notation, axioms, equations, graphical representations, or data) pertain to the domain of explicit knowledge. Causal reasoning is not exhausted by the explicit knowledge that can be secured by the most robust and appropriate assumptions, philosophical theories, computational methods, and data. Rather, causal reasoning relies on the use of at least some tacit knowledge.

This tacit knowledge is in turn constituted by or derived from the following:

(A) The epistemic faculty virtues and abilities of the causal reasoner:

    (a) Reliable perception;
    (b) Reliable memory;
    (c) Observation skills;
    (d) Procedural knowledge of and competence in how to skillfully conduct experiments, undertake statistical analysis, and imagine counterfactually;
    (e) Procedural knowledge of and competence in how to carry out critical evaluation, with a view to developing better theories, methods, and approaches;[fn. 24]

(B) The value systems and character traits of the causal reasoner;[fn. 25]

(C) The implicit knowledge base available to the causal reasoner:

    (a) Implicit background knowledge of experts (e.g. knowledge about possible mechanisms);

(D) The habits that sustain our causal reasoning practices:

    (a) Heuristics and other cognitive shortcuts developed from experience;
    (b) Epistemic practices (e.g. deferring to and trusting epistemic superiors and peer review processes).

If at least some knowledge is tacit in nature, and if not all of this tacit knowledge can ultimately be reduced to explicit knowledge, then we must recognize the tacit dimension in causal reasoning.[fn. 27] This tacit knowledge allows us inter alia to make certain refinements and further determinations within each philosophical theory and computational method (e.g. the relevant domain of application, the level of sophistication required, Kleinberg’s ε-value threshold as discussed in fn. 16, etc.), to critically evaluate competing theories and methods, to recognize certain variables as genuine causes despite the violation of certain axioms, assumptions, and other related parameters, and potentially to discover and develop new and alternative theories and methods.

An implication of this is that the idea that causal reasoning can be fully automated rests on a mereological mistake. What AI systems are capable of performing on the causal reasoning front is complex information processing, which is a part of causal reasoning but not the whole of it.[fn. 28] To assert that causal reasoning can be fully automated is to mistake the part (complex information processing) for the whole (causal reasoning, which presupposes in addition the skills, abilities, faculties, value systems, character traits, and cultural habits that give rise to tacit knowledge). At the same time, one could offer the following olive branch to AI researchers: what is denied is the possibility of full automation rather than that of partial automation (see fn. 19 for the distinction). The commonplace ex nihilo nihil fit, first defended by Parmenides and later installed as a dictum in the AI research tradition, tells us that we get nothing from nothing. While not all assumptions can be made explicit in causal reasoning, the same is true of other domains in which AI has been applied. One must assume something from the outset to get a formal theory. However, once that assumption is in place, partial automation may be possible relative to our assumptions (both stated and implicit). Causal reasoning may not be so radically different after all from other AI-relevant domains, and one is reminded that while a part of the whole can be automated, the whole itself (e.g. causal reasoning in its broadest and most representative sense) cannot be.[fn. 29] Finally, computability theory tells us that there exist uncomputable functions and uncomputable real numbers that cannot be computed by any algorithm (a result established by Turing’s work on the halting problem, and presupposed rather than delivered by the Church–Turing thesis). This constitutes a fundamental mathematical constraint on the issue of full automation. To gain an understanding of the nature of this mereological mistake is to attain wisdom about causal reasoning, where wisdom is a higher rung on the epistemic hierarchy than (in descending order) knowledge, reasoning, information, and data.[fn. 30]
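The counting argument behind that uncomputability constraint is standard and worth making explicit: algorithms are finite strings, so there are only countably many of them, whereas there are uncountably many functions and real numbers:

|{algorithms}| = ℵ₀ < 2^ℵ₀ = |{f : ℕ → {0, 1}}|,

so all but countably many functions (and likewise all but countably many real numbers) are uncomputable by any algorithm.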

2 Conclusion

To conclude, if by an algorithm for causal reasoning is meant a finite procedure, explicitly stated in pseudo- or actual code, through which causal reasoning is replicated, then an affirmative response to the automation question implies the existence (actual or possible) of such an algorithm.[fn. 31] Such an algorithm might be constructed on the basis of state-of-the-art theoretical frameworks in philosophy, computational methods in econometrics and computer science, and the data relative to which causal relationships might be inferred. The most sophisticated algorithm would, however, remain strictly in the domain of explicit knowledge.

Given the nature of Meno’s Paradox and the possibility of discovery (causal, scientific, etc.), one plausible philosophical response has been to invoke a distinction between tacit and explicit knowledge. If by causal reasoning is meant the entirety of the process through which we discover causal relationships between variables and make use of this knowledge in prediction, explanation, decision-making, and counterfactual reasoning, then we have good grounds for maintaining that we do not rely merely on explicit knowledge (i.e. knowledge that is capable of being clearly stated in propositional form). In addition, we rely on tacit knowledge, as might be constituted by or derived from the epistemic faculty virtues and abilities of the causal reasoner, the value systems and character traits of the causal reasoner, the implicit knowledge base available to the causal reasoner, and the habits that sustain our causal reasoning practices.

If the foregoing analysis is correct, then the answer to the automation question must be a resounding ‘no’. If the foregoing analysis is correct, then any implementation of algorithms that approximate causal reasoning ought to be corroborated where possible with other methods (e.g. the use of background knowledge, the reliance on domain experts, the undertaking of experimental studies and RCTs, etc.), given the probabilistic and defeasible nature of causal reasoning. If the foregoing analysis is correct, then in the event of confusion and uncertainty (e.g. when dealing with complex cases), any final appeal should be made to our traditional storehouses of tacit knowledge, the domain experts themselves.