
Explanations in AI as Claims of Tacit Knowledge

  • Special Issue: Machine Learning: Prediction Without Explanation?
  • Minds and Machines

Abstract

As AI systems become increasingly complex, it may become unclear, even to the designer of a system, why exactly it does what it does. This leads to a lack of trust in AI systems. To address this, the field of explainable AI has been working on ways to produce explanations of these systems’ behaviors. Many methods in explainable AI, such as LIME (Ribeiro et al. in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016), offer only a statistical argument for the validity of their explanations. However, some methods instead study the internal structure of the system and try to find components which can be assigned an interpretation. I believe that these methods provide more valuable explanations than those that are statistical in nature. I will try to identify which explanations can be considered internal to the system using the Chomskyan notion of tacit knowledge. I argue that each explanation expresses a rule, and that through the localization of this rule in the system internals, we can take a system to have tacit knowledge of the rule. I conclude that the only methods which are able to sufficiently establish this tacit knowledge are those along the lines of Olah (Distill 2(11): 4901–4911, 2017), and that they therefore provide explanations with unique strengths.


Data Availability

None.

Code Availability

None.

Notes

  1. In AI, the terms ‘decision’, ‘action’, ‘output’ and ‘result’ are generally used interchangeably. From an agnostic information processing perspective, these are all just the signal(s) the system outputs to the environment, but depending on the context one of the terms may be more appropriate than the others. If the signal is used to move a robot arm, it can be described as an action; if it is used to inform another action, it may be called a decision. I prefer generic terms like ‘output’ when discussing the more theoretical aspects and terms like ‘decision’ when discussing practical implications, to emphasize the consequences these outputs may have when viewing the systems as actors in a societal context.

  2. By behavior, I mean the outputs of an AI system, or more specifically: the relationships between inputs and the corresponding outputs that the system produces. How the system arrives at these outputs is not a part of the behavior: it is entirely treated as a black box.

  3. For example via a saliency map, see Sect. 6 for details.

  4. Here the term ‘knowledge’ is used a bit more loosely than it generally is in epistemology (e.g. it does not need to be true). ‘Knowledge’ here roughly means ‘information an agent uses to determine their actions’. This usage is common in the psychological and AI literature. It can alternatively be read as ‘putative knowledge’ or even just ‘information’.

  5. This claim is taken from an example in Miller (2019, Fig. 2) where it is used to illustrate how an agent might explain the results of a system which classifies arthropods.

  6. As an example of this terminology, Blum and Langley (1997) named the “selection of relevant features” as one of the central problems in machine learning, and described the way in which a machine learning program can learn about concepts in terms of features: “[Concept learning is] deciding which features to use in describing the concept and deciding how to combine those features”.

  7. In other words, I assume that the explanation ‘y because p’ is complete, in the sense that no factors other than p need to be taken into account to determine that y is the case. If we instead assume p is (only) a necessary condition for y, we obtain a counterfactual theory of causation (Menzies and Beebee 2020): if p had not been there, y would not have been the case.

  8. Note that ‘computation’ and ‘behavior’ are two separate things: while the behavior is only the observed input-output relationship, by computation I am referring to all the steps in between that lead the system to produce this output. While looking at the behavior treats the system as a black box, to get a complete picture of the computation we in fact need to have full access to the system’s internals.

  9. The absence of theoretical structure is a consequence of the black-boxness of the model. In machine learning, we start with a high-level “task analysis” (Clark 1990, p. 215), and use this to construct an implementation through some process of trial and error. But we are lacking what Clark calls a “fully articulated competence theory” of the task, one expressed in traditional mathematical language and incorporating symbols which are meaningful to us.

  10. Many linguists will disagree, and might rather speak of a statistical kind of knowledge. They will argue that simply the fact that the order ‘eats an apple’ is much more likely to occur in conversation than ‘an apple eats’ allows one to judge the first sentence as correct, without any internalization of an abstract concept such as ‘word order’. This is an application of the so-called distributional hypothesis (Harris 1970). However, this statistical view is exactly the prevalent view in machine learning, to which I am trying to provide a complementary perspective, so I will not take time to discuss it here.

  11. The statement ‘\(p \Rightarrow q\)’ is equivalent to saying that p is a sufficient condition for q. If we would adhere to a counterfactual theory of explanation, where p is (only) a necessary condition for q, we would instead have that \(q \Rightarrow p\).

  12. It should be noted that the kind of tacit knowledge we are looking for here is knowledge that is not explicit, but can be made explicit, as it is represented in explicit form by the claims we eventually make about the system. Collins (2010) refers to this as explicable knowledge, and gives a taxonomy of tacit knowledge, in which he orders different types by “the degree of resistance of the tacit knowledge to being made explicit” (Collins 2010, p. 85). In his terminology, the knowledge we are looking for might be relational tacit knowledge, or possibly somatic tacit knowledge.

  13. At most, there may be an approximate representation of R, but its existence will in any case be dependent on our choice to interpret it as such.

  14. One could say that by ‘discovering’ the rule we are in effect performing the event of formulation, and turning it from a tacit rule to an explicit one. This is true in some sense, as we are creating an explicit formulation of the rule. However, the rule itself remains tacit, as the way in which it is possessed by the agent (i.e. the internal representation of the rule) does not change. It remains distinct from an explicit rule in that the agent themselves has still not formulated it. The explicit version of the rule is only a result of our interpretation, and so the agent is not beholden to it. Any ‘meaning’ or semantic interpretation imparted onto the processing performed by the agent using this statement of the rule will be our responsibility as well. As such, making tacit knowledge in an agent explicit does not change the nature of the rule, it only helps to verify our interpretation of the agent’s behavior.

  15. It should be noted that it is not entirely evident why exactly Chomsky’s and Davies’ ‘tacit knowledge’ should both be referring to the same kind of knowledge. Instead, it is better to view Davies’ definition as a separate kind of knowledge, with the same spirit as Chomsky’s notion: to provide a meaningful middle ground where one can speak of knowledge that an agent possesses without having to either admit all possible knowledge that is compatible with the behavior, or exclude all knowledge whose presence cannot be directly observed.

  16. One can require various levels of strictness in this localization: must there be a specific (combination of) processing unit(s) in the system which represents this concept, in the same way, 100% of the time? Or is it only a requirement that we always need to be able to obtain an answer on the ‘8 legs’-question from ‘somewhere’, but that its representation may change from case to case, as long as we ‘know where to look’? In other words, do we only require that T is expressible as some mathematical function, or do we place further restrictions on it?

    I am inclined to not be too strict in this regard, if only for the fact that even a physical implementation of a straightforward symbolic program may have various physical (or implementational) states corresponding to the same symbolic state in different contexts, for example for purposes of making the computation more efficient. Clark (1990, p. 207) also seems to hint at this in his assessment of Davies’ work, saying that the representative state in the system “is not a simple physical state so much as a state of the virtual machine over which the processing story is defined”. In addition, if we restrict the definition of a ‘component’ too much it will be very hard to deal with so-called distributed representations of concepts. Schröder (1998) notices this, and argues we should instead allow for several common causal factors in Davies’ definition, which is similar to allowing a rule to have a representation which varies between situations.

    However, having too loose a requirement clearly devalues the term ‘localizable’ to be nearly meaningless: if the representation of ‘having 8 legs’ is different in every scenario, the argument for presence of ‘having 8 legs’ in the system seems quite weak. Presumably, we can at least speak of a spectrum: the more localizable a rule is, the stronger the claim of its presence will be.

  17. This same question is also being tackled by other fields, such as neuro-symbolic integration (also called hybrid learning).

  18. This example is again taken from Miller (2019, Fig. 2).

  19. In more recent research, the authors of LIME have also worked on methods that cast the relationship between input and output into the form of rules. These rules specify a sufficient condition for a certain output to be produced (Ribeiro et al. 2018), which means they correspond to the rules discussed in this paper. While their rule-based method does make it more feasible to produce the kind of explanations I am after, it is still susceptible to the criticism that it does not look at the internal state of the system, and is therefore only descriptive of the behavior (a minimal sketch of this kind of behavior-level surrogate explanation is given after these notes).

  20. For a counterfactual theory of causation, it also cannot establish whether the feature is a necessary condition (e.g. to be considered a spider, an animal needs to have 8 legs). In this case the problem is that the actual relationship may vary greatly across different inputs, so that there may be, for example, an unobserved input that is still considered a spider but is not considered to have 8 legs.

  21. His go-to example is the cluster analysis of NETtalk, a connectionist network which converts English words into (textually encoded) phonemes (Sejnowski and Rosenberg 1987). By performing a clustering procedure over the internal states of the system, one could find that it had rediscovered (among other things) that there is a useful division to be made between vowels and consonants (Clark 1993, pp. 52-54); a minimal sketch of this kind of analysis is given after these notes. Cluster analysis seems to have fallen by the wayside, but it is similar to modern day applications of dimensionality reduction methods to discover ‘typical’ states of neural networks (e.g. Li et al. 2020). It has also influenced the method of representational similarity analysis (RSA) (Kriegeskorte 2008), which is used to perform experiments concerning the ways in which the human brain represents information.

  22. Note that, while the computation a single neuron performs is very simple (a nearly linear function of its inputs), if we consider a neuron a few layers deep in the network it will be provided with the results of all prior processing, and so the output of this neuron is in fact a very complex function of the inputs. Therefore inputs which maximize it may contain many kinds of structure, possibly encoding high-level concepts.

  23. ‘Visualization’ implies a visual representation of the feature, which is often appropriate because many black-box systems of interest deal with image processing. However, what seems more important to me is the method by which a visualization of a feature is obtained, i.e. the optimization part (a minimal sketch of this optimization is given after these notes). That the feature is then presented in visual form seems irrelevant; it might just as well be a representative piece of text or an audio sample, depending on the system.

  24. This relates back to the question about the ‘strictness’ of the localization we require: see footnote 16. While it would be nice if a single concept is represented by a single neuron, because that is highly localizable, it might not realistically happen very often and we might encounter mostly polysemantic neurons instead. However, it might still be the case that a certain combination of these neurons responds to a specific concept (and another combination to another concept, etc.): this is the idea of distributed representation. For example, in representational similarity analysis (Kriegeskorte 2008) correlations between various internal states corresponding to similar concepts are used to justify that the concepts are being represented, without having to unambiguously localize the concept. Basically, to find a distributed representation of a concept we should be able to perform some transformation on the combined state of several neurons in order to demonstrate its presence. This raises some further philosophical questions, however, such as whether or not the concept is not partially (or even mostly) represented by the transformation we apply (i.e. our interpretation), instead of in the system itself. Just as in the previous discussion, if we allow for some nuance we can at least argue for a correlation between the two: “the more localizable a rule is, the stronger the claim of its presence will be”.

  25. In practice, there may of course be further slight numerical approximations, such as neglecting a causal path containing near zero weights. But these are easily justifiable. I am mostly concerned with approximations which are hard to justify, such as the proper interpretation of a fuzzy concept such as ‘blue’.

  26. The journal Distill publishes many of these new approaches to visualizing or otherwise making explanations about neural networks accessible, for example using interactive visual maps (Carter et al. 2019) or by annotating sentences with the influences between characters when performing natural language prediction (Madsen 2019).
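To make the contrast drawn in footnote 19 concrete, the following is a minimal, illustrative sketch (in Python) of a behavior-level surrogate explanation: a LIME-style procedure that perturbs an input, queries the black box, and fits a locally weighted linear model. The function name and parameters are hypothetical and this is not the actual LIME implementation; it only serves to show that such an explanation never consults the system’s internals.

```python
# A LIME-style local surrogate, sketched for illustration only (not the
# actual LIME library). `predict` is the black-box model's prediction
# function and `x` a single input vector; both are assumed to be given.
import numpy as np
from sklearn.linear_model import Ridge

def local_surrogate(predict, x, n_samples=1000, scale=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # Sample perturbed inputs in a neighbourhood around x.
    X = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))
    y = predict(X)  # query the black box; only its outputs are used
    # Weight the samples by their proximity to x (exponential kernel).
    w = np.exp(-np.linalg.norm(X - x, axis=1) ** 2 / (2 * scale ** 2))
    # Fit a weighted linear model: its coefficients are the 'explanation'.
    surrogate = Ridge(alpha=1.0).fit(X, y, sample_weight=w)
    return surrogate.coef_
```

The resulting coefficients summarize the behavior only in the sampled neighbourhood; nothing about the computation inside the system is inspected.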
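Footnote 21 mentions cluster analysis over a network’s internal states. The sketch below, assuming recorded hidden activations are available as a NumPy array, illustrates the general procedure with standard scikit-learn tools; the cluster labels themselves still have to be interpreted by hand, which is exactly the interpretive step discussed in the main text.

```python
# Cluster analysis over recorded hidden activations, in the spirit of the
# NETtalk analysis. `hidden_states` is assumed to be an array of shape
# (n_inputs, n_units) collected by running the network on many inputs.
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def cluster_hidden_states(hidden_states, n_clusters=2, n_components=10, seed=0):
    # Reduce dimensionality first, as in modern analyses of 'typical' states.
    reduced = PCA(n_components=n_components).fit_transform(hidden_states)
    # Group the reduced states; each cluster is then inspected by hand
    # (e.g. 'these states all occur on vowel inputs') to assign a meaning.
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(reduced)
    return reduced, labels
```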
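Footnotes 22 and 23 describe finding inputs which maximize a given unit’s activation. The following sketch shows the bare optimization loop for a PyTorch image model; the regularization and preconditioning tricks used in the Distill work are omitted, and the model, layer, and unit index are assumed to be supplied by the user.

```python
# Activation maximization: optimize an input so that one unit (channel) of
# a given layer responds strongly. Assumes a PyTorch vision model; `layer`
# is a module inside `model` and `unit` an index into its output channels.
import torch

def maximize_activation(model, layer, unit, shape=(1, 3, 224, 224),
                        steps=200, lr=0.05):
    captured = {}
    handle = layer.register_forward_hook(
        lambda module, inputs, output: captured.update(value=output))

    x = torch.randn(shape, requires_grad=True)  # start from random noise
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        model(x)                                   # forward pass fills `captured`
        loss = -captured['value'][0, unit].mean()  # maximize mean activation
        loss.backward()
        optimizer.step()

    handle.remove()
    return x.detach()  # an input the unit responds strongly to
```

Whether the resulting input licenses the claim that the unit represents a human-interpretable concept is exactly the interpretive question raised in footnotes 23 and 24.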


Funding

None.

Author information

Corresponding author

Correspondence to Nardi Lam.

Ethics declarations

Conflict of interest

None.



About this article


Cite this article

Lam, N. Explanations in AI as Claims of Tacit Knowledge. Minds & Machines 32, 135–158 (2022). https://doi.org/10.1007/s11023-021-09588-1
