Sources of Understanding in Supervised Machine Learning Models

  • Research Article
  • Published in Philosophy & Technology

Abstract

In the last decades, supervised machine learning has seen the widespread growth of highly complex, non-interpretable models, of which deep neural networks are the most typical representative. Due to their complexity, these models have shown outstanding performance in a series of tasks, such as image recognition and machine translation. Recently, though, there has been an important discussion over whether those non-interpretable models are able to provide any sort of understanding whatsoever. For some scholars, only interpretable models can provide understanding. More popular, however, is the idea that understanding can come from a careful analysis of the dataset or from the model’s theoretical basis. In this paper, I examine the possible ways of obtaining understanding from such non-interpretable models. Two main strategies for providing understanding are analyzed. The first involves understanding without interpretability, either through external evidence for the model’s inner functioning or through analysis of the data. The second is based on the artificial production of interpretable structures, in three main forms: post hoc models, hybrid models, and quasi-interpretable structures. Finally, I consider some of the conceptual difficulties in the attempt to create explanations for these models, and their implications for understanding.

Availability of data and material

Not applicable.

Code availability

Not applicable.

Notes

  1. ML encompasses other forms of learning too, such as unsupervised and reinforcement learning. These other types of ML pose specific problems regarding understanding, which differ in significant respects from those of SML. Several of the implications discussed here nonetheless also apply to unsupervised and reinforcement learning.

  2. “Model” may be taken in two different, although related, senses: either as a general mathematical function, or as the result of the instantiation of a mathematical function on a particular dataset, with specific parameters. Call these two senses of “model” model-g (for “general”) and model-p (for “particular”), respectively. A model-g stipulates a functional form relating inputs and outputs (e.g., a linear function). An algorithm then establishes a procedure to obtain, or learn, the parameters for a particular dataset. The result is a model-p, which offers a precise relation between inputs and outputs and which can be used to predict or classify new observations. It is the former that is of interest to this discussion.

  3. Sometimes, the details of a model may be hidden for privacy reasons, as in the case of proprietary software. My discussion centers on non-interpretable models de re, those that are “too complicated for a human to comprehend” (Rudin, 2018, 19), and not on non-interpretable models de facto, i.e., models that are interpretable in principle but whose structure has been intentionally concealed.

  4. A precise definition of “interpretability” is hard to obtain for multiple reasons. First, “interpretability” is usually defined through interpretability-like terms, such as “explanation” and “understanding” (Krishnan, 2019). Moreover, interpretability is always a matter of degree, covering a spectrum that goes from fully transparent models to fully opaque ones. In addition, interpretability is always contingent on a particular task, audience, and domain—it varies from person to person and cannot be determined apart from concrete situations. (For example, up to how many variables does a multiple linear model remain interpretable? Is a model with twenty parameters still interpretable? And what kinds of feature transformations are understandable?) Finally, this concept is usually ill-defined by those who employ it (Lipton, 2016). Although I take “interpretability” to be a valid and useful notion, it is important to be aware of some of the potential difficulties with its use.

  5. Lipton (2016) also considers three different scopes of “interpretability”: the entire model, the individual components, or the training algorithm. The first sense imposes a very strong requirement, whereas the last depends on understanding an element that goes beyond the model itself. So, I will be using “interpretability” in the second sense he considers, as the understanding of how features are individually responsible for producing the output—“decomposability,” as he calls it.

  6. Here, I follow the current trend in philosophy of science that prefers to speak of “understanding” rather than “knowledge” for epistemological theorizing (e.g., De Regt, 2017; Kvanvig, 2009; Strevens, 2008). Traditionally, understanding had been conceived as a psychological concept, lacking the sort of objectivity that knowledge would allegedly provide (Hempel, 1965). More recently, though, there has been a reappraisal of this notion, with understanding taken in a more pragmatic vein as the ability to use a theory or model rather than as a mental state: understanding a theory or model means being able to apply it properly and to produce correct explanations and predictions under certain circumstances. Understanding has the advantage of not imposing some of the epistemic requirements of knowledge, particularly the factivity condition. See Páez (2019) for an understanding-centered approach to AI.

  7. The importance of understanding the mechanism, of course, depends on individual needs and interests. For an electronics engineer, understanding the underlying mechanism of a calculator may be essential to her work.

  8. There are several works that deal with interpretability, understanding, and explainability in AI; e.g., Lipton (2016), Páez (2019), Krishnan (2019), Zednik (2021), Zerilli et al. (2019), Creel (2020), Carabantes (2020), Zednik & Boelsen (2020), and Watson & Floridi (2021). This paper differs from them in examining the distinct forms of interpretability-production as well as their implications for understanding and explanation.

  9. We could think of an even stronger sense of interpretability, one that involves understanding why a certain outcome is produced. This would lead us to a discussion of causal explanations, something that would immensely complicate the problem under consideration. I hope to address this matter in a further investigation.

  10. For a comprehensive overview of approaches and works in XAI, see Guidotti et al. (2018), Adadi & Berrada (2018), and Dosilovic et al. (2018).

  11. “Agnostic” means that the algorithm is not restricted to a class of models but can be applied to all sorts of SML models. “Local” means that “it is possible to understand only the reasons for a specific decision,” in contrast to a global understanding, when “we are able to understand the whole logic of a model and follow the entire reasoning leading to all the different possible outcomes” (Guidotti et al., 2018, 6). A minimal code sketch of an agnostic, local explanation method is given after these notes.

  12. Wiegreffe & Pinter (2019) question the assumptions of Jain & Wallace (2019) and Serrano & Smith (2019). They argue that attention weights are robust in the complex situations in which they are necessary and when they are trained together with the rest of the network. What matters to us here is that an attention mechanism is not prima facie interpretable.

  13. Drawing on an interventionist approach to explanation, Grimsley et al. (2020) offer another reason for the conclusion that attention is not an explanation: its analysis is resistant to surgical manipulation. Causal analysis depends on some variables being held constant; however, holding the weights constant while manipulating attention is meaningless, since attention only makes sense when trained together with the other parameters.

  14. Miller et al. (2017) and Miller (2017) point to another problem in XAI: explanatory models are still deeply attached to the standards of explanation favored by programmers, rather than to the kinds of explanation suited to general users. The authors argue that explanations should be more firmly based on an empirical assessment of human cognition and social norms.

  15. A third important type of reason for action is “motivation.” But since models are not intentional agents, I will not discuss this last case.

  16. I am not considering situations where the purpose of the explanation is not to provide understanding but to produce psychological comfort, build confidence, or simply gain thoughtless acceptance of a decision.
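
Notes 10 and 11 describe the agnostic, local post hoc explanations produced in XAI. To make the idea concrete, the following is a minimal sketch of a perturbation-based local surrogate in the spirit of Ribeiro et al. (2016): it queries an arbitrary black-box classifier around a single instance and fits a small weighted linear model whose coefficients serve as local feature attributions. The toy black box, the function name, and the parameter values are illustrative assumptions, not code from the paper or from any particular library.

```python
# Minimal sketch of an agnostic, local post hoc explanation
# (a perturbation-based linear surrogate; all names are illustrative).
import numpy as np

def local_surrogate_explanation(predict_proba, x, n_samples=1000, scale=0.1, seed=0):
    """Fit a weighted linear surrogate of `predict_proba` around the instance x.

    predict_proba: any callable mapping an (n, d) array to class-1 probabilities,
    treated purely as a black box. Returns one coefficient per feature; larger
    magnitude means larger local influence on this particular prediction.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # 1. Perturb the instance of interest with small Gaussian noise.
    Z = x + scale * rng.standard_normal((n_samples, d))
    # 2. Query the black box on the perturbed points.
    y = predict_proba(Z)
    # 3. Weight perturbations by proximity to x (closer points matter more).
    w = np.exp(-np.sum((Z - x) ** 2, axis=1) / (2 * scale ** 2))
    # 4. Fit a weighted least-squares linear model: the interpretable surrogate.
    Zb = np.hstack([Z, np.ones((n_samples, 1))])   # add an intercept column
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(sw[:, None] * Zb, sw * y, rcond=None)
    return coef[:-1]                               # drop the intercept

# A toy, non-interpretable black box standing in for, e.g., a deep neural network.
black_box = lambda Z: 1.0 / (1.0 + np.exp(-(np.sin(3 * Z[:, 0]) + Z[:, 1] ** 2)))
x0 = np.array([0.2, -0.5])
print(local_surrogate_explanation(black_box, x0))  # local feature attributions at x0
```

The surrogate is faithful only in the neighborhood of x0; this locality is precisely what separates understanding a specific decision from the global understanding of a model's entire logic described in note 11.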

References

  • Adadi, A., & Berrada, M. (2018). Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). IEEE Access, 6, 52138–52160.

  • Alvarez, M. (2009). How many kinds of reasons? Philosophical Explorations, 12(2), 181–193.

  • Alvarez, M. (2017). Reasons for action: Justification, motivation, explanation. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy. Accessed 13 Mar 2022. https://plato.stanford.edu/archives/win2017/entries/reasons-just-vs-expl/

  • Bahdanau, D., et al. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

  • Bastani, O., Kim, C., & Bastani, H. (2017). Interpretability via model extraction. arXiv preprint arXiv:1706.09773.

  • Bien, J., & Tibshirani, R. (2011). Prototype selection for interpretable classification. The Annals of Applied Statistics, p. 2403–2424.

  • Burrell, J. (2016). How the machine ‘thinks’: Understanding opacity in machine learning algorithms. Big Data & Society, 3(1). https://doi.org/10.1177/2053951715622512

  • Cano, A., Zafra, A., & Ventura, S. (2013). An interpretable classification rule mining algorithm. Information Sciences, 240, 1–20.

  • Carabantes, M. (2020). Black-Box Artificial Intelligence: An Epistemological and Critical Analysis. AI Society, 35, 309–317.

  • Conneau, et al. (2018). What you can cram into a single vector: Probing sentence embeddings for linguistic properties. arXiv preprint arXiv:1805.01070

  • Creel, K. A. (2020). Transparency in complex computational systems. Philosophy of Science, 87(4), 568–589.

  • De Regt, H. W. (2017). Understanding Scientific Understanding. Oxford University Press.

  • Dellsén, F. (2016). Scientific Progress: Knowledge Versus Understanding. Studies in History and Philosophy of Science Part A, 56, 72–83.

  • Dennett, D. (1987). The intentional stance. MIT Press.

  • Devlin, et al. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

  • Doshi-Velez, F., & Kim, B. (2017). Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608.

  • Dosilovic, F. K., Brcic, M., & Hlupic, N. (2018). Explainable artificial intelligence: A survey. 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), p. 210–215.

  • Fedus, W., Zoph, B., & Shazeer, N. (2021). Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. arXiv preprint arXiv:2101.03961.

  • Grimsley, et al. (2020). Why attention is not explanation: Surgical intervention and causal reasoning about neural models. Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), p. 1780–1790.

  • Guidotti, R., Monreale, A., Turini, F., Pedreschi, D., & Giannotti, F. (2018). A survey of methods for explaining black box models. CoRR, abs/1802.01933.

  • Hempel, C. G. (1965). Aspects of scientific explanation. New York: Free Press.

  • Hieronymi, P. (2011). Reasons for action. Proceedings of the Aristotelian Society, 111, p. 407–427. Oxford University Press.

  • Jurafsky, D., & Martin, J. H. (2019). Speech and language processing (3rd ed. draft). https://web.stanford.edu/jurafsky/slp3/. Accessed 13 Mar 2022.

  • Krishnan, M. (2019). Against interpretability: A critical examination of the interpretability problem in machine learning. Philosophy & Technology, 33(3), 487–502.

  • Kvanvig, J. (2009). The value of understanding. In A. Haddock, A. Millar, & D. Pritchard (Eds.), Epistemic value (pp. 95–112). Oxford University Press.

  • Lakkaraju, H., Bach, S. H., & Leskovec, J. (2016). Interpretable decision sets: A joint framework for description and prediction. Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, p. 1675–1684.

  • Lipton, Z. C. (2016). The mythos of model interpretability. CoRR, abs/1606.03490.

  • Mao, et al. (2019) The neuro-symbolic concept learner: Interpreting scenes, words, and sentences from natural supervision. arXiv preprint arXiv:1904.12584.

  • Mikolov, et al. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

  • Miller, T., Howe, P., & Sonenberg, L. (2017). Explainable AI: Beware of inmates running the asylum or: How I learnt to stop worrying and love the social and behavioural sciences. CoRR, abs/1712.00547.

  • Miller, T. (2017). Explanation in artificial intelligence: Insights from the social sciences. CoRR, abs/1706.07269.

  • Páez, A. (2019). The pragmatic turn in explainable artificial intelligence (XAI). Minds and Machines, 29(3), 441–459.

  • Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). Why should I trust you?: Explaining the predictions of any classifier. CoRR, abs/1602.04938.

  • Robbins, S. (2019). A misdirected principle with a catch: Explicability for AI. Minds and Machines, 29(4), 495–514.

  • Rudin, C. (2018). Please stop explaining black box models for high stakes decisions. ArXiv, abs/1811.10154.

  • Jain, S., & Wallace, B. C. (2019). Attention is not explanation. arXiv preprint arXiv:1902.10186.

  • Serrano, S., & Smith, N. A. (2019). Is attention interpretable? arXiv preprint arXiv:1906.03731.

  • Strevens, M. (2017). The whole story: Explanatory autonomy and convergent evolution. In D. M. Kaplan (Ed.), Explanation and integration in mind and brain science (pp. 101–111). Oxford University Press.

  • Strevens, M. (2008). Depth: An account of scientific explanation. Cambridge, MA: Harvard University Press.

  • Sullivan, E. (2019). Understanding from machine learning models. British Journal for the Philosophy of Science. https://doi.org/10.1093/bjps/axz035

  • Van Fraassen, B. C. (1980). The Scientific Image. Clarendon Press.

  • Vaswani, A. et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, p. 5998–6008.

  • Wachter, S., Mittelstadt, B. D., & Russell, C. (2017). Counterfactual explanations without opening the black box: Automated decisions and the GDPR. CoRR, abs/1711.00399.

  • Watson, D. S., & Floridi, L. (2021). The explanation game: A formal framework for interpretable machine learning. Synthese, 198(10), 9211–9242.

  • Wiegreffe, S., & Pinter, Y. (2019). Attention is not not explanation. arXiv preprint arXiv:1908.04626.

  • Xu, K. et al. (2015). Show, attend and tell: Neural image caption generation with visual attention. International Conference on Machine Learning, p. 2048–2057.

  • Zednik, C. (2021). Solving the Black Box Problem: A Normative Framework for Explainable Artificial Intelligence. Philos. Technol., 34, 265–288. https://doi.org/10.1007/s13347-019-00382-7

  • Zednik, C., & Boelsen, H. (2020). The exploratory role of explainable artificial intelligence. http://philsci-archive.pitt.edu/id/eprint/18005. Accessed 13 Mar 2022.

  • Zerilli, J., Knott, A., Maclaurin, J., & Gavaghan, C. (2019). Transparency in Algorithmic and Human Decision-Making: Is There a Double Standard? Philosophy & Technology, 32(4), 661–683.

Acknowledgements

I would like to thank the anonymous reviewers of this paper for their helpful comments. This work was carried out at the Center for Artificial Intelligence (C4AI-USP), with support from the São Paulo Research Foundation (FAPESP grant #2019/07665-4) and from the IBM Corporation.

Funding

FAPESP grant #2019/07665-4.

Author information

Corresponding author

Correspondence to Paulo Pirozelli.

Ethics declarations

Conflicts of interest/Competing interests

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pirozelli, P. Sources of Understanding in Supervised Machine Learning Models. Philos. Technol. 35, 23 (2022). https://doi.org/10.1007/s13347-022-00524-4
