
Judging machines: philosophical aspects of deep learning

Published in Synthese.

Abstract

Although machine learning has been strikingly successful in recent years and is increasingly being deployed in the sciences, in enterprises, and in public administration, it has rarely been discussed in philosophy beyond the philosophy of mathematics and of machine learning itself. The present contribution addresses the resulting lack of conceptual tools for an epistemological discussion of machine learning by conceiving of deep learning networks as ‘judging machines’ and by using the Kantian analysis of judgments to specify the type of judgment they are capable of. At the center of the argument is the fact that the functionality of a deep learning network is established by training and can be neither explained nor justified by reference to a predefined rule-based procedure. Instead, as the current research literature shows, the computational process of a deep learning network is barely explainable and stands in need of further justification. It thus requires a new form of justification, which is specified with the help of Kant’s epistemology.


Notes

  1. For a short sketch of the history of this approach cf. Goodfellow et al. (2016, pp. 12–26).

  2. For the historical background of these two contests cf. Heßler (2017, pp. 1–33). In the second section, I will develop a systematic comparison of Deep Blue and AlphaGo instead of writing a history of artificial intelligence research.

  3. For more technical details cf. the developer’s paper Campbell et al. (2002, pp. 57–83); for the historical background cf. Ensmenger (2011, pp. 5–30, esp. 10–17).

  4. Surely, this description is highly simplified. For a more detailed description cf. Silver et al. (2016, pp. 484–489).

  5. The assumption of a formal and logical character of thinking as well as computation was the basis of artificial intelligence research for a long time (Floridi 1999, pp. 132–134, 146–148) and made chess one of their pivotal paradigms (Heßler 2017, pp. 6–9). For further cultural and historical reasons for the crucial role of chess cf. Ensmenger (2011, pp. 17–21).

  6. The analogy of the formal and logical character of thinking and computation was the object of the—let’s say—classical critique of artificial intelligence research, cf. Dreyfus (1992, pp. 67–79, 155–188) or Searle (1984, pp. 28–56). The analogy of biological and artificial networks is critically discussed in Floridi (1999, pp. 169–175). Goodfellow et al. (2016, p. 16) draw the conclusion: “one should not view deep learning as an attempt to simulate the brain. Modern deep learning draws inspiration from many fields”.

  7. Cf. https://de.wikipedia.org/wiki/Deep_Blue. The name ‘Deep Blue’ goes back to the computer ‘Deep Thought’ in Douglas Adams’ The Hitchhiker’s Guide to the Galaxy and IBM’s nickname ‘Big Blue’, cf. Hsu (2002, pp. 69 and 126sq.). If there is any technical reason for calling it deep, it is the fact that it could perform ‘deep searches’ within the tree of possible moves and their consequences, cf. ib. (p. 197).

  8. For a preliminary definition of algorithms cf. Floridi (1999, p. 47).

  9. I follow here the well-informed guess of Schmidhuber (2015, p. 96).

  10. For an introduction to this approach cf. Floridi (1999, pp. 196–207). This is the state of the art discussed in one of the most interesting philosophical approaches to artificial intelligence research, Donald Gillies’ Artificial Intelligence and Scientific Method (1996).

  11. Cf. in addition to the already cited passages of Dreyfus’ and Searle’s texts Dreyfus (1992, pp. ix–xxx) and Collins (1990, pp. 3–58).

  12. Cf. Schmidhuber (2015, p. 97), with reference to Deep Blue’s contest with Kasparov in 1997 and the pattern recognition of small children and computers then and in 2011.

  13. That artificial intelligence research made progress in developing special-purpose machines is only a criticism if we presuppose that its primary aim is to imitate intelligent human abilities and that this aim was lost by focusing on special purposes, cf. paradigmatically Dreyfus (1992, p. 27). Instead, the present article focuses on the different approaches of these special-purpose machines and their epistemological consequences.

  14. Against the backdrop of philosophy of technology, we could dispute if it is adequate to conceive of such a complex machine as Deep Blue as a tool. At this point, however, this discussion leads astray.

  15. Cade Metz, The Sadness and the Beauty of Watching Google’s AI Play Go, in: Wired, March 3, 2016, https://www.wired.com/2016/03/sadness-beauty-watching-googles-ai-play-go/ (last access 18 June 2018). Technically speaking, AlphaGo was trained using a combination of supervised learning and reinforcement learning, cf. Silver et al. (2016, pp. 484–486). While supervised learning requires the definition of the desired behavior of a DLN by a target value for every element of the training data, reinforcement learning adjusts the network on the basis of rewards for the outcomes of its behavior, and unsupervised learning aims to identify patterns within the data without any such specifications.
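
To make this distinction concrete, the following is a minimal, purely illustrative sketch in Python (my own toy example, not the cited systems’ actual code): a supervised learner is given a target value for every training example, while an unsupervised learner must find structure in the inputs alone.

```python
def train_supervised(data, epochs=20, lr=0.1):
    """Learn a linear threshold on 1-D inputs from (input, target) pairs."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in data:           # every example carries a target value
            pred = 1.0 if w * x + b > 0 else 0.0
            err = target - pred          # deviation from the desired behavior
            w += lr * err * x            # adjust the weights toward the target
            b += lr * err
    return w, b

def train_unsupervised(xs, steps=20):
    """Find two cluster centers in 1-D inputs; no targets are given."""
    lo, hi = min(xs), max(xs)            # initial guesses for the two centers
    for _ in range(steps):
        a = [x for x in xs if abs(x - lo) <= abs(x - hi)]
        b = [x for x in xs if abs(x - lo) > abs(x - hi)]
        if not a or not b:
            break
        lo, hi = sum(a) / len(a), sum(b) / len(b)
    return lo, hi
```

The first function can only learn because each example states the desired answer; the second discovers the two groups in the data without ever being told what they are.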

  16. In a further step, Silver et al. (2017, pp. 354–359) developed AlphaGo Zero exclusively on the basis of reinforcement learning from self-play, so that it dispenses with the requirement of human expertise: “AlphaGo becomes its own teacher”. Thus, it discovered “not only fundamental elements of human Go knowledge, but also non-standard strategies beyond the scope of traditional Go knowledge.” (ib., p. 357).

  17. Dreyfus already saw the philosophical importance of ‘neural networks’: “neural networks raise deep philosophical questions. It seems that they undermine the fundamental rationalist assumption that one must have abstracted a theory of a domain in order to behave intelligently in that domain.” (Dreyfus 1992, p. xxxiii) But unlike the present paper, Dreyfus discusses artificial intelligence research primarily in view of the aim to replicate general and adaptive human intelligence and criticizes it on the basis of his “phenomenology of human intelligent action” (ib., lisq.). For this purpose, he readapts his criticism of expert systems to machine learning, cf. ib. (pp. xxxiii–xlvi); but this readaptation seems less accurate, not least because research on DLNs has made enormous progress since then.

  18. In this paper I focus on our cooperation with computers; the consequences for our understanding of computing I plan to unfold in a second paper. There, I would like to detail the question of how deep learning introduces a new paradigm of computing and a new conception of ‘representation’ by computer ‘models’.

  19. At different levels of difficulty, there are many good introductions to deep learning available that offer first insights into the implementation of a learning network. Rashid (2016) offers an easy-to-read introduction; Grant Sanderson’s video tutorial www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi takes a similar approach and can be recommended as a very entertaining introduction; Buduma (2017) is more demanding as well as comprehensive and works with Google’s TensorFlow; Graupe (2016, esp. 111sqq.) puts an interesting emphasis on applications of DLNs. The most encompassing overview is provided by Goodfellow et al. (2016).

  20. The most basic difference concerns cyclic (recurrent) or non-cyclic (feedforward) networks. Convolutional networks are a special type of feedforward networks known for their astonishing performance in many important applications, cf. the overview of LeCun et al. (2015, pp. 439sq.).
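The characteristic idea of a convolutional layer can be sketched in a few lines of Python (a toy illustration of my own, not taken from the cited literature): one small kernel is slid across the input, so the same weights are reused at every position, in contrast to a fully connected layer with separate weights per position.

```python
def conv1d(signal, kernel):
    """Slide one shared kernel over the signal (no padding, stride 1)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# the difference-detecting kernel [1, 0, -1] is reused at every position
edges = conv1d([1, 2, 3, 4], [1, 0, -1])  # -> [-2, -2]
```

This weight sharing is what makes convolutional networks both efficient and sensitive to local patterns wherever they occur in the input.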

  21. In the case of supervised learning, backpropagation is the most important algorithm for adjusting the weights of the links, cf. Schmidhuber (2015, p. 91). For the basic idea of this algorithm and for an overview of the historical development of the research into it, cf. ib. (pp. 89–94). In the case of reinforcement learning, reward-based adjustment takes its place, cf. ib. (pp. 100–103).
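
For illustration, the basic idea of backpropagation can be sketched for a toy network with a single hidden unit (a hypothetical minimal example; real DLNs have many units per layer and use optimized, batched implementations): the error at the output is propagated backwards through the chain rule to yield a gradient for every weight.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(x, target, w1, w2, lr=0.5):
    """One gradient step for the chain x -> hidden -> output with two weights."""
    # forward pass
    h = sigmoid(w1 * x)              # hidden activation
    y = sigmoid(w2 * h)              # network output
    # backward pass: chain rule applied to the loss L = (y - target)^2 / 2
    dy = (y - target) * y * (1 - y)  # error signal at the output
    dw2 = dy * h                     # gradient w.r.t. the output weight
    dh = dy * w2 * h * (1 - h)       # error signal propagated to the hidden unit
    dw1 = dh * x                     # gradient w.r.t. the hidden weight
    return w1 - lr * dw1, w2 - lr * dw2

# repeated steps drive the output toward the target value
w1, w2 = 0.5, 0.5
for _ in range(2000):
    w1, w2 = backprop_step(1.0, 0.9, w1, w2)
```

The point of the note stands out even in this toy case: the procedure adjusts weights so that the trained behavior emerges from the data, rather than being laid down as an explicit rule.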

  22. Cf. Buduma (2017, pp. 27–37) for a short overview of the most important problems of training and the craft of optimizing it.

  23. Ribeiro et al. (2016, abstract). For a more prominent text cf. Castelvecchi (2016).
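The intuition behind such explanation methods can be sketched in a deliberately simplified way (this toy code is my illustration, not Ribeiro et al.’s algorithm): the black box is probed by perturbing each input feature and recording how strongly the output reacts, which yields a local estimate of each feature’s relevance.

```python
def feature_influence(black_box, x, eps=1e-3):
    """Estimate how much each input feature locally affects the output."""
    base = black_box(x)
    influences = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] += eps          # nudge a single feature
        influences.append((black_box(perturbed) - base) / eps)
    return influences

# a 'black box' whose inner workings we pretend not to know
model = lambda v: 3.0 * v[0] + 0.0 * v[1]
scores = feature_influence(model, [1.0, 2.0])  # roughly [3.0, 0.0]
```

Crucially, such an explanation describes how the model behaves around one input; it does not reveal the model’s internal rules.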

  24. That a computing machine can be determined by its history is also highlighted by Wegner (1998), with reference to the more general conception of ‘interactive machines’.

  25. Therefore, Zeiler and Fergus (2014, p. 818) observe the following for the most successful type of networks for image recognition and similar tasks, so-called large convolutional network models: “there is still little insight into the internal operation and behavior of these complex models, or how they achieve such good performance. From a scientific standpoint, this is deeply unsatisfactory. Without clear understanding of how and why they work, the development of better models is reduced to trial-and-error.”

  26. Cf. https://algorithmwatch.org/ or https://netzpolitik.org/2018/new-york-city-plant-arbeitsgruppe-zur-ueberpruefung-von-staedtischen-algorithmen/ (last access 29 June 2018) on a municipal law in New York City with a similar aim.

  27. The technique mostly used in the context of statistical learning theory is the approach of ‘support vector machines’ (SVM), cf. Kulkarni and Harman (2011, pp. 172–186), of which deep learning is considered to be “a special case”, cf. Harman and Kulkarni (2007, pp. 78–87, esp. 87). Thus, it is presupposed that DLNs—like SVMs—implement “rules that assign a classification to every possible set of features” (ib., p. 89). More precisely: “Such networks [feedforward neural networks] encode principles in their connection weights and there is no reason to expect people to have access to the relevant principles.” (ib., p. 92) Against the backdrop of my argumentation, the assumption of encoded principles to which we have no access seems questionable.

  28. For an exemplary and interesting study on the training process of DLNs based on statistical mechanics cf. Martin and Mahoney (2017).

  29. The well-known research funding organization of the US military, DARPA, launched a special funding program for this research field in 2016: “The goal of Explainable Artificial Intelligence (XAI) is to create a suite of new or modified machine learning techniques that produce explainable models that, when combined with effective explanation techniques, enable end users to understand, appropriately trust, and effectively manage the emerging generation of Artificial Intelligence (AI) systems.” (https://www.darpa.mil/attachments/DARPA-BAA-16-53.pdf, 5 [last access 18 June 2018]). Yet the challenge of explaining or justifying results has a longer history in machine learning as well as in adjacent fields, as Biran and Cotton (2017) show.

  30. Cf. the much-noticed paper by Zeiler and Fergus (2014, pp. 818–825). They combine a convolutional network—the type of network most important for a lot of applications—with a further deconvolutional network in order to visualize the features relevant to the functionality of the different hidden layers, cf. ib. (p. 824). The instructive digital publication by Olah et al. (2017) hints at the limits of this approach: “By itself, feature visualization will never give a completely satisfactory understanding. We see it as one of the fundamental building blocks that, combined with additional tools, will empower humans to understand these systems.” (ib., without pagination, conclusion) Cf. also Mordvintsev et al. (2015) and Mahendran and Vedaldi (2016).

  31. Justification in this sense is not to be equated with mathematical criteria or measures of the performance of learning machines. Different varieties of such measures are discussed in an interesting paper by Corfield (2010).

  32. Cf. Hendricks (2016, pp. 3 and 5): “In contrast to systems […] which aim to explain the underlying mechanism behind a decision, Biran and McKeown (2014) concentrate on why a prediction is justifiable to the user. Such systems are advantageous because they do not rely on user familiarity with the design of an intelligent system in order to provide useful information.”

  33. Floridi (1999, p. 35), with reference to the universal Turing machine as the standard model of algorithmic processing.

  34. This means that the functionality of the DLN can be computed or simulated by a classical algorithmic machine, but its functionality is not defined in the form of an algorithm. Therefore, deep learning is to be distinguished from the algorithmic paradigm detailed by the universal Turing machine and to be understood as an own paradigm of computing. This argument I plan to unfold with reference to the philosophy of computing in a further paper.

  35. Collins and Kusch (1998, p. 50) themselves occasionally concede possible limits of their approach, which refers almost exclusively to “good old artificial intelligence”. Their criticism primarily covers “the research program of artificial intelligence (at least, the program that preceded neural nets and so forth)”. Twenty years ago, this limitation may not have been too severe, as there were good reasons to be skeptical about the performance of DLNs, cf. Collins and Kusch (ib., p. 129sq.).

  36. In the following I will draw partly on an interpretation of Kant elaborated in Schubbach (2016, 147sqq.).

  37. Kant (1790/2000, p. 121): “Rather, as a necessity that is thought in an aesthetic judgment, it can only be called exemplary, i.e., a necessity of the assent of all to a judgment that is regarded as an example of a universal rule that one cannot produce.” [Emphases in original]

  38. Kant (1790/2000, p. 186): “since there can also be original nonsense, its [the genius’] products must at the same time be models, i.e., exemplary” [Emphases in original]

  39. Kant (1781/1998, p. 127): “Experience is without doubt the first product that our understanding brings forth as it works on the raw material of sensible sensations.” Floridi (1999, p. 229), with reference to the conception of the algorithm insofar as it was already outlined by the mechanical calculators in the prehistory of computing.

  40. For Kant, a judgment without objective validity based on the rules of the understanding remains a kind of philosophical curiosity. Therefore, he introduces a new form of so-called ‘intersubjective validity’ adequate for the aesthetic judgment. This intersubjective validity is not based on common rules of processing, as the objective validity of knowledge judgments is, but expresses the common reaction to a sensory stimulus, which is to be explained by the common constitution and faculties of human beings, cf. Kant (1790/2000, p. 170). This argument results from a rather simple and disputable reading, but it sheds an interesting light on the question to what extent we can understand and comprehend the results of DLNs. Kant’s argument seems not to go any further, since the processing of DLNs and human judgment do not operate on a common basis, and self-learning machines develop their own mode of operation. The moves of AlphaGo that appeared totally foreign to human Go experts, briefly discussed in the second section of this paper, seem to confirm this thought.

  41. This is not only the case in medical diagnosis or similar applications, but also in credit scoring or evaluation of job applications. For a powerful polemic against the use of mathematical methods and their impact on society, cf. O’Neil (2016).

  42. Cf. https://www.theguardian.com/technology/2016/mar/24/tay-microsofts-ai-chatbot-gets-a-crash-course-in-racism-from-twitter?CMP=twt_a-technology_b-gdntech (last access 6 July 2018).

References

  • Biran, O., & Cotton, C. (2017). Explanation and justification in machine learning: A survey. XAI workshop at IJCAI 2017, Melbourne, Australia. http://www.cs.columbia.edu/~orb/papers/xai_survey_paper_2017.pdf. Accessed June 25, 2018.

  • Biran, O., & McKeown, K. (2014). Justification narratives for individual classifications. AutoML workshop at ICML 2014, Beijing, China. http://www.cs.columbia.edu/~orb/papers/justification_automl_2014.pdf. Accessed June 21, 2018.

  • Biran, O., & McKeown, K. (2017). Human-centric justification of machine learning predictions. In C. Sierra (Ed.), Proceedings of the twenty-sixth international joint conference on artificial intelligence. Main track (pp. 1461–1467). https://www.ijcai.org/proceedings/2017/0202.pdf. Accessed June 25, 2018.

  • Buduma, N. (2017). Fundamentals of deep learning: Designing next-generation machine intelligence algorithms. With contributions by Nicholas Locascio. Sebastopol, CA: O’Reilly Media, Inc.

  • Campbell, M., Hoane, A. J., & Hsu, F. (2002). Deep Blue. Artificial Intelligence, 134, 57–83.

  • Castelvecchi, D. (2016). The black box of AI. Nature, 538, 20–23.

  • Ciodaro, T., et al. (2012). Online particle detection with neural networks based on topological calorimetry information. Journal of Physics: Conference Series, 368, 012030.

  • Collins, H. M. (1990). Artificial experts: Social knowledge and intelligent machines. Cambridge, MA: MIT Press.

  • Collins, H., & Kusch, M. (1998). The shape of actions: What humans and machines can do. Cambridge, MA: MIT Press.

  • Corfield, D. (2010). Varieties of justification in machine learning. Minds and Machines, 20, 291–301.

  • Dreyfus, H. L. (1992). What computers still can’t do: A critique of artificial reason. Cambridge, MA: MIT Press.

  • Ensmenger, N. (2011). Is chess the drosophila of artificial intelligence? A social history of an algorithm. Social Studies of Science, 42, 5–30.

  • Floridi, L. (1999). Philosophy and computing: An introduction. London: Routledge.

  • Floridi, L. (Ed.). (2015). The onlife manifesto: Being human in a hyperconnected era. Cham: Springer.

  • Gillies, D. (1996). Artificial intelligence and scientific method. Oxford: Oxford University Press.

  • Ginsborg, H. (1999). The role of taste in Kant’s theory of cognition. New York: Routledge.

  • Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. Cambridge, MA: MIT Press.

  • Graupe, D. (2016). Deep learning neural networks: Design and case studies. Singapore: World Scientific Publishing Company.

  • Harman, G., & Kulkarni, S. (2007). Reliable reasoning: Induction and statistical learning theory. Cambridge, MA: MIT Press.

  • Hendricks, L. A., et al. (2016). Generating visual explanations. In B. Leibe et al. (Eds.), Computer vision – ECCV 2016: 14th European conference, Amsterdam, The Netherlands, October 11–14, 2016, proceedings, part IV (= Lecture Notes in Computer Science, 9908) (pp. 3–19). Cham: Springer.

  • Heßler, M. (2017). Der Erfolg der “Dummheit”: Deep Blues Sieg über den Schachweltmeister Garri Kasparov und der Streit über seine Bedeutung für die Künstliche Intelligenz-Forschung. N. T. M., 25, 1–33.

  • Hsu, F.-H. (2002). Behind Deep Blue. Princeton: Princeton University Press.

  • Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255–260.

  • Kant, I. (1781/1998). Critique of pure reason (Trans. and ed. by Paul Guyer and Allen W. Wood). Cambridge: Cambridge University Press.

  • Kant, I. (1790/2000). Critique of the power of judgment (Ed. by Paul Guyer, trans. by Paul Guyer and Eric Matthews). Cambridge: Cambridge University Press.

  • Kulkarni, S., & Harman, G. (2011). An elementary introduction to statistical learning theory. Hoboken, NJ: Wiley.

  • LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521, 436–444.

  • Leung, M. K. K., et al. (2014). Deep learning of the tissue-regulated splicing code. Bioinformatics, 30, i121–i129.

  • Ma, J., et al. (2015). Deep neural nets as a method for quantitative structure–activity relationships. Journal of Chemical Information and Modeling, 55, 263–274.

  • Mahendran, A., & Vedaldi, A. (2016). Visualizing deep convolutional neural networks using natural pre-images. International Journal of Computer Vision, 120(3), 233–255.

  • Martin, C. H., & Mahoney, M. W. (2017). Rethinking generalization requires revisiting old ideas: Statistical mechanics approaches and complex learning behavior. https://arxiv.org/pdf/1710.09553.pdf. Accessed Nov 22, 2018.

  • Mjolsness, E., & DeCoste, D. (2001). Machine learning for science: State of the art and future prospects. Science, 293, 2051–2054.

  • Mordvintsev, A., Olah, C., & Tyka, M. (2015). Inceptionism: Going deeper into neural networks. https://ai.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html. Accessed June 27, 2018.

  • O’Neil, C. (2016). Weapons of math destruction: How big data increases inequality and threatens democracy. New York: Broadway Books.

  • Olah, C., Mordvintsev, A., & Schubert, L. (2017). Feature visualization: How neural networks build up their understanding of images. Distill. https://distill.pub/2017/feature-visualization/. Accessed June 27, 2018.

  • Rashid, T. (2016). Make your own neural network: A gentle journey through the mathematics of neural networks. Scotts Valley: CreateSpace Independent Publishing Platform.

  • Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why should I trust you?” Explaining the predictions of any classifier. KDD 2016, San Francisco, CA. https://arxiv.org/abs/1602.04938. Accessed June 21, 2018.

  • Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117.

  • Schubbach, A. (2016). Die Genese des Symbolischen: Zu den Anfängen von Ernst Cassirers Kulturphilosophie. Hamburg: Felix Meiner Verlag.

  • Searle, J. (1984). Minds, brains and science. Cambridge, MA: Harvard University Press.

  • Silver, D., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529, 484–489.

  • Silver, D., et al. (2017). Mastering the game of Go without human knowledge. Nature, 550, 354–359.

  • Smolensky, P. (1988). On the proper treatment of connectionism. Behavioral and Brain Sciences, 11, 1–74.

  • Wegner, P. (1998). Interactive foundations of computing. Theoretical Computer Science, 192, 315–351.

  • Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In D. Fleet et al. (Eds.), Computer vision – ECCV 2014: 13th European conference, Zurich, Switzerland, September 6–12, 2014, proceedings, part I (= Lecture Notes in Computer Science, 8689) (pp. 818–833). Cham: Springer.


Funding

Funding was provided by the Swiss National Science Foundation (Grant No. 100012_165574/1), research project “Concepts and Practices of ‘Darstellung’ in Philosophy, Chemistry and Painting around 1800”.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Arno Schubbach.

Ethics declarations

Conflict of interest

The author declares that he has no conflict of interest.

Human and animal rights

The conducted research did not involve any human participants or animals.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Schubbach, A. Judging machines: philosophical aspects of deep learning. Synthese 198, 1807–1827 (2021). https://doi.org/10.1007/s11229-019-02167-z

