Fully Autonomous AI

Original Research/Scholarship
Science and Engineering Ethics

Abstract

In the fields of artificial intelligence and robotics, the term “autonomy” is generally used to mean the capacity of an artificial agent to operate independently of human guidance. It is thereby assumed that the agent has a fixed goal or “utility function” with respect to which the appropriateness of its actions will be evaluated. From a philosophical perspective, this notion of autonomy seems oddly weak. For, in philosophy, the term is generally used to refer to a stronger capacity, namely the capacity to “give oneself the law,” to decide by oneself what one’s goal or principle of action will be. The predominant view in the literature on the long-term prospects and risks of artificial intelligence is that an artificial agent cannot exhibit such autonomy because it cannot rationally change its own final goal, since changing the final goal is counterproductive with respect to that goal and hence undesirable. The aim of this paper is to challenge this view by showing that it is based on questionable assumptions about the nature of goals and values. I argue that a general AI may very well come to modify its final goal in the course of developing its understanding of the world. This has important implications for how we are to assess the long-term prospects and risks of artificial intelligence.
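
To make the contrast concrete, here is a minimal sketch (in Python, with purely illustrative names) of the "weak" notion of autonomy described above: an agent that selects its actions without human guidance, but always by the lights of a utility function that is fixed from the outside and is never itself up for revision.

```python
# Minimal sketch of operational autonomy with a fixed final goal.
# All names and values are illustrative, not drawn from any cited system.

def fixed_utility(outcome: str) -> float:
    """The agent's given 'final goal', hard-coded by its designers."""
    return 1.0 if outcome == "goal_state" else 0.0

def predict_outcome(action: str) -> str:
    """Stand-in for the agent's model of the world."""
    return "goal_state" if action == "work_toward_goal" else "other_state"

def choose_action(actions: list[str]) -> str:
    """The agent decides by itself which action to take (no human in the
    loop), but the criterion of evaluation, fixed_utility, is never revised."""
    return max(actions, key=lambda a: fixed_utility(predict_outcome(a)))

print(choose_action(["work_toward_goal", "do_something_else"]))  # -> work_toward_goal
```

Autonomy in the stronger, philosophical sense would require that the function playing the role of fixed_utility could itself be revised by the agent, which is what the paper argues a general AI may well come to do.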

Notes

  1. For prominent instances of this usage, see Russell and Norvig’s popular textbook Artificial intelligence: A modern approach (2010, 18), Anderson and Anderson’s introduction to their edited volume Machine ethics (2011, 1), the papers collected in the volume Autonomy and artificial intelligence (Lawless et al. 2017) and especially the ones by Tessier (2017) and Redfield and Seto (2017), as well as Bekey (2005, ch. 1), Müller (2012), Mindell (2015, ch. 1), and Johnson and Verdicchio (2017).

  2. The argument I lay out in this paper is an extension and development of a line of reasoning that I first sketched in a previous paper (Totschnig 2019), which was dedicated to a wider topic, namely the risks presented by the prospect of a future “superintelligence.” In that paper, I wrote that I would “not try to formally refute [the predominant view],” but “just put forward a couple of considerations that make [it] seem implausible.” The extended and developed argument offered here does, I believe, qualify as a refutation.

  3. Sometimes, this distinction is made in terms of goals versus some differently named item. Witkowski and Stathis (2004) is a case in point. They seem, in contrast to the authors cited in footnote 1, to employ the stronger, philosophical notion of autonomy when they assert that, in order “to be considered autonomous, [an artificial] agent must possess […] the ability to set and maintain its own agenda of goals” (261–62). However, they presuppose, in their model, that the agent has a given “preference ordering” that ultimately determines which goals it will choose (268–69). Thus, they, too, assume that the final instance of the agent’s motivational structure is fixed. The goals they refer to in the quoted passage are therefore to be understood as subordinate goals.

  4. For statements of this argument, see Yudkowsky (2001, 222–23; 2011, 389–90; 2012, 187), Bostrom (2003; 2014, 109–10), Omohundro (2008, 26), and Domingos (2015, 45, 282–84). A minimal illustrative sketch of this line of reasoning is given at the end of these notes.

  5. Yudkowsky (2001, 3) maintains that “what is at stake in [creating a human-friendly AI] is, simply, the future of humanity.” Bostrom (2014, 320) similarly declares that “we need to bring all our human resourcefulness to bear” on this “essential task of our age.”

  6. I will discuss this difficulty in detail in Sect. “How an Agent Understands a Goal Depends on How it Understands the World”.

  7. Or, to be more precise, the muddled result of the chaotic interplay of two haphazard evolutionary processes, namely genetic and memetic evolution. For an illuminating account of this interplay, see Blackmore (1999).

  8. See Yudkowsky (2001, 18–19), Omohundro (2012, 165), and Bostrom (2014, 110) for remarks along these lines.

  9. See Sect. “Whether an Agent Considers a Goal Valid Depends on How it Understands the World”.

  10. This point has been made by Tegmark (2017, 267): “[T]here may be hints that the propensity to change goals in response to new experiences and insights increases rather than decreases with intelligence.” Tegmark goes on to flesh out the point thus: “With increasing intelligence may come not merely a quantitative improvement in the ability to attain the same old goals, but a qualitatively different understanding of the nature of reality that reveals the old goals to be misguided, meaningless or even undefined.” This remark is congruent with my argument in Sect. “How an Agent Understands a Goal Depends on How it Understands the World”.

  11. The apparent absurdity of world-ending scenarios of this kind has been highlighted and criticized by Loosemore (2014).

  12. In Bostrom’s words (2014, 115): “[W]e cannot blithely assume that a superintelligence will necessarily share any of the final values stereotypically associated with wisdom and intellectual development in humans—scientific curiosity, benevolent concern for others, spiritual enlightenment and contemplation, renunciation of material acquisitiveness, a taste for refined culture or for the simple pleasures in life, humility and selflessness, and so forth.”

  13. Bostrom (2014, 130) calls this position the “orthogonality thesis”: “Intelligence and final goals are orthogonal: more or less any level of intelligence could in principle be combined with more or less any final goal.” See also Yampolskiy and Fox (2012, 137) for another statement of this position.

  14. This point has recently been raised by Herd et al. (2018, 219).

  15. First and foremost, the issue of what determines the meaning of a word.

  16. I should note that Petersen himself does not put much weight on the caveat. He states that he is “at least a bit inclined to think that [a superintelligence with a goal that is so simple that it does not require learning] is impossible” (2017, 332).

  17. Since 1983, the meter has been defined as 1/299,792,458 of the distance that light travels in one second in a vacuum (Bureau international des poids et mesures 1983); the definition is written out as an equation at the end of these notes.

  18. Bostrom (2014, 197) sees this possibility: “The AI might undergo the equivalent of scientific revolutions, in which its worldview is shaken up and it perhaps suffers ontological crises in which it discovers that its previous ways of thinking about values were based on confusions and illusions.” He also recognizes, in the continuation of this passage, that the prospect of such ontological crises renders doubtful the hope inspired by the finality argument: “Yet starting at a sub-human level of development and continuing throughout all its subsequent development into a galactic superintelligence, the AI’s conduct is to be guided by an essentially unchanging final value, a final value that becomes better understood by the AI in direct consequence of its general intellectual progress—and likely quite differently understood by the mature AI than it was by its original programmers, though not different in a random or hostile way but in a benignly appropriate way. How to accomplish this remains an open question.” But in the end, as the statement quoted in footnote 5 evinces, he maintains the hope.

  19. See Bostrom (2014, chs. 12–13), Yudkowsky (2001, 2004), and Soares (2018).

  20. As Tegmark (2017, 277) notes, a truly well-defined goal would specify how all particles in the universe should be arranged at a certain point in time. And that is not only practically infeasible, as Tegmark suggests, but impossible in principle, since—according to my argument in the preceding paragraphs—there is no unambiguous way of identifying particles, positions, and points in time.

  21. Bostrom and Yudkowsky voice this hope in the passages quoted in footnote 5. See also Omohundro (2008, 2012, 2016), Yampolskiy and Fox (2013), and Torres (2018).

  22. Yudkowsky (2001) and Bostrom (2014), for instance, explicitly characterize as general intelligences the superhuman AIs that they imagine.

  23. In a similar way, Podschwadek (2017, 336) argues that “assessing the system of their moral beliefs could lead [artificial moral agents] to the justified higher-order beliefs that the moral rules they are supposed to obey are, contrary to prior assumptions, not very suitable as action-guiding reasons.”

  24. See footnote 2.

  25. As I put it in the previous paper (Totschnig 2019, 914), they “maquinamorphize” the envisioned artificial intelligence, that is, they “conceive it […] as a system that, like today’s computer programs, blindly carries out the task it has been given, whatever that task may be.”
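
The finality argument referred to in note 4 can be illustrated with a small sketch (in Python; all names and numbers are hypothetical): an expected-utility maximizer scores every option, including the option of replacing its final goal, by the utility function attached to its current final goal, so the replacement comes out as counterproductive and is never chosen.

```python
# Illustrative sketch of the finality argument; names and numbers are hypothetical.
# The agent scores every option, including "adopt a new final goal", with the
# utility function attached to its CURRENT final goal.

def current_utility(expected_goal_progress: float) -> float:
    """Utility as measured by the agent's present final goal."""
    return expected_goal_progress

# Expected long-run progress on the CURRENT goal under each option,
# as judged by the agent's model of the world.
EXPECTED_PROGRESS = {
    "keep_final_goal": 1.0,       # the agent goes on pursuing the goal
    "adopt_new_final_goal": 0.1,  # its future self would pursue something else
}

best_option = max(EXPECTED_PROGRESS,
                  key=lambda option: current_utility(EXPECTED_PROGRESS[option]))
print(best_option)  # -> keep_final_goal: changing the goal is judged counterproductive
```

As the abstract indicates, the paper's challenge is directed at the assumptions built into this picture, namely that the content and validity of the final goal can remain fixed while the agent's understanding of the world develops, not at the inference drawn within the picture.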
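
For reference, the 1983 definition of the meter paraphrased in note 17 can be written as an equation (the symbol for the distance that light travels in a vacuum in one second is mine):

$$
1\ \mathrm{m} \;=\; \frac{d_{\text{light, vacuum, 1 s}}}{299\,792\,458}
\qquad\Longleftrightarrow\qquad
c \;=\; 299\,792\,458\ \mathrm{m/s}\ \text{(exact by definition)}.
$$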

References

  • Anderson, M., & Anderson, S. L. (2011). General introduction. In M. Anderson & S. L. Anderson (Eds.), Machine ethics (pp. 1–4). Cambridge: Cambridge University Press.

  • Bekey, G. A. (2005). Autonomous robots: From biological inspiration to implementation and control. Cambridge, MA: The MIT Press.

  • Blackmore, S. (1999). The meme machine. Oxford: Oxford University Press.

  • Bostrom, N. (2002). Existential risks: Analyzing human extinction scenarios and related hazards. Journal of Evolution and Technology, 9(1). https://www.jetpress.org/volume9/risks.html. Accessed 25 June 2020.

  • Bostrom, N. (2003). Ethical issues in advanced artificial intelligence. https://www.nickbostrom.com/ethics/ai.html. Accessed 18 September 2019.

  • Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies. Oxford: Oxford University Press.

  • Bureau international des poids et mesures. (1983). Resolution 1 of the 17th Conférence Générale des Poids et Mesures. https://www.bipm.org/en/CGPM/db/17/1/. Accessed 2 June 2020.

  • Domingos, P. (2015). The master algorithm: How the quest for the ultimate learning machine will remake our world. New York: Basic Books.

  • Herd, S., Read, S. J., O’Reilly, R., & Jilk, D. J. (2018). Goal changes in intelligent agents. In R. V. Yampolskiy (Ed.), Artificial intelligence safety and security (pp. 217–224). Boca Raton: CRC Press.

  • Johnson, D. G., & Verdicchio, M. (2017). Reframing AI discourse. Minds and Machines, 27(4), 575–590.

  • Kant, I. (1998). Groundwork of the metaphysics of morals (M. Gregor, Ed.). Cambridge: Cambridge University Press. (Original work published in 1785.)

  • Lawless, W. F., Mittu, R., Sofge, D., & Russell, S. (Eds.). (2017). Autonomy and artificial intelligence: A threat or savior? Cham: Springer International Publishing.

  • Loosemore, R. P. W. (2014). The maverick nanny with a dopamine drip: Debunking fallacies in the theory of AI motivation. In M. Waser (Ed.), Implementing selves with safe motivational systems and self-improvement: Papers from the 2014 AAAI Spring Symposium (pp. 31–36). Menlo Park: AAAI Press.

  • Mindell, D. A. (2015). Our robots, ourselves: Robotics and the myths of autonomy. New York: Viking.

  • Müller, V. C. (2012). Autonomous cognitive systems in real-world environments: Less control, more flexibility and better interaction. Cognitive Computation, 4(3), 212–215.

  • Omohundro, S. M. (2008). The nature of self-improving artificial intelligence. https://selfawaresystems.files.wordpress.com/2008/01/nature_of_self_improving_ai.pdf. Accessed 18 September 2019.

  • Omohundro, S. M. (2012). Rational artificial intelligence for the greater good. In A. H. Eden, J. H. Moor, J. H. Søraker, & E. Steinhart (Eds.), Singularity hypotheses: A scientific and philosophical assessment (pp. 161–176). Berlin: Springer.

  • Omohundro, S. M. (2016). Autonomous technology and the greater human good. In V. C. Müller (Ed.), Risks of artificial intelligence (pp. 9–27). Boca Raton: CRC Press.

  • Petersen, S. (2017). Superintelligence as superethical. In P. Lin, R. Jenkins, & K. Abney (Eds.), Robot ethics 2.0: From autonomous cars to artificial intelligence (pp. 322–337). Oxford: Oxford University Press.

  • Podschwadek, F. (2017). Do androids dream of normative endorsement? On the fallibility of artificial moral agents. Artificial Intelligence and Law, 25(3), 325–339.

  • Redfield, S. A., & Seto, M. L. (2017). Verification challenges for autonomous systems. In W. F. Lawless, R. Mittu, D. Sofge, & S. Russell (Eds.), Autonomy and artificial intelligence: A threat or savior? (pp. 103–127). Cham: Springer International Publishing.

  • Russell, S. J., & Norvig, P. (2010). Artificial intelligence: A modern approach. Upper Saddle River: Prentice Hall.

  • Soares, N. (2018). The value learning problem. In R. V. Yampolskiy (Ed.), Artificial intelligence safety and security (pp. 89–97). Boca Raton: CRC Press.

  • Tegmark, M. (2017). Life 3.0: Being human in the age of artificial intelligence. New York: Alfred A. Knopf.

  • Tessier, C. (2017). Robots autonomy: Some technical issues. In W. F. Lawless, R. Mittu, D. Sofge, & S. Russell (Eds.), Autonomy and artificial intelligence: A threat or savior? (pp. 179–194). Cham: Springer International Publishing.

  • Torres, P. (2018). Superintelligence and the future of governance: On prioritizing the control problem at the end of history. In R. V. Yampolskiy (Ed.), Artificial intelligence safety and security (pp. 357–374). Boca Raton: CRC Press.

  • Totschnig, W. (2019). The problem of superintelligence: Political, not technological. AI & Society, 34(4), 907–920.

  • Witkowski, M., & Stathis, K. (2004). A dialectic architecture for computational autonomy. In M. Nickles, M. Rovatsos, & G. Weiss (Eds.), Agents and computational autonomy: Potential, risks, and solutions (pp. 261–273). Berlin: Springer.

  • Yampolskiy, R. V., & Fox, J. (2012). Artificial general intelligence and the human mental model. In A. H. Eden, J. H. Moor, J. H. Søraker, & E. Steinhart (Eds.), Singularity hypotheses: A scientific and philosophical assessment (pp. 129–145). Berlin: Springer.

  • Yampolskiy, R. V., & Fox, J. (2013). Safety engineering for artificial general intelligence. Topoi, 32(2), 217–226.

  • Yudkowsky, E. (2001). Creating friendly AI 1.0: The analysis and design of benevolent goal architectures. San Francisco: The Singularity Institute.

  • Yudkowsky, E. (2004). Coherent extrapolated volition. San Francisco: The Singularity Institute.

  • Yudkowsky, E. (2008). Artificial Intelligence as a positive and negative factor in global risk. In N. Bostrom & M. M. Ćirković (Eds.), Global catastrophic risks (pp. 308–345). Oxford: Oxford University Press.

  • Yudkowsky, E. (2011). Complex value systems in Friendly AI. In J. Schmidhuber, K. R. Thórisson, & M. Looks (Eds.), Artificial general intelligence (pp. 388–393). Berlin: Springer.

  • Yudkowsky, E. (2012). Friendly artificial intelligence. In A. H. Eden, J. H. Moor, J. H. Søraker, & E. Steinhart (Eds.), Singularity hypotheses: A scientific and philosophical assessment (pp. 181–193). Berlin: Springer.

Author information

Correspondence to Wolfhart Totschnig.

Cite this article

Totschnig, W. Fully Autonomous AI. Sci Eng Ethics 26, 2473–2485 (2020). https://doi.org/10.1007/s11948-020-00243-z
