
Existentialist risk and value misalignment


Abstract

We argue that two long-term goals of AI research stand in tension with one another. The first involves creating AI that is safe, where this is understood as solving the problem of value alignment. The second involves creating artificial general intelligence, meaning AI that operates at or beyond human capacity across all or many intellectual domains. Our argument focuses on the human capacity to make what we call “existential choices”, choices that transform who we are as persons, including transforming what we most deeply value or desire. It is a capacity for a kind of value misalignment, in that the values held prior to making such choices can be significantly different from (misaligned with) the values held after making them. Because of the connection to existentialist philosophers who highlight these choices, we call the resulting form of risk “existentialist risk.” It is, roughly, the risk that results from AI taking an active role in authoring its own values rather than passively going along with the values given to it. On our view, human-like intelligence requires a human-like capacity for value misalignment, which is in tension with the possibility of guaranteeing value alignment between AI and humans.


Notes

  1. Although our discussion is very much influenced by Paul (2014), we break from her view that existential choices (our term) or transformative choices (her term) can be rational when they are motivated by self-discovery (Paul, 2014: 120); for critical discussion of the issues involved, see Tubert (2023). That said, there is room for a more Paul-driven account of existentialist risk than the one we explore in the text, perhaps one that seizes on Paul’s (2014: 10) notion of epistemic transformation to argue that we cannot know what choices an AI system might make in transforming its values. Thanks to an anonymous referee here.

  2. Russell (2019: 320, n. 40) also briefly engages Paul and Richard Pettigrew on the topic.

  3. Partly for reasons of computational intractability (van Rooij et al., 2019), no system can achieve perfect rationality; the most we should hope for is something along the lines of “bounded rationality” (Simon, 1958; Russell, 1997). Still, we can suppose that machines might come closer to perfect rationality than human beings do, and, relatedly, that they do not demonstrate the cognitive biases mentioned in the body of the text.

  4. Bostrom’s view about orthogonality stands in prima facie tension with Kantian views of practical rationality, according to which a fully rational agent, recognizing the moral law or the overriding and intrinsic value of humanity, will not sacrifice human beings for the sake of paperclips. Bostrom (2014: 130) responds to this tension by distinguishing intelligence from rationality and reason: perhaps a machine pursuing paths at odds with the moral law or the value of humanity would thereby be lacking in rationality and reason, but it could still be fully intelligent in virtue of demonstrating instrumental rationality. That is, there may be conceptions of reason or rationality at odds with sacrificing human beings for paperclips, but there is a further question as to whether the notion of “intelligence” requires such conceptions. Bostrom’s point about goal-content integrity, our primary interest here, does not require an answer to this further question, even though other aspects of his argument, including his point about existential risk, may require one.

  5. Perhaps part of the mechanism by which the paperclip maximizer locks in its values is that it refuses ever to engage in such contemplation.

  6. Autonomy in this sense is not at odds with having inherited values from society or evolution, but it is at odds with being unable to reject or change those values.

  7. Along broadly similar lines, Oaksford & Hall (2016) advance a contrarian argument saying that within human beings, the unconscious, fast, heuristic-driven, and phylogenetically older “System 1” is better at complying with decision-theoretic principles than is the conscious, slower, analytic “System 2” that coevolved with the human capacity for language.

  8. Gigerenzer presses the point in order to criticize approaches in psychology that take decision theory and allied fields to provide the appropriate normative standards for assessing human rationality; in particular, he is criticizing Kahneman and Tversky’s heuristics and biases program. We will not directly engage with his argument on this point, however.

  9. Stanovich takes the term from Taylor (1989) and Flanagan (1996).

Works cited

  • Arkes, H. R., & Ayton, P. (1999). The sunk cost and Concorde effects: Are humans less rational than lower animals? Psychological Bulletin, 125(5), 591–600.

  • Bales, A., & Kirk-Giannini, C. D. (2024). Artificial intelligence: Arguments for catastrophic risk. Philosophy Compass, e12964. https://doi.org/10.1111/phc3.12964

  • Bostrom, N. (2002). Existential risks: Analyzing human extinction scenarios and related hazards. Journal of Evolution and Technology, 9.

  • Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies. Oxford University Press.

  • Callard, A. (2018). Aspiration: The agency of becoming. Oxford University Press.

  • Carlsmith, J. (2021). Is power-seeking AI an existential risk? arXiv:2206.13353. https://arxiv.org/pdf/2206.13353.pdf

  • Chalmers, D. J. (2010). The singularity: A philosophical analysis. Journal of Consciousness Studies, 17(9–10), 7–65.

  • Chang, R. (2017). Hard choices. Journal of the American Philosophical Association, 3(1), 1–21.

  • Christian, B. (2020). The alignment problem: Machine learning and human values. Norton.

  • Flanagan, O. (1996). Self expressions: Mind, morals, and the meaning of life. Oxford University Press.

  • Gabriel, I. (2020). Artificial intelligence, values, and alignment. Minds and Machines, 30(3), 411–437.

  • Gigerenzer, G. (1994). Why the distinction between single-event probabilities and frequencies is important for psychology (and vice versa). In G. Wright & P. Ayton (Eds.), Subjective probability (pp. 129–161). John Wiley & Sons.

  • Good, I. J. (1965). Speculations concerning the first ultraintelligent machine. In F. Alt & M. Rubinoff (Eds.), Advances in computers (Vol. 6, pp. 31–88).

  • Hendrycks, D., Mazeika, M., & Woodside, T. (2023). An overview of catastrophic AI risks. arXiv:2306.12001. https://arxiv.org/abs/2306.12001

  • Kagel, J. H. (1987). Economics according to the rats (and pigeons too): What have we learned and what we hope to learn. In A. E. Roth (Ed.), Laboratory experimentation in economics: Six points of view (pp. 155–192). Cambridge University Press.

  • Kahneman, D. (2011). Thinking, fast and slow. Farrar, Straus and Giroux.

  • Kahneman, D., Slovic, P., & Tversky, A. (Eds.). (1982). Judgment under uncertainty: Heuristics and biases. Cambridge University Press.

  • Kant, I. (1785/2012). Groundwork of the metaphysics of morals. Cambridge University Press.

  • Kurzweil, R. (2005). The singularity is near: When humans transcend biology. Viking.

  • Lambert, E., & Schwenkler, J. (Eds.). (2020). Becoming someone new: Essays on transformative experience, choice, and change. Oxford University Press.

  • MacAskill, W. (2022). What we owe the future. Basic Books.

  • McCarthy, J. (2007). From here to human-level AI. Artificial Intelligence, 171(18), 1174–1182.

  • Müller, V. C., & Cannon, M. (2021). Existential risk from AI and orthogonality: Can we have it both ways? Ratio, 35(1), 25–36.

  • Oaksford, M., & Hall, S. (2016). On the source of human irrationality. Trends in Cognitive Sciences, 20(5), 336–344.

  • Omohundro, S. M. (2008). The basic AI drives. In P. Wang, B. Goertzel, & S. Franklin (Eds.), Artificial general intelligence: Proceedings of the first AGI conference (Frontiers in Artificial Intelligence and Applications, Vol. 171, pp. 483–492).

  • Ord, T. (2020). The precipice: Existential risk and the future of humanity. Hachette Books.

  • Paul, L. A. (2014). Transformative experience. Oxford University Press.

  • Paul, L. A. (2015). What you can’t expect when you’re expecting. Res Philosophica, 92(2), 1–23.

  • Pettigrew, R. (2019). Choosing for changing selves. Oxford University Press.

  • Russell, S. (1997). Rationality and intelligence. Artificial Intelligence, 94(1–2), 57–77.

  • Russell, S. (2019). Human compatible: Artificial intelligence and the problem of control. Viking.

  • Russell, S. (2020). Artificial intelligence: A binary approach. In S. M. Liao (Ed.), Ethics of artificial intelligence (pp. 327–341). Oxford University Press.

  • Russell, S., & Norvig, P. (2009). Artificial intelligence: A modern approach (3rd ed.). Prentice Hall.

  • Russell, S., Dietterich, T., Horvitz, E., Selman, B., Rossi, F., Hassabis, D., Legg, S., Suleyman, M., George, D., & Phoenix, S. (2015). Research priorities for robust and beneficial artificial intelligence: An open letter. AI Magazine, 36(4).

  • Sartre, J. P. (1946/2007). Existentialism is a humanism (C. Macomber, Trans.). Yale University Press.

  • Satz, D., & Ferejohn, J. (1994). Rational choice and social theory. Journal of Philosophy, 91(2), 71–87.

  • Savage, L. J. (1954). The foundations of statistics. Wiley.

  • Schuck-Paim, C., & Kacelnik, A. (2002). Rationality in risk-sensitive foraging choices by starlings. Animal Behaviour, 64(6), 869–879.

  • Schwitzgebel, E., & Garza, M. (2020). Designing AI with rights, consciousness, self-respect, and freedom. In S. M. Liao (Ed.), Ethics of artificial intelligence. Oxford University Press.

  • Shevlin, H., Vold, K., & Halina, M. (2019). The limits of machine intelligence. EMBO Reports, 20(10). https://doi.org/10.15252/embr.201949177

  • Simon, H. A. (1958). Rational choice and the structure of the environment. In Models of bounded rationality (Vol. 2). MIT Press.

  • Stanovich, K. E. (2004). The robot’s rebellion: Finding meaning in the age of Darwin. University of Chicago Press.

  • Stanovich, K. E. (2009). What intelligence tests miss: The psychology of rational thought. Yale University Press.

  • Stanovich, K. E. (2012). On the distinction between rationality and intelligence: Implications for understanding individual differences in reasoning. In K. J. Holyoak & R. G. Morrison (Eds.), The Oxford handbook of thinking and reasoning (pp. 433–455). Oxford University Press.

  • Stanovich, K. E. (2013). Why humans are (sometimes) less rational than other animals: Cognitive complexity and the axioms of rational choice. Thinking & Reasoning, 19(1), 1–26.

  • Stanovich, K. E. (2021). The bias that divides us: The science and politics of myside thinking. MIT Press.

  • Taylor, C. (1989). Sources of the self: The making of the modern identity. Harvard University Press.

  • Tubert, A. (2023). Existential choices and practical reasoning. Inquiry: An Interdisciplinary Journal of Philosophy. https://doi.org/10.1080/0020174X.2023.2235387.

  • Turing, A. (1950). Computing machinery and intelligence. Mind, 59(236), 433–460.

  • Tversky, A. (1969). Intransitivity of preferences. Psychological Review, 76(1), 31–48.

  • Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124–1131.

  • Ullmann-Margalit, E. (2006). Big decisions: Opting, converting, drifting. Royal Institute of Philosophy Supplement, 58, 157–172.

  • van Rooij, I., Blokpoel, M., Kwisthout, J., & Wareham, T. (2019). Cognition and intractability: A guide to classical and parameterized complexity analysis. Cambridge University Press.

  • Vold, K., & Harris, D. (2021). How does artificial intelligence pose an existential risk? In C. Véliz (Ed.), Oxford handbook of digital ethics. Oxford University Press.

  • Von Neumann, J., & Morgenstern, O. (1944). Theory of games and economic behavior. Princeton University Press.

  • Wallach, W., & Vallor, S. (2020). Moral machines: From value alignment to embodied virtue. In S. M. Liao (Ed.), Ethics of artificial intelligence (pp. 383–412). Oxford University Press.

Acknowledgements

We want to thank the National Endowment for the Humanities, the University of Puget Sound, and the John Lantz Senior Fellowship for Research or Advanced Study for supporting our work. We also thank the audience at the Math & Computer Science Seminar at the University of Puget Sound, the participants at the Philosophy, AI, and Society Workshop at Stanford University, and the anonymous referees for helpful feedback.

Author information

Corresponding author

Correspondence to Ariela Tubert.

Cite this article

Tubert, A., Tiehen, J. Existentialist risk and value misalignment. Philos Stud (2024). https://doi.org/10.1007/s11098-024-02142-6
