AI Alignment Problem: “Human Values” don’t Actually Exist


Abstract. The main current approach to AI safety is AI alignment, that is, the creation of AI whose preferences are aligned with “human values.” Many AI safety researchers agree that the idea of “human values” as a constant, ordered set of preferences is at least incomplete. However, the idea that “humans have values” underlies much of the thinking in the field; it appears again and again, sometimes as an uncritically accepted truth. It therefore deserves a thorough deconstruction, which I carry out by listing and analyzing the hidden assumptions on which the idea that “humans have values” is built. The deconstruction centers on the following claims: “human values” are useful descriptions, but not real objects; “human values” are bad predictors of behavior; the idea of a “human value system” has flaws; “human values” are not good by default; and human values cannot be separated from human minds. I recommend that the idea of “human values” either be replaced with something better suited to the goal of AI safety or, at the least, be used very cautiously. Approaches to AI safety that do not rely on the idea of human values at all, such as full brain models, boxing, and capability limiting, may deserve more attention.




