AI Alignment Problem: “Human Values” don’t Actually Exist
Abstract
The main current approach to AI safety is AI alignment, that is, the creation of AI whose preferences are aligned with “human values.” Many AI safety researchers agree that the idea of “human values” as a constant, ordered set of preferences is at least incomplete. However, the idea that “humans have values” underlies a lot of thinking in the field; it appears again and again, sometimes as an uncritically accepted truth. It therefore deserves a thorough deconstruction, which I undertake by comprehensively listing and analyzing the hidden assumptions on which the idea that “humans have values” is built. The deconstruction centers on the following claims: “human values” are useful descriptions, but not real objects; “human values” are bad predictors of behavior; the idea of a “human value system” has flaws; “human values” are not good by default; and human values cannot be separated from human minds. I recommend that the idea of “human values” either be replaced with something better suited to the goal of AI safety or, at least, be used very cautiously. Approaches to AI safety that do not use the idea of human values at all, such as full brain models, boxing, and capability limiting, may deserve more attention.
Analytics
Added to PP: 2019-04-22
Downloads: 521 (#19,518)
6 months: 103 (#7,966)