Abstract
Given actually existing autonomous systems with capacities for harm, and the public’s apparent willingness to take moral advice from large language models (LLMs), Einar Duenger Bohn’s (2024) renewed discussion of the Moral Turing Test (MTT) is timely. Bohn’s aim is to defend an unequivocally behavioural test. In this paper, I argue against this direction. Interpreted as a test of mere behaviour, the Turing test is a poor test of either intelligence or moral agency, and neither Bohn’s version of the test nor Allen, Varner and Zinser’s (2000) influential version avoids these problems. Moreover, the MTT’s advantages as advertised by Bohn do not hold up: it is an open empirical question whether embracing a merely behavioural MTT would significantly reduce the challenge of building a computer that passes the test. Setting this issue aside, I argue that Turing’s actual test is superior both to current AI benchmarks and to Bohn’s version of the test. A test of moral reasoning or agency in machines modelled on Turing’s actual test would have advantages over current tests of moral reasoning in AI, including existing versions of the MTT; such a test is an intriguing possibility that remains to be investigated.