1 Two Problems of Trust

There are two problems of trust in the domain of AI research: a theoretical problem and a practical one. The theoretical problem asks how we are to theorize about trust in an AI-relevant sense, that is, whether an AI-relevant theory of trust is feasible or articulable at all. The practical problem asks how we ought to design and engineer our AI systems to address any trust-relevant worries and concerns. In Section 2, I will characterize trust as a quaternary relation and identify an accompanying set of conditions to be satisfied before we have sufficient grounds for believing that a relation of trust exists between the trustor and the trustee. In Section 3, I will outline the positions of a number of skeptics vis-à-vis AI-relevant theories of trust: the reliance-without-trust skeptic, the no-intentionality skeptic, the value-neutrality skeptic, and the distrust-as-default-position skeptic. In Section 4, I will address the skeptical challenges outlined in Section 3 and propose an account of intentionality and related conceptual manoeuvres (e.g. the overruling of the fact-value distinction) to defuse the force of these skeptical challenges. In Section 5, I will highlight a fifth skeptical challenge that I shall term the ‘scope challenge’. In Section 6, I will propose what I shall term a ‘trust-engineering approach’ to address the practical problem of trust. My attempts to address both the theoretical and the practical problem of trust will culminate in a discussion of how the reward function may be constructed to encourage reinforcement learning-based agents to respond in a trust-responsive fashion, which I take to be a novel contribution to both the philosophy of AI and AI research.

2 Trust as a Quaternary Relation and Conditions for Satisfaction

Before we discuss the practical problem of how we might increase the level of trust in human-AI interactions, we must first address the extant skepticism about AI-relevant theories of trust. According to Baier (1986), trust is a phenomenon with which we are so familiar that we scarcely notice its presence and its variety, whether in the form of our putting our bodily safety into the hands of pilots, drivers, or doctors, refraining from suspecting that the food we purchase may be deliberately poisoned, or trusting that our children will be fine in their day-care centres. Trust is also ubiquitous in civil society: we trust both individuals who are demonstrably trustworthy (loyal, virtuous, and prudent) and complete strangers (Pettit, 1995). AI is a similarly ubiquitous phenomenon: alongside such other emerging technologies as nanotechnology, quantum computing, and biotechnology, AI is the poster child for the Fourth Industrial Revolution, and the use of AI systems across a multiplicity of domains is becoming more rather than less widespread (Schwab, 2017). Given the ubiquity of both trust and AI, we should have good reason to expect the untrammelled development of AI-relevant theories of trust. What plausible grounds for skepticism might exist with respect to AI-relevant theories of trust?

I shall assume from the outset and without further argument that trust is a quaternary relation R(A, B, ϕ, G) consisting of four relata: the trustor A, a trustee B, some action ϕ to be performed, and a goal G that makes the performance of ϕ desirable (see Fig. 1).Footnote 1 More specifically, trust is a mental state that A holds toward B with respect to the performance of ϕ relevant for the goal G (Castelfranchi & Falcone, 1998). A theory of trust, taking this quaternary relation as basic for conceptual analysis, typically identifies the conditions that must be satisfied before we have sufficient grounds for believing that there is a relation of trust between A and B with respect to a G-relevant ϕ. Traditional approaches in the philosophy of trust have taken both A and B to be persons and concerned themselves with an analysis of interpersonal trust. The AI-relevant theory of trust with which I am concerned identifies B as an artificial agent and ϕ as an anticipated action, event, or decision-outcome that is brought about by B.

Fig. 1 Trust as a quaternary relation (degree = 4)

Some theories of trust maintain that only one condition has to be satisfied before a relation of trust obtains between A and B: a probability threshold condition. Trust is grounded in probabilities that A attributes to her own beliefs about the behaviour and competences of B with respect to the G-relevant ϕ. On a scale of probability values p ∈ [0,1], 1 denotes complete trust, 0 denotes complete distrust, and 0.50 denotes uncertainty. Let n denote the probability threshold value and let m denote the probability value that A attributes to her trust-relevant beliefs. If m ≥ n, then the sole condition will be satisfied and there will be a trust relation between A and B (Gambetta, 1998). These theories of trust characterize trust in terms of mere reliance. Reliability (regularly designated by R) has been defined by engineers as the probability that the item can perform a required function under given conditions for a stated time interval (Birolini, 2013, p. 334).Footnote 2 If both A and B are persons and our object of conceptual analysis is interpersonal trust, then B figures more or less as an operator who is expected to perform a G-relevant action ϕ. B is said to be reliable if she can perform ϕ under given conditions for a stated time interval. Alternatively, it will be urged by these theorists that A can trust B with respect to the G-relevant ϕ for the stated time interval. Furthermore, it may be suggested that an AI-relevant theory of trust that is consistent with these theoretical foundations will claim the following: A trusts B, where B is an artificial agent, to a degree m iff A is willing to risk the use of B on the basis that it will perform the G-relevant ϕ with probability m, where m ≥ n.Footnote 3 Following Nickel et al. (2010), we may term this candidate AI-relevant theory of trust a ‘modified pure rational-choice account’.
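To fix ideas, the following is a minimal sketch, entirely of my own devising, of the quaternary relation R(A, B, ϕ, G) and the sole probability threshold condition of the modified pure rational-choice account; the class and function names, the example values of m and n, and the diagnostic-system illustration are assumptions introduced only for this sketch.

```python
# A minimal sketch of the quaternary trust relation R(A, B, phi, G) and the
# probability threshold condition m >= n. All names and values are illustrative.
from dataclasses import dataclass

@dataclass
class TrustRelation:
    trustor: str   # A
    trustee: str   # B, here an artificial agent
    action: str    # phi, the anticipated action, event, or decision-outcome
    goal: str      # G, the goal that makes the performance of phi desirable

def threshold_condition(m: float, n: float) -> bool:
    """True iff the probability m that A attributes to her trust-relevant
    beliefs meets or exceeds the threshold value n (i.e. m >= n)."""
    assert 0.0 <= m <= 1.0 and 0.0 <= n <= 1.0
    return m >= n

# Illustration: A attributes probability 0.9 to her belief that a diagnostic
# system (B) will deliver a correct diagnosis (phi) in the service of patient
# care (G), against a threshold of 0.75.
R = TrustRelation("A", "diagnostic system", "correct diagnosis", "patient care")
print(threshold_condition(m=0.9, n=0.75))  # True: the sole condition is satisfied
```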

3 Four Skeptical Challenges to AI-Relevant Theories of Trust

The modified pure rational-choice account makes an important connection between trust and contexts where risk is present. In addition, there is no fundamental difference between a person being reliable and an AI system being reliable. According to the first skeptical challenge (‘reliance-without-trust’), however, a conceptual distinction can still be made between trust and mere reliance. The probability threshold condition turns out to be insufficient: AI systems could satisfy the probability threshold condition (i.e. m ≥ n), yet we merely rely on rather than trust these systems. Whereas misplaced reliance results in disappointment, misplaced trust tends to lead to feelings of resentment and betrayal toward the trustee when the trustor is let down (Hieronymi, 2008; Wanderer and Townsend, 2013). Trusting, it may be conjectured, is not an attitude that I can adopt toward machines: I may feel disappointed when my computer fails to save important documents as it should, although it would seem strange for me to feel betrayed by or resentful of this malfunctioning computer (Jones, 1996). A necessary condition of trust that has been omitted by the modified pure rational-choice account is the normative expectation condition. A normative expectation is in place when A relies on B to do what B should do, and A not only expects that B will do it but also expects it of B (Walker, 2006, p. 79). Insofar as A has a normative expectation that B will perform ϕ, A’s normative expectation brings with it a requirement of responsibility on the part of B. If the normative expectation condition is fulfilled and B is responsible for the G-relevant ϕ that A normatively expects her to perform, then A’s feelings of resentment and betrayal toward B when B fails to perform ϕ may be explained in terms of B’s dereliction of responsibility.

In addition, some theories of trust maintain that the trustee B should possess the relevant goodwill. Since only things that have wills can have goodwill, it follows that we can only trust agents that have wills. According to the second skeptical challenge (‘no-intentionality’), given that many open questions remain about the concepts of volition, autonomy, rationality, moral responsibility, intentionality, and consciousness and whether AI systems possess them, we should suspend judgment about whether an artificial agent can qualify as a trustee B (Johnson, 2006; Himma, 2009). The no-intentionality skeptic might rely on Ullmann-Margalit (2004), for whom the trust relation is characterized as follows: A trusts B iff A believes that B has the appropriate intentions toward her (primary component) and B has the appropriate competence with respect to the G-relevant ϕ (secondary component).Footnote 4 The primacy of the intentionality requirement allows the no-intentionality skeptic to distinguish between instances of trust and instances of reliance and confidence. As AI systems lack the relevant intentionality, they can neither take into account interests of their own, nor act in favour of the interests of the trustor A because they care about A or possess goodwill toward A, nor experience the possibility of a conflict between A’s interests and their own. We can neither normatively expect it of AI systems that they will do such-and-such nor predicate of AI systems that they possess wills. Therefore, the reliance-without-trust skeptic and the no-intentionality skeptic will agree that no AI-relevant theory of trust can be developed.

According to the third skeptical challenge (‘value-neutrality’), AI systems, as stand-alone implementations, are normatively neutral. This skeptical approach is undergirded by a thesis (viz. the value-neutrality thesis) that applies to AI systems in particular and technologies in general. According to this thesis, a new piece of technology T can have bad consequences only if people have vicious T-relevant preferences or if users with minimally decent preferences act out of ignorance. Conversely, T can have good consequences only if people have minimally decent T-relevant preferences or if users with vicious T-relevant preferences act out of ignorance (Morrow, 2014). AI research is concerned with understanding and building intelligent entities (Russell and Norvig, 2010). AI research, it may be argued, typically relies on mathematical facts (e.g. in linear algebra, differential calculus, mathematical logic, probability), cognitive facts (e.g. Hebb’s learning rule of synaptic reinforcement, the McCulloch-Pitts model of the neuron), and physical facts (e.g. I am at my desk, my car is at home). If we accept this characterization of AI research and the standard epistemological distinction between statements of fact (descriptive) and statements of value (normative), then we appear to have good reason to believe that AI systems are normatively neutral as stand-alone implementations. If we are to theorize about trust in an AI context, then the trustee B must be a human user of an AI system rather than the system itself and we are back in the traditional milieu of interpersonal trust.

According to the fourth skeptical challenge (‘distrust-as-default-position’), the default position with respect to AI systems ought to be one of distrust rather than trust. This challenge is an especially powerful one in regimes that exclude marginalized groups from trust networks, cast oppressed individuals as untrustworthy, and prevent the oppressed from having their knowledge claims trusted by both themselves and others (Fricker, 2007; Daukas, 2011). Consider how algorithms, employed by judges and parole officers to assess a criminal defendant’s likelihood of recidivism, have demonstrated a bias against certain marginalized groups (Angwin et al., 2016). Where epistemic injustices of this sort exist and may be exacerbated by the use of AI systems and technologies, we have good reason to maintain a skeptical default position of distrust rather than trust. I am sympathetic to the view that distrust, betrayal of trust, and even strategies of resistance could be more appropriate when AI systems result in the maintenance or exacerbation of epistemic injustices. The productive and protective value of distrust in the face of tyranny and injustice and the ability of distrust to temper tyranny should not be underestimated (Krishnamurthy, 2015). When AI systems function as tools of tyrannical and unjust regimes, there is democratic value in observing a default position of distrust.Footnote 5

The reliance-without-trust skeptic argues that the probability threshold condition, while it may be satisfied by certain reliable AI systems, is insufficient: we do not trust these systems in the way that we might other human beings. The no-intentionality skeptic argues that AI systems lack the relevant intentionality to qualify as appropriate trustees. The value-neutrality skeptic argues that AI systems are normatively neutral as stand-alone implementations and cannot qualify as trustees, although their human users can. The distrust-as-default-position skeptic argues that an AI-relevant theory of distrust may be more useful than an AI-relevant theory of trust. Both the reliance-without-trust skeptic and the no-intentionality skeptic will incline toward the impossibility of developing an AI-relevant theory of trust. The value-neutrality skeptic, in addition, falls back on traditional accounts of trust that have interpersonal trust as their object of analysis. The distrust-as-default-position skeptic, aware of how AI systems might result in the maintenance or exacerbation of epistemic injustices, recommends distrust rather than trust as the default position.

4 Addressing the Skeptics

Skeptics about the possibility of developing an AI-relevant theory of trust must recognize our growing vulnerability to AI systems and the effects that they bring about. Consider our vulnerability to a GPS system misfiring, self-driving cars malfunctioning, medical diagnostic programs making erroneous diagnoses, and machine learning-based lending algorithms making flawed credit decisions against us. The more we build into our plans the supposition that AI systems will perform the requisite tasks, the more we count on them to do something.Footnote 6 While information processing systems may still have been relatively simple in the 1990s, we are in an era when these systems are increasingly complex, powerful, indispensable, and automated in their computational capacities. We increasingly count on AI systems, in the sense that we embed in our plans assumptions about what these AI systems will do in ways that leave us vulnerable if they do not.Footnote 7 Consequently, the idea (as defended by the reliance-without-trust skeptic) that we are merely relying on these AI systems seems less plausible now than it might have been in the 1990s. To be perfectly clear, risk and vulnerability do not seem to be the only things that matter in trust. While we are sufficiently vulnerable to tornadoes and other forms of inclement weather, we do not feel resentment toward them when our meteorological expectations are overturned.Footnote 8 Our growing vulnerability to AI systems, combined with an account of intentionality that encompasses both human beings and certain technical artifacts, increases the argumentative burden on skeptics who pooh-pooh theorizing about trust in an AI-relevant sense as being out of place. The more vulnerable we become to instances in which our AI systems fail to act as expected, even as these AI systems have states that are directed at objects or states of affairs in the world, the more likely we are to feel betrayed.

In addition, skepticism about the possibility of theorizing about trust in an AI-relevant sense appears to be counterintuitive. While it has been claimed that there has (at least until recently) been relatively little discussion about trust in the context of AI and human-AI interactions (Ribeiro et al., 2016), an ever-growing corpus of research suggests otherwise. A casual Google Scholar search using the Boolean search string ‘~trust AND (“artificial agents” OR “AAs”) AND (“AI” OR “Artificial Intelligence”)’ for articles published since 2020 yielded c. 2,540 search results.Footnote 9 Groundbreaking work on the concept of e-trust and how it is appropriate for any analysis involving artificial agents (Floridi and Sanders, 2004; Taddeo, 2009; Taddeo & Floridi, 2011) has inspired theories of trust that pertain to AI, artificial agents, robots (Buechner & Tavani, 2011; Grodzinsky et al., 2011), social robots (Coeckelbergh, 2012), and (more recently) both human-human and human-AI interactions (Ferrario et al., 2019).

A proper account of intentionality may help to defuse the force of various skeptical challenges. One need not agree with the no-intentionality skeptic that AI systems lack the relevant intentionality. If one can demonstrate how AI systems might have intentional states, then one may be more justified in having certain normative expectations with respect to these systems. A distinction between original intentionality and derived intentionality could be invoked here. It may be claimed that AI systems have derived intentionality rather than no intentionality whatsoever. When an AI system consists of a symbol manipulation system with formal elements and syntactic rules, it may demonstrate some form of intelligent behaviour (e.g. chess-playing, theorem-proving, natural language processing). However, this intentionality is derived rather than original, since the syntactic rules and formal elements cannot by themselves represent beliefs, desires, expectations, and other intentional mental states. Original intentionality, on the other hand, may be thought to refer to the intentionality of mental states: this intentionality is not derived from some prior form of intentionality but is intrinsic to these mental states themselves (Searle, 1983).

Furthermore, it may be argued that these AI systems borrow whatever derived intentionality they might have from the original intentionality of the human beings who design, implement, and use them for various purposes. AI technologies are tools that are designed and used in service of their respective ends, goals, and purposes. As products of human intentional action, they have a prima facie claim to some form of derived intentionality (Ihde, 1990; Latour, 1992; Verbeek, 2008). The use of the distinction between original intentionality and derived intentionality is in line with an increasing tendency to use notions such as autonomy, agency, choice-making, and morality (traditionally reserved for describing and explaining intentional human behaviour) to describe and explain the behaviour of AI systems as products of intentional human behaviour (Nickel et al., 2010).

The no-intentionality skeptic may well concede that AI systems have derived intentionality rather than no intentionality, while maintaining that derived intentionality is irrelevant in the context of trust. After all, the distinction between derived and original intentionality is traditionally predicated on an account of intentionality that defends a conjunction of the following claims (Searle, 1980, 1984):

(C1) No matter how sophisticated or complex it might be, an AI system is never by itself a sufficient condition of intentionality.

(C2) Intentionality is a biological phenomenon that is causally dependent on biochemistry.

While a chess-playing program is capable of demonstrating intelligent behaviour through the rule-governed manipulation of formally specified elements, C2 implies that intentionality must still be traced to the designers, programmers, and even users of this chess-playing program and imputed to the causal powers of their brains.Footnote 10 We may have plausible reservations about the account of intentionality that is given by the conjunction (C1 ∧ C2). While an AI system might consist solely of a symbol manipulation system with formal elements and syntactic rules, it could additionally comprise a set of transducers (tying the computational system in some sense to the outside world) and an etiology or context (the environment in which the computation-cum-transducer system finds itself). Given the right input and the right history and context, it has been argued that an embodied computational system could have (rudimentary) intentional states, insofar as this system’s states are about or directed at objects and states of affairs in the world (Bynum, 1985). Even in the absence of human beings, artificial agents may continue interacting with each other and their environment and maintaining these (rudimentary) intentional states, giving the lie to C1 and C2.Footnote 11

A few more things could be said in response to the value-neutrality skeptic and the distrust-as-default-position skeptic. The value-neutrality skeptic relies from the outset on the standard epistemological distinction between fact and value. However, this fact-value distinction is an artificial one and ignores the leaky nature of the fact/value divide: facts appear to seep into values even as we observe a leakage from values to facts. In addition, AI technologies constitute more than merely applied science. The goal-directed and purposive nature of AI systems, as noted above in the claim in favour of their derived intentionality, puts paid to the idea that they are normatively neutral. It may also be advanced against the distrust-as-default-position skeptic that she is relying on certain assumptions about the nature of the regime in which AI systems are implemented. However, not all regimes are characterized by tyranny and injustice, AI systems could be designed as tools to aid marginalized groups, and even regimes given to tyranny and injustice could well use AI systems for purposes more mundane than the maintenance and exacerbation of epistemic injustices. There are at least some instances in which we ought to avoid excessive trust with respect to regimes that are supported by oppressive and exclusive norms. Where these regimes are supported by AI systems (e.g. speech recognition technologies, mass surveillance technologies), distrust of these systems would be a favourable and even healthy default position. However, in other more ideal moral climates in which people enjoy their freedoms, trust flourishes, and AI systems are used as tools for promoting societal ends, distrust may be less relevant as a default position.Footnote 12

5 The Scope Challenge

Notwithstanding the skeptical challenges from the reliance-without-trust skeptic, the no-intentionality skeptic, the value-neutrality skeptic, and the distrust-as-default-position skeptic (Section 3), attempts to address the theoretical problem and articulate an AI-relevant theory of trust have not been in short supply. In this section, I will identify a fifth skeptical challenge that lies in wait. I shall term this latter challenge the ‘scope challenge’. In Section 2, I advanced the claim that trust is a quaternary relation R(A, B, ϕ, G) and, further, that a set of accompanying conditions must be satisfied if a relation of trust is to exist between A and B with respect to a G-relevant ϕ. Additionally, a number of plausible conditions (viz. the probability threshold condition, the normative expectation condition) were identified in Sections 2 and 3. Suppose that the force of the four skeptical challenges is sufficiently defused and the prospects for an AI-relevant theory of trust are renewed. Suppose that an attempt is made to identify the full suite of conditions that are necessary and sufficient for the relation between the trustor A and the trustee B to count as a trust relation. Last but not least, and in accordance with my delineation of an AI-relevant theory of trust, suppose that B is identified as an artificial agent and ϕ is identified as an anticipated action, event, or decision-outcome that is brought about by B. Is the AI-relevant theory of trust or collection of theories representationally adequate to the multifarious forms of trust and AI or not? This is the scope challenge in the interrogative mood.

Trust exists on a spectrum that ranges from full-fledged trust, through substantial trust, therapeutic trust, trust in the face of doubt or entrusting, and agnosticism between trust and distrust, to distrust itself.Footnote 13 In a similarly multifarious vein, AI systems have been developed from a variety of approaches to AI research: the laws of thought (thinking rationally) approach, the cognitive modelling (thinking humanly) approach, the Turing test (acting humanly) approach, and the reinforcement learning-based (acting rationally) approach (Russell and Norvig, 2010; Bringsjord & Govindarajulu, 2020).Footnote 14

An AI-relevant theory of trust that acknowledges the force of the scope challenge and seeks to address it head-on will strive to represent the nature of the relationship between our notions of trust and AI in a manner that is representationally adequate to both notions and consistent across their multifarious realizations.Footnote 15 Such a theory or collection of theories will recognize the complex phenomenology of trust and engage seriously with the possibility of an AI-relevant account of trust that extends over its diverse manifestations. The philosophy of trust distinguishes between the disappointment that arises from misplaced reliance and the betrayal and resentment that arise from misplaced trust. Feelings of betrayal and resentment belong, along with other feelings such as gratitude and moral anger, to a class of attitudes that we take toward agents rather than objects. This class of attitudes has been termed the ‘reactive attitudes’ and is intimately bound up with ascriptions of responsibility (Strawson, 1962). These feelings of betrayal and resentment are reactive attitudes that link trust to practices of holding agents responsible for their actions from the participant stance. The explanatory advantages of a theory of trust that relies on the Strawsonian reactive attitudes have been extensively catalogued by philosophers of trust (Holton, 1994; Jones, 2004; Walker, 2006; Hieronymi, 2008; McGeer, 2008). Following Nickel et al. (2010), I shall term a candidate AI-relevant theory of trust that is developed along these lines a ‘modified motivation-attributing account’.Footnote 16 The AI-relevant modified motivation-attributing account will provide sufficient room for the Strawsonian reactive attitudes and explain why A is willing, in certain circumstances, to count on B and make herself vulnerable. Where certain essential attributes of garden-variety trust in an interpersonal context (viz. intentionality, normative expectations) are difficult or impossible to locate when B is identified as an artificial agent, conceptual clarification or related conceptual manoeuvres (as suggested in Section 4) may have to be effected. Environmental factors (e.g. an inclusive climate of trust as opposed to an oppressive climate of distrust), the domain under consideration, and the consequences of trusting or distrusting are among the variables that determine the appropriate default stance to adopt within the spectrum that extends from full-fledged trust to distrust.

In the second instance, we must recognize the variety of approaches to AI research and the variety of systems to which these distinct approaches give rise, while engaging seriously with the possibility of an AI-relevant theory or collection of theories that extends over its diverse manifestations. Whatever intelligence may ultimately be, AI is constituted by artificial entities capable of simulating intelligence or exhibiting certain intelligence-relevant mental traits, often by performing tasks normally thought to require intelligence. These tasks include but are not limited to the following: successful navigation in a physical space, knowledge representation and reasoning, learning and generalization, image recognition, problem-solving, inference-making, and natural language processing and understanding. AI systems are sufficiently advanced technologies that have been designed for a range of tasks, the successful performance of which tends to conduce to certain ends or purposes that matter to us. Certain expressions in the English language suggest a natural connection between trust and care: we say both that the trustor normally entrusts the trustee with something she cares about and that the trustor entrusts that cared-for something to the care of the trustee.

AI systems have been designed and introduced because there are certain things, such as the goal G, that we care about (i.e. certain things that matter to us), specific tasks, such as ϕ, the successful performance of which will tend to promote these things that we care about, a recognition that there are limits to our agential powers, and the hopeful trust that tapping the resources and competences of sufficiently advanced technologies will promote the successful performance of these tasks and help us along our way to G. The point about the trustor A (presumably a human being) accepting limitations on her own agency is especially crucial: after all, if A could easily and directly bring about some desired end, then hoping for that end would be out of place, since A would simply act so as to achieve it (McGeer, 2008). A’s hope that B will perform the desirable and G-relevant action ϕ and A’s coming to terms with the limitations of her own agency go hand in hand here.

As AI systems progress through the design, testing, implementation, and use phases, various stakeholders (viz. big tech companies, AI researchers, industry partners, organizations, users, etc.) will become involved in differing capacities. The collective will of these relevant stakeholders could be tapped in the ethics and epistemology of trust: we simply need to ensure that we have what generally passes muster as goodwill across these various stakeholders. Whereas the value-neutrality skeptic is only prepared to consider human users of AI systems as candidate trustees, the implication here is that a larger group of human stakeholders will have to be countenanced. Perhaps it may be more appropriate to characterize the trustee B in the quaternary trust relation R(A, B, ϕ, G) as an AI ecosystem rather than an AI system.Footnote 17 In addition, it has been hypothesized that a set of drives may be identified across a broad class of AI systems and will emerge as a result of convergent paths of AI development (Omohundro, 2008; Bostrom, 2014). These drives include the drive to self-improvement, the drive to greater rationality, the drive to self-preservation, and the drive to resource acquisition and the efficient use of these resources. These basic AI drives are analogues of the human will, render AI systems purposive and goal-driven, and enhance their agential aspects. We should remind ourselves of the collective will of the relevant stakeholders and the basic AI drives whenever skeptical questions are raised about volition, autonomy, rationality, moral responsibility, intentionality, and consciousness in the context of AI systems.

6 Trust-Engineering

The practical problem concerns how we should design and engineer our AI systems to address any worries and concerns about their trustworthiness.Footnote 18 Although trust and trustworthiness are interlocking, they are also categorically distinct (Hardin, 2006; Nickel et al., 2010). Trust is a mental state that A holds toward B with respect to the performance of a G-relevant ϕ. Trustworthiness, on the other hand, is a quality in B that satisfies this mental state and helps to make it appropriate. Trustworthiness is an epistemic virtue that can be cultivated in human beings. Philosophers of trust typically defend trustworthiness as an epistemic virtue and characterize the bearer of this virtue as someone who can be counted on to avoid unduly violating the normative expectations that others rely upon her to meet (Frost-Arnold, 2014). Trustworthiness is also associated with other laudable traits such as loyalty, virtue, and prudence (Pettit, 1995). An individual who develops her dispositions to be loyal, virtuous, or prudent will cultivate her trustworthiness. It is not obvious how AI systems can be designed to develop their dispositions to be loyal, virtuous, and prudent, in a manner analogous to human beings. Given this important disanalogy between human beings and AI systems, we appear to have grounds to remain skeptical about whether the practical problem can be solved.Footnote 19

We may arrest this skepticism by acknowledging a position that is intermediate between the modified pure rational-choice account (Section 2) and an account that gives rise to trustworthy AI. Let us call this middle-of-the-road position ‘trust-engineering’. According to this account, while trustworthiness (viz. the possession of certain laudable traits) is a form of trust-reliability, it is not its only form. There is also trust-responsiveness, which is a disposition to prove reliable under the trust of others. Robust and reliable AI systems are apt to demonstrate competence in the domain wherein they are being counted on to perform particular tasks. Trust-engineers could therefore work at improving the robustness and reliability of these AI systems (e.g. with respect to out-of-distribution test examples, adversarial perturbations, absent supervisors) (Amodei et al., 2016). Beyond the mere fact of competence, it should be recognized that trust-responsiveness is a disposition to take the fact that another is counting on one to be a positive reason to act in a trustworthy manner, absent other independent reasons to be trustworthy.Footnote 20 The idea is that the presence of trust itself generates a reason to be trustworthy. Such responsiveness to trust requires a conscious awareness that one is being counted on. Trust-engineers should therefore also work at developing AI systems with the AI-encompassing account of intentionality in mind and steadfastly refrain from conflating trust-responsiveness with mere competence.Footnote 21
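As an aside on what such robustness work might look like in practice, here is a minimal sketch, entirely illustrative and not drawn from this paper, of one common way to probe a classifier's reliability under small adversarial perturbations (a fast-gradient-sign attack on a toy logistic model); the data, fixed weights, and the epsilon value are all assumptions introduced for the sketch.

```python
# A minimal, self-contained robustness probe: compare a toy logistic classifier's
# accuracy on clean inputs with its accuracy under a fast-gradient-sign perturbation.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: two Gaussian blobs with labels 0 and 1.
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

# Fixed weights standing in for a pre-trained model.
w, b = np.array([1.0, 1.0]), 0.0

def predict(X):
    return (sigmoid(X @ w + b) >= 0.5).astype(float)

def fgsm(X, y, eps=0.5):
    # Gradient of the logistic loss with respect to the input is (p - y) * w.
    p = sigmoid(X @ w + b)
    grad = (p - y)[:, None] * w[None, :]
    return X + eps * np.sign(grad)

clean_acc = (predict(X) == y).mean()
adv_acc = (predict(fgsm(X, y)) == y).mean()
print(f"clean accuracy: {clean_acc:.2f}, accuracy under perturbation: {adv_acc:.2f}")
```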

One candidate trust-engineering approach may rely on certain physiognomic biases. To be clear, physiognomy (or the assessment of an individual’s character or personality from her outer appearance and especially her facial features) is a junk science. However, there appear to be a number of physiognomic biases that relate to trustworthiness. Faces with low inner eyebrows, shallow cheekbones, and thin chins are perceived as untrustworthy; individuals who are attractive or appear happy (with smiling being used as a signal of the intention to cooperate) tend to be viewed as trustworthy; faces that resemble a baby’s are viewed as non-threatening; and we tend to trust people who appear similar to our tribe and distrust others who appear dissimilar (Scharlemann et al., 2001; Todorov et al., 2008; Sofer et al., 2017).

The ability that AI designers have to alter the outer appearance of AI systems in line with these physiognomic biases may be contrasted with the general inability of human beings to alter their outer appearance (except by plastic surgery or other extreme means). From an understanding of how these physiognomic biases work, a blueprint for designing the outer appearance of AI systems could be developed, such that the chances of these systems being perceived as trustworthy are maximized.Footnote 22

This candidate approach to trust-engineering poses serious problems. When the outer appearance of AI systems is altered in certain ways, trust may be evoked in the trustor, albeit through a mechanism that, owing to its biased and irrational nature, does not in fact contribute to making the trustee more trustworthy. Where approaches to trust-engineering manipulate weaknesses in the human condition to engender trust, it makes perfect sense to ask: should such systems be developed in the first place? A far better mechanism on which to rely than a set of physiognomic biases is the psychological mechanism identified in Pettit (1995).Footnote 23 Trustors rely on this mechanism, the esteem-seeking desire for the good opinion of others, to get trustees to act in a trust-responsive fashion. The mechanism works because it is a fact of human psychology that people seek esteem and generally wish to be well thought of by others (Pettit, 1995). Broadly, desiring the esteem of the trustor A promotes trust, while at the same time promoting trustworthiness insofar as this esteem can be reliably maintained if the trustee B is also motivated to be trustworthy.Footnote 24

Another candidate approach to trust-engineering that is more in line with Pettit’s theory could involve working the esteem of others into the reward function of our AI systems. More specifically, if the esteem of others (both trustors who are counting on an AI system and reliable and interested third parties) could be captured and represented in an appropriate manner in the reward function, reinforcement learning-based agents might infer an optimal policy that selects actions in a trust-responsive fashion.Footnote 25 This could have important implications for the trust-responsiveness of our AI systems.

The reinforcement learning approach can be represented in terms of a Markov decision process. Each Markov decision process or MDP is a tuple of the form (S, Φ, P, γ, r), where S is the set of states (including the initial state s0), Φ is the set of possible courses of action available to an agent (e.g. ϕ1, ϕ2, …, ϕn), P is the transition probability matrix (viz. P(s1 | s0, ϕi)), γ is the discount factor, and r is the reward function. My Pettit-inspired trust-engineering approach involves working esteem-based considerations into r. Alternatively, we could get our AI system to infer the reward function r by observing the behaviour of trustworthy exemplars. This approach is known as inverse reinforcement learning and is yet another tool that could be used in trust-engineering (Abbeel and Ng, 2004; Vasquez et al., 2014). More specifically, we could rely on inverse reinforcement learning to derive an esteem-sensitive reward function r from the policy π or behaviour of trustworthy exemplars, before working this esteem-sensitive r explicitly into our reinforcement learning-based approach.
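To make the shape of this proposal concrete, here is a minimal sketch under my own illustrative assumptions (the class and function names, the types, and the weighting term lam are not part of the paper's specification): an MDP tuple (S, Φ, P, γ, r) whose reward function r composes a task-level term with an esteem signal reported by trustors and interested third parties.

```python
# A minimal sketch of an MDP (S, Phi, P, gamma, r) whose reward r folds an esteem
# signal into the usual task reward. Names, types, and the weight lam are illustrative.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State, Action = str, str

@dataclass
class MDP:
    states: List[State]                                         # S, with initial state s0
    actions: List[Action]                                       # Phi = {phi_1, ..., phi_n}
    transition: Dict[Tuple[State, Action], Dict[State, float]]  # P(s' | s, phi)
    gamma: float                                                # discount factor
    reward: Callable[[State, Action], float]                    # r(s, phi)

def esteem_sensitive_reward(task_reward: Callable[[State, Action], float],
                            esteem_signal: Callable[[State, Action], float],
                            lam: float = 0.5) -> Callable[[State, Action], float]:
    """Compose r(s, phi) = task term + lam * esteem term, where the esteem term
    stands in for the reported esteem of trustors and interested third parties."""
    def r(s: State, phi: Action) -> float:
        return task_reward(s, phi) + lam * esteem_signal(s, phi)
    return r

# Illustration: a two-state credit-scoring toy with a hypothetical esteem signal.
toy = MDP(
    states=["s0", "s1"],
    actions=["extend_credit", "deny_credit"],
    transition={("s0", "extend_credit"): {"s1": 1.0},
                ("s0", "deny_credit"): {"s0": 1.0}},
    gamma=0.9,
    reward=esteem_sensitive_reward(lambda s, phi: 1.0 if phi == "extend_credit" else 0.0,
                                   lambda s, phi: 0.8),  # stand-in esteem report
)
print(toy.reward("s0", "extend_credit"))  # 1.0 + 0.5 * 0.8 = 1.4
```

On this sketch, inverse reinforcement learning would replace the hand-written esteem term with one inferred from the policy of a trustworthy exemplar, but the composed form of r would remain the same.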

Consider an AI-based credit scoring system. Its trustors A1, …, An include lenders and various financial institutions. Reliable and interested third parties may include economists and government agencies. When AI-assisted decisions are made by lenders to extend or deny credit, the esteem of A1, …, An and these reliable and interested third parties could be captured and represented after the manner of restaurant and hotel reviews. This esteem, aggregated across parties (whose scores may be weighted according to the degree of esteem in which they are themselves held) and rendered computationally tractable, gives rise to a particular reward or feedback signal that indicates how well the AI system (the trustee B) is doing trust-wise at a particular step. We then have good reason to expect a reinforcement learning-based AI system, through its interactions with the environment, to end up selecting actions in a trust-responsive fashion. Where inverse reinforcement learning is feasible, we could get our artificial agent to observe the policy π or behaviour of a trustworthy non-AI-based credit scoring exemplar (B*) and infer an esteem-sensitive reward function r.
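On one illustrative reading of the aggregation step just described (the weighting scheme and the example figures below are my own assumptions, not the paper's), the feedback signal could be as simple as an esteem-weighted average of the parties' reports:

```python
# A minimal sketch of aggregating esteem reports from lenders A1, ..., An and
# interested third parties into a scalar feedback signal, with each party's report
# weighted by the esteem in which that party is itself held.
from typing import Dict

def aggregate_esteem(reports: Dict[str, float], rater_esteem: Dict[str, float]) -> float:
    """reports: party -> esteem score for the system B in [0, 1];
    rater_esteem: party -> weight reflecting how well-regarded that party is."""
    total_weight = sum(rater_esteem[party] for party in reports)
    if total_weight == 0:
        return 0.0
    return sum(score * rater_esteem[party] for party, score in reports.items()) / total_weight

reports = {"lender_1": 0.8, "lender_2": 0.6, "economist": 0.9}       # hypothetical reviews
rater_esteem = {"lender_1": 1.0, "lender_2": 0.5, "economist": 2.0}  # hypothetical standing
print(aggregate_esteem(reports, rater_esteem))  # ~0.83, the reward signal for B at this step
```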

This AI-based credit scoring system could take the form of a computation-cum-transducer system. Its transducers or ‘sense organs’ allow it to interact with borrowers, lenders, and interested third parties. The states of this system could be directed at real-time credit information on individuals and businesses, transactions and bank statements, reports from lenders and other sources, the esteem reports of reliable and interested third parties, and other states of affairs in the world. These states will have conditions of satisfaction: when the system requests a bank statement and biometric information from the borrower, this request utterance will not be satisfied until the borrower obliges accordingly. These states have directions of fit: word-to-world (in the case of declarative utterances about the borrower’s creditworthiness) or world-to-word (in the case of imperative utterances, when commands or instructions are issued in the credit-scoring process). Given the right input from the world and the right history and context, this AI-based credit scoring system could have the sort of intentional states that we find described in Bynum (1985). Contra the no-intentionality skeptic, it will be far from the case that AI systems lack the relevant intentionality to qualify as appropriate trustees. Contra the reliance-without-trust skeptic, we may have good reason to trust these systems in the way that we might other human beings, given that they have intentional states and are apt to respond in a trust-responsive fashion.

7 Conclusion

In this paper, I have identified two problems of trust: a theoretical problem and a practical one. In Section 2, I identified trust as both a quaternary relation R(A, B, ϕ, G) and a mental state that the trustor A holds toward a trustee B with respect to the performance of a G-relevant ϕ. I also argued that a theory of trust typically identifies an accompanying set of conditions that must be satisfied before we have sufficient grounds for believing that a relation of trust exists between A and B. In Section 3, I entertained skeptical notes vis-à-vis AI-relevant theories of trust from four camps: the reliance-without-trust skeptic, the no-intentionality skeptic, the value-neutrality skeptic, and the distrust-as-default-position skeptic. In Section 4, I addressed these skeptical challenges to the theoretical problem and proposed an account of intentionality and related conceptual manoeuvres (e.g. the overruling of the fact-value distinction) to counter these skeptical overtures. In Section 5, I identified the scope challenge confronting any AI-relevant theory of trust or collection of theories that purports to be representationally adequate to the multifarious forms of trust and AI. In Section 6, I identified two candidate approaches to trust-engineering in response to the practical problem, one involving a certain reliance on a set of physiognomic biases and another relying on an esteem-seeking psychological mechanism. Using Pettit’s theory about the cunning of trust, I argued in favour of the second candidate approach rather than the first. I then proposed the development of an esteem-sensitive reward function that may allow AI systems to respond in a trust-responsive fashion, aided by reinforcement learning and/or inverse reinforcement learning.

I have not excluded a more conservative reading of my position that would merely permit our speaking in terms of trustworthy AI ecosystems instead of trustworthy AI systems (Section 5). Given the disanalogy between human beings and AI systems, we could redirect our demands for trustworthiness, not to the AI systems themselves but rather to the AI ecosystems at large. We could encourage big tech companies, AI researchers, industry partners, organizations, and users to cultivate their dispositions to be loyal, virtuous, or prudent, extend our hopeful trust to them and galvanize members of the AI ecosystems to live up to our hopeful vision of what AI systems can do and be, and appeal to their goodwill. Nonetheless, the trust-engineering account may help us to secure a position that is intermediate between trustworthy AI and reliance-without-trust. Botsman (2017) has argued that we are at the start of the third trust revolution in the history of human civilization: distributed trust, which flows laterally between individuals, enabled by networks, platforms, and systems.Footnote 26 In this third trust revolution, we trust other people through technology (including AI as its most advanced form), rate everything from chatbots to Uber drivers, and have come to rely on well-trained bots to give us advice, resolve our problems, and carry out our food orders (Botsman, 2017, p. 8). My philosophical attempt to address the theoretical and practical problems in an AI-relevant context may be situated within the vanguard of this third trust revolution.