1 Epistemology aut ethics?

Current debates in the epistemology and in the ethics of Artificial Intelligence (AI) focus on two largely disconnected problems:

1. Questions of transparency and opacity of AI, i.e., AI as a glass or opaque box [epistemology of AI].

2. Questions about making AI ethically compliant, ensuring that algorithms are as fair and unbiased as possible [ethics of AI].

We say ‘largely’ disconnected because attempts to connect these two problems exist but differ significantly from our entry point in the debate. Colaner (2022), for instance, discusses the question of whether there is an intrinsic (ethical) value in explainable AI (hereafter: XAI) and provides various arguments to answer in the positive.

The default position in the field remains that of a separation of ethics and epistemology. In this paper, instead, we establish a direct connection between these two problems and show how two dimensions of the discussion intersect. One axis we call the ‘epistemological—ethical dimension’. The other axis concerns the expertise of the actors involved when posing questions about the epistemology and/or the ethics of AI. We call this second axis the ‘expert—non-expert dimension’. In short, we develop a framework that connects problems 1 and 2, or an ethics-cum-epistemology, as we shall call it; within this approach, we aim to explain how expert and non-expert actors can legitimately and meaningfully inquire into the explainability or ethical compliance of an AI system.Footnote 1

To explain the core of our proposal, let us consider skeptical attitudes toward the ethics of AI in general. Hagendorff (2020), for instance, persuasively argues that, more often than not, ethical guidelines do not have a real impact. He writes:

“Science- or industry-led ethics guidelines, as well as other concepts of self-governance, may serve to pretend that accountability can be devolved from state authorities and democratic institutions upon the respective sectors of science or industry. Moreover, ethics can also simply serve the purpose of calming critical voices from the public, while simultaneously the criticized practices are maintained within the organization.” (Hagendorff 2020, 100)

Sometimes, the ethics of AI literature risks repeating known problems within the policy work of economists: for example, when economists advocate policies that involve in-principle (so-called ‘Kaldor’ or ‘Kaldor–Hicks’) compensation to a policy’s expected ‘losers’ but leave actual compensation to the political process, which may lack the incentives or political will to deliver it (Oxford 2023). The analogy is especially noteworthy because, within the ethics of AI, advocacy of post facto mitigation by third parties is often combined with the explicit or implicit realization that those third parties may lack the expertise or political will to act on mitigation (see Zarsky 2016). In general, many mitigation strategies are vulnerable to being held hostage to the political process, which may itself be captured by better-financed vested interests.

Our entry point into the ethics of AI is very different. Without pretending to offer a magic bullet, we aim to provide an approach to ethics and epistemology that improves ethical compliance from the design stage onward and that offers ways to inquire into ethical compliance from different levels of expertise. We present a framework for approaching the process of design, implementation, and assessment of AI that simultaneously considers ethics and epistemology, and the expertise of the actors that inquire into these two. Thus, any time we talk about ‘process’, it is not merely the algorithmic process that we have in mind but the whole process from design to implementation and use, which, of course, includes technical questions about algorithmic procedures. And when we talk about ‘assessment’, we do not only mean assessment by technical experts but by actors at any given level of expertise.

Because we think ethics is not a cherry on the cake, relegated to a post-hoc analysis, we start from epistemology and seek to identify relevant ‘joints’ of the process at which ethics must and should come in. In this sense, we will speak of internalizing values already at the design stage of an AI system. In particular, we argue that many foreseeable, undesirable social consequences can be internalized in the design process in ways that naturally extend precautionary and legal practices. We think the strategies we start developing here can be expanded, promoted, and taught in computer science departments and design schools, internalized in corporate missions, and help create a culture of responsible AI.

The paper is organized as follows. In Sect. 2, we position our contribution in the rich debate on the ethics of AI. We also elucidate the terminology used within the field and explain the main concerns that have shaped the research questions and issues regarding the epistemology and ethics of AI. In Sect. 3, we turn to the epistemology of AI; we discuss ‘Computational Reliabilism’ developed by Durán (2018) and Durán and Formanek (2018) as well as Creel’s (2020) approach to transparency, among others. We build on these approaches and use argumentation theory, more specifically, its treatment of argumentation from expert opinion (Wagemans 2011b), to develop an epistemology for glass-box AI.Footnote 2

Next, in Sect. 4, we further articulate our position, explaining how to include values in the process of design and implementation as well as how non-experts can inquire into the ethical compliance of an AI system—thus offering an epistemology-cum-ethics. In Sect. 5, the conclusion, we provide further detail on how our approach is distinct from, and complements, existing approaches that link the ethics and epistemology of AI.

2 AI and its ethical challenges

Artificial Intelligence (and the philosophy thereof) has a long and established tradition in the respective fields of computer and cognitive science and the philosophy of computing. The program of understanding, reproducing, and extending human intelligence has undergone ups and downs since the pioneering work of Turing, and it is undeniable that we are witnessing renewed interest in AI (see, e.g., Crawford 2021; Floridi 2021). In this new wave of interest, projects, and applications, the question of what one can do with an AI seems to have taken center stage alongside the already studied conceptual or theoretical questions. Together with some high-profile AI abuses that have received media attention, this has contributed to shifting the whole discourse directly to questions about ethics and governance, which have been promptly recognized as fundamental by institutions such as the European Commission. In this context, the work of the ‘High-Level Expert Group on Artificial Intelligence’ is both timely and relevant, and an excellent example of the usefulness of interdisciplinary and intersectoral collaborations (AI HLEG 2019). The centrality and relevance of these issues also constitute the background of our contribution.

To set the stage, we clarify the meaning and use of some key terms. There is no consensus about what AI is, but the definition proposed by the HLEG will be a helpful starting point for our articulation:

"Artificial intelligence (AI) refers to systems that display intelligent behaviour by analysing their environment and taking actions – with some degree of autonomy – to achieve specific goals. AI-based systems can be purely software-based, acting in the virtual world (e.g. voice assistants, image analysis software, search engines, speech and face recognition systems) or AI can be embedded in hardware devices (e.g. advanced robots, autonomous cars, drones or Internet of Things applications)." (AI HLEG 2018, 1).

In a definition like the one above, what is central is not pinning down exactly what intelligence is, but rather the fact that, whatever it is, an artificial intelligence is a piece of software (whether or not it is embedded in hardware will not be relevant to our arguments later on).

Generally speaking, we can take a piece of software to be the whole set of instructions telling a computer what to do. More specifically, this set of instructions will be organized in an algorithm, a term often given different definitions, at times emphasizing its mathematical basis, implementation, or procedural nature (Creel 2020; Mittelstadt et al. 2016; Primiero 2020). For our purposes, it is crucial to keep in mind that an algorithm, or an algorithmic procedure, is a piece of code that can be nested in other code; importantly, algorithms are the products of design by one or more agents (computer scientists, scholars with complementary expertise, or other algorithms). Because algorithms are designed ‘to do something’, the implementation of the code is as important as the earlier phases of the design stage. Our interest, however, is not merely in the algorithmic procedure itself but more broadly in the whole process, from design to use, which includes coding.

At the time of writing, there is abundant public discussion on AI, not just about its opportunities for scientific research or industry but also about its potential pitfalls and misuse and the normative framework needed to avoid them (Coeckelbergh 2020; Dignum 2020; Dubber et al. 2020; Liao 2020; Stahl 2021; Vallor 2016; Vieweg 2021). One line of argument is that AI may reinforce racial and economic injustice, which requires ad hoc mitigation measures, and that many purportedly neutral algorithms turn out to be biased or to promote biased outcomes (see, e.g., Carr 2021). So, for example, while digital tools and AI may well reduce overt racist bias in (US) mortgage lending, they may simultaneously reinforce systematic and historical sources of racism, say, by relying on purportedly race-neutral coding of neighborhoods that may reflect historical injustices (see Perry and Martin 2022).

It is no surprise that bias-generating and bias-reinforcing AI programs put ethics concerns at the very top of the agenda on AI. For instance, Zarsky (2016) identifies two kinds of problems: efficiency-based and fairness-based concerns. Regarding the first, he observes that there are known issues with reaching the ‘right’ decisions for individuals, as in the case of automated procedures for credit assessment. The apparent paradox is that, although an algorithm may be (and is) wrong in individual cases, it may still be quite efficient (in the sense of being reasonably precise in making accurate predictions or outperforming human operators) at the aggregate level. The usual solution to improve individual-level decisions is to increase transparency (for instance, regarding data collection and analysis). But this is no panacea, because asking for more transparency likely means more financial costs for disclosure, more search costs, and more opportunities for confusion. Zarsky’s second concern is about fairness. He distinguishes three types of concerns: unfair transfers of wealth, unfair differential treatment of similar individuals, and unfair harms to individual autonomy. His point is that increasing transparency and imposing disclosure-related solutions in cases of unfair treatment do not necessarily mitigate or prevent them, nor rectify any injustices or errors. Part of the problem seems to be that focusing exclusively on transparency obfuscates the potential role of other regulatory steps needed. Another problem is that there are multiple and often inconsistent ways to axiomatize ‘fairness’ (Lee et al. 2021).
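
To make the worry about inconsistent formalizations concrete, the toy sketch below (our own illustration with made-up data, not drawn from Zarsky or Lee et al.) computes aggregate accuracy alongside two common fairness criteria, demographic parity (equal rates of positive decisions) and equal true positive rates; a classifier can look efficient in the aggregate while the two criteria pull in different directions, and in general they cannot all be satisfied at once.

```python
# Toy illustration (hypothetical data): aggregate efficiency versus two
# fairness criteria that need not agree with one another.
import numpy as np

# y_true: actual outcomes, y_pred: algorithmic decisions, group: a protected attribute
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 0, 1, 1, 1, 0])
group  = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

accuracy = (y_true == y_pred).mean()  # aggregate performance: 0.8 on this toy data

# Demographic parity: compare rates of positive decisions across groups
positive_rate = {g: y_pred[group == g].mean() for g in (0, 1)}

# Equalized odds (here, only true positive rates): compare error profiles across groups
tpr = {g: y_pred[(group == g) & (y_true == 1)].mean() for g in (0, 1)}

print(f"aggregate accuracy: {accuracy:.2f}")
print(f"positive decision rate by group: {positive_rate}")  # 0.4 vs 0.6
print(f"true positive rate by group: {tpr}")                # about 0.67 vs 1.0
```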

For other authors, privacy is also a key concern. Kearns and Roth (2020), for instance, argue that even when the apps we use claim that the data they collect are anonymized, it can be shown that it does not take much effort to retrieve sensitive personal information (see, e.g., Matsakis 2019; Zuboff 2019).

But these are not the only ethical worries. Mittelstadt et al. (2016) provide a map of ethical concerns. For them, questions about implementation and execution are among the problematic aspects; their mapping aims to "[…] include ethical issues arising from algorithms as mathematical constructs, implementations (technologies, programs) and configurations (applications)" (Mittelstadt et al. 2016, 2). We find their mapping particularly useful and illuminating about what we need to pay attention to when performing an ethical assessment of an algorithmic procedure. In particular, the mapping informs how we think about the steps required to make a process of design more ethical: it identifies types of concerns (e.g., about what may make evidence inconclusive or about what may lead to unfair outcomes) and it acknowledges that some concerns are more epistemic while others are more normative, even though they belong to the same mapping.

Simultaneously, to complement the line of work of Mittelstadt et al., we shift focus: we are interested in where, in the whole process of design, implementation, use, and assessment, these issues arise, and not just in what kind of ethical issues arise. In other words, we take the mapping of Mittelstadt et al. (2016) to be a valuable tool for running an ethical assessment of an AI after the fact. But our concerns are already at the level of design: how can we anticipate the concerns identified in the map while we develop an AI?

This is precisely the line taken by Kearns and Roth (2020). Their point is that the attention given to performance metrics veils any explicit consideration of social values, e.g., privacy and fairness (an argument also made by Zarsky (2016)). Algorithms are instruments we use to achieve something. Still, unlike a hammer (which is an instrument too), once they are in motion, they have a form of agency—something that philosophers of technology have long investigated, even before the advent of digital technologies (for a discussion, see, e.g., Kroes and Verbeek 2014). So, if we have to make these algorithms ‘ethical’, we need to act at the level of design. Kearns and Roth argue that internalizing values in the process of algorithm design requires setting new goals and, especially, generates new constraints for the learning process. It is likely that if an algorithm has to include, by design, privacy or fairness, it will have to compromise on the ‘usual’ performance indicators such as precision or speed. So, for example, to generate more privacy, one may well reduce the accuracy or speed of an algorithm. But this, Kearns and Roth argue, may often be a price worth paying. While Kearns and Roth have quite a lot to say about possible trade-offs, they do not provide a conceptualization of how to think about the internalization of values and social ends in the design process. At key junctures they leave such decisions to ‘society’ and ‘policy-makers’. Our point is that some of these key junctures are also in the hands of those who design algorithms, and that is why the question of how to internalize values is so important. Looking ahead: we will argue that by treating accuracy as the baseline goal of an AI, Kearns and Roth make it very difficult to treat values or social aims as anything but a trade-off with performance; and so ethics will necessarily be seen as a 'cost' or a ‘constraint’ on performance, while for us it is a design choice.
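
A minimal sketch of what such a constraint can look like in code may help; this is our own illustration (the function name and the specific penalty are ours, not Kearns and Roth's), showing the general pattern they describe: the value is written into the training objective rather than handled after the fact.

```python
# Sketch (our illustration): internalizing a value by adding it to the
# objective the learner optimizes, here a demographic-parity penalty on
# top of an ordinary logistic loss.
import numpy as np

def penalized_loss(w, X, y, group, lam):
    """Logistic loss plus a fairness penalty weighted by lam (a design choice)."""
    p = 1.0 / (1.0 + np.exp(-X @ w))                        # predicted probabilities
    log_loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    gap = abs(p[group == 0].mean() - p[group == 1].mean())  # difference in positive rates
    return log_loss + lam * gap                             # lam = 0 recovers 'accuracy only'

# Hypothetical usage on random data:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, 100)
group = rng.integers(0, 2, 100)
print(penalized_loss(np.zeros(3), X, y, group, lam=1.0))
```

Raising lam shifts weight from predictive performance to the internalized value; whether that shift is framed as a 'cost' or as a deliberate design choice is precisely what is at issue in what follows.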

Even so, our contribution builds on Kearns and Roth (2020) and complements it in that we inquire into the process of design, implementation, and use to identify key points at which critical questions about the epistemology and ethics of an AI system can be asked. In particular, if values are incorporated in the algorithm, as Kearns and Roth suggest, we should be able to check the process, as we explain in Sects. 3 and 4, at several points and from the perspective of actors holding varying expertise. Our argument is general in character and complementary to the more specific analysis of Morley et al. (2020), which instead maps and documents various ways in which, according to the existing literature, AI can in practice be made ethically compliant. In Sect. 3, specifically, we re-examine questions about the reliability of, or the trust epistemic agents can put in, the AI system. By 'trust' we mean "a willingness to be vulnerable to the actions of another party based on the expectation that the other will perform a particular action important to the trustor and part of the agent's function or functional role" [adapted from Mayer et al. (1995)].Footnote 3 This shift puts us straight into epistemological considerations, in which ethics is embedded or internalized. But questions about explainability and about ethical compliance can be asked in different ways, or at different levels of abstraction, by different actors, as we also explain in Sect. 3.2.

3 The epistemology of glass-box AI

One task of an epistemology of AI is to provide an account of the reliability and precision of the machine, or else of the conditions under which we trust the results, or outcome, of an algorithmic procedure. In particular, it is often argued that epistemic trust is based on features such as transparency, accuracy, or explainability (see, e.g., Creel 2020; Durán 2018; Durán and Formanek 2018; Mittelstadt et al. 2019; Ratti and Graves 2022). We begin by presenting ‘Computational Reliabilism’ (hereafter: CR) (Durán 2018; Durán and Formanek 2018) as a reaction to this received view on epistemic trust (Sect. 3.1). Then, building on Creel’s (2020) account of transparency, and in line with CR, we motivate a shift in focus from the reliability of the ‘outcome’ (i.e., the algorithm, the code, etc.) to the reliability of the whole process of design, implementation, and use, with further qualifications (Sect. 3.2).

3.1 How can experts inquire into the reliability of an AI system?

3.1.1 Computational reliabilism and the question of transparency

Computational reliabilism (CR) is an approach for assessing the reliability of computational processes that primarily applies to computer simulations (the main focus of the work of Durán and Formanek (2018)) but can also be used for other types of algorithmic procedures, including many of the digital technologies used in, e.g., medicine (Durán 2021).Footnote 4

CR originates in Alvin Goldman’s process reliabilism, which intends to cash out the idea that a cognitive agent S is justified in believing the results of a given process if that process has a tendency to produce more true results than false ones (Goldman 1979). To answer the question "how can we trust the results of a computational process?", CR adapts this idea to the specific needs of algorithmic procedures and simulations, resulting in the following definition:

(CR) if S's believing p at t results from m, then S's belief in p at t is justified, where S is a cognitive agent, p is any truth‐valued proposition related to the results of a computer simulation, t is any given time, and m is a reliable computer simulation. (Durán and Formanek 2018, 654)Footnote 5

Before we get to our analysis of the fruitfulness of CR, it is worth noting that Durán and Formanek deviate from the strategy of asking for more transparency. For several authors, for instance Humphreys, there will always be a residual element of opacity because humans are largely outside the computational design process. Although full transparency can never be ensured, authors such as Newman (2016) have stressed the importance of sound practices. Still, for Durán and Formanek this is not good enough, because some parts of the algorithm will remain inaccessible, at least in real time (say, because they are too costly to access). And to authors such as Symons and Horner (2014), who warn that we cannot test all possible paths, Durán and Formanek reply that, instead of testing all paths, we can use indicators to trust the results, despite the inherent opacity of simulations or other algorithmic procedures. It is worth noting that Durán and Formanek are not the only ones rejecting transparency as the way to ensure that an AI system is trustworthy. Ananny and Crawford (2018) reach the same conclusion, but based on different motivations and types of argument. Later, we explain why we think we need to consider transparency, even though it is not the solution to explainability or ethical compliance. To foreshadow our position: the key to trusting the output of an AI system is not transparency alone; instead, we need cues from the process as a whole.

CR posits ‘indicators’ to establish trust in the outcome. There are four of them:

(a) verification and validation methods;

(b) robustness analysis;

(c) a history of (un)successful implementations;

(d) expert knowledge.

The first two indicators cover internal, technical aspects of algorithmic procedures, while the last two address aspects of the context in which the procedures have been developed.

Regarding (a) verification and validation methods, Durán and Formanek adopt a rather standard approach, for instance, that of Oberkampf et al. (2003). Simply put, verification is about the correctness of the model and validation is about whether the model yields accurate results when confronted with the ‘real world’. Existing discussions in the literature concern, for instance, how best to adapt standard definitions from computer science to simulations or AI, or whether verification is more important than validation (or the other way around), a debate that Durán and Formanek discuss. For us, the take-home message is that, broadly speaking, from the perspective of CR, verification is about the ‘internal’ aspects of modeling, while validation is about ‘external’ aspects (a kind of empirical adequacy).
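
The distinction can be illustrated with a deliberately simple sketch (ours, with a toy model and made-up 'measurements'): verification asks whether the code correctly implements the intended model, validation asks whether the model's outputs match the world.

```python
# Toy sketch: verification vs. validation (illustrative only).
import math

def fall_time(height_m, g=9.81):
    """Model: time for an object to fall a given height, ignoring air resistance."""
    return math.sqrt(2 * height_m / g)

# Verification: the code reproduces a case we can work out analytically.
assert abs(fall_time(4.905) - 1.0) < 1e-9             # internal correctness of the implementation

# Validation: model outputs are compared with (here, made-up) observations.
observations = {10.0: 1.45, 20.0: 2.05}               # height (m) -> measured time (s), hypothetical
for h, t_obs in observations.items():
    assert abs(fall_time(h) - t_obs) < 0.05           # empirical adequacy within a tolerance
```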

The second indicator, (b) robustness analysis, in a sense, extends the scope of validation methods to different (but sufficiently similar) models rather than just one model. In computer simulation, as well as in other contexts such as econometric modeling, it makes a lot of sense to test for robustness, since models can be implemented in slightly different ways, even when applied to the same data set (see also Wimsatt 2007). According to Durán and Formanek, “the core assumption in robustness analysis is that if a sufficiently heterogeneous set of models give rise to a property, then it is very likely that the real‐world phenomenon also shows the same property” (2018, 15). In this sense, robustness is similar to "consilience", namely the convergence of evidence for scientific claims (see, e.g., Wagemans 2016; Wimsatt 2007).
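
As a toy analogue of such a robustness check (our own sketch, with simulated data rather than a real simulation), one can fit several sufficiently similar model specifications and ask whether a property of interest survives across all of them:

```python
# Sketch of robustness analysis: does a property (here, the positive sign of
# an estimated effect) hold across a set of similar model specifications?
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=200)
z = rng.normal(size=200)
y = 2.0 * x + 0.5 * z + rng.normal(size=200)   # a simulated 'world'

def first_coefficient(y, X):
    """Least-squares fit; return the coefficient on the first regressor."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0]

specifications = {
    "x only":    np.column_stack([x, np.ones_like(x)]),
    "x and z":   np.column_stack([x, z, np.ones_like(x)]),
    "x, z, x*z": np.column_stack([x, z, x * z, np.ones_like(x)]),
}

estimates = {name: first_coefficient(y, X) for name, X in specifications.items()}
robust = all(b > 0 for b in estimates.values())   # the property of interest survives
print(estimates, "robust:", robust)
```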

The third indicator, (c) the implementation history, is based on the idea that we should look at the ‘local’ and specific histories of design and implementation of AI systems, which, according to Durán and Formanek, are largely cumulative. Durán and Formanek (2018, 17–18) say:

"[…] building techniques have their own life for 'they carry with them their own history of prior (un)successes and accomplishments, and, when properly used, they can bring to the table independent warrant for belief in the models they are used to build' (Winsberg 2003, 122). We include such history of (un)successful implementations as an important source for attributing reliability to computer simulations."

In the approach of Durán and Formanek, the fourth indicator, (d) expert knowledge, is normally used in combination with the third and has to do with the expertise of the actors involved. Expert knowledge, in CR, is taken to be an attribute of a group of experts, following the approach of, e.g., Collins and Evans (2009). Understood in this way, expert knowledge is key (i) to justifying why scientists and engineers believe the results (i.e., the outputs of an algorithmic procedure), because they trust the assumptions made by experts, and (ii) to determining the robustness of a simulation and its history of (un)successful implementations.Footnote 6

There is a lot to learn from Computational Reliabilism, and we aim to build on CR to develop our epistemology for glass-box AI. We broadly agree with Durán and Formanek that each indicator comes in ‘degrees’ and that none is decisive in establishing trust (but the more positive scores on each, the better). In addition, each of the indicators may well involve many complex processes, methods, and procedures.

We disagree, however, that there is a hierarchy among these indicators, and especially that expert knowledge is weak because it could be “idiosyncratic in several ways”. This is an area where the exact, experimental, engineering, and computational sciences may have something to learn from the social sciences, notably about reflexivity. As has been argued in social science methodology and in the philosophy of social science, the point is not to wipe away expert opinion as a source of bias, but to disclose it, precisely in order to model and handle bias and, as we will argue, to increase the transparency of the whole process (Breuer 2003; Cardano 2009; Levy and Peart 2017; Russo 2022; Schliesser 2011; Subramani 2019); this is the positive view we examine later in the section. To be sure, this is a general point about methods across the natural and social sciences, and across algorithmic procedures, simulations, and other methods.

One aspect of CR that may need improvement is the number of stakeholders, or actors, included in the conceptualization of reliabilism. We think it is important to give visibility to as many relevant actors as possible, but the definition of CR, in its current formulations, mentions only one, i.e., the cognitive agent assessing the process. But where are the designers? And where are the quality control managers, the users, or the evaluators of the AI system? In the literature, some contributors have emphasized the need to discuss different actors or stakeholders. For instance, according to Zednik (2021), different stakeholders are affected by the opacity of an AI system, and his solution is to identify the different levels of explanation needed for different stakeholders (drawing on the literature on explanation in philosophy of science). This anticipates one of our key points: even accuracy will be relative to particular stakeholders and aims. This is echoed by Langer et al. (2021), who make the point that different stakeholders will have different desiderata about explainable AI. In the rest of the paper, we consider different actors and their stakes in the design and assessment of an AI system. First, by developing an analogy with the evaluation of arguments from expert opinion, in Sect. 3.2 we explain how actors with different expertise can inquire into the epistemology or ethical compliance of an AI in different ways. Second, in Sect. 4, we argue that values have to be internalized already at the stage of design and implementation, and in this way we aim to put designers and their interlocutors within firms and suppliers into a vital position of responsibility. We confine our discussion, however, to the expertise of human actors, and do not consider, for reasons of space, the interactions of human actors with the AI system itself.

Finally, it is important to note that, although CR is set up as a form of control on the algorithmic process, the indicators are ultimately geared toward providing a content-related justification of the outcome. Agreed, the third and fourth indicators can be understood as being about 'best practices', but, in current formulations of CR, they (1) are relegated to a lesser role and (2) ultimately still contribute to establishing trust in the outcome. From the perspective of Durán and Formanek, it is clear why transparency is not of immediate help. Still, a different take on transparency can motivate a shift in focus from the outcome to the process, a shift which we think CR begins but does not complete. In making this shift, we can also consider the actors involved and their expertise: designers, peer experts, the public, institutional stakeholders, and others. Our next step is to re-introduce transparency into the picture, building on the account of Creel (2020).

3.1.2 From the reliability of the outcome to the reliability of the process

As mentioned in the previous section, Durán and Formanek (2018) do not think that transparency helps with trusting the outcome of an algorithmic procedure. One of their concerns with transparency/opacity is that these notions are not well-defined, often vaguely referring to “accessibility and surveyability conditions on justification” (2018, 647). More importantly, accredited definitions of transparency and opacity seemingly refer to intrinsic properties of a process or system, leaving relevant actors out entirely. For instance, Humphreys’s definition, quoted by Durán and Formanek (2018, 648), is as follows:

"A process is epistemically opaque relative to a cognitive agent X at time t just in case X does not know at all of the epistemically relevant elements of the process." (Humphreys 2009, 618)

Admittedly, definitions of transparency or opaqueness like the one above do not help much. In our view, the account proposed by Creel (2020) is instead a fundamental step in the right direction. In particular, while an emphasis on transparency is often motivated by the aim of generating public trust, what we take from her argument is that some forms of transparency can generate more reliable participation in the design process for the experts and stakeholders themselves. This will be crucial for our argument, as it prepares the ground for discussing how different actors—experts and non-experts—may inquire into an AI system and come (not) to trust it. To be sure, explainability may often be important in some public-facing practices (e.g., medicine and health care). Still, in such circumstances, while ethics is internal to the practice, it can, in principle, be disconnected from epistemology (Herzog 2022).Footnote 7

The need for transparency is a controversial point in the literature, as we have just seen with Durán and Formanek, and as is also clear from other contributions such as Lenhard and Winsberg (2010) or Humphreys (2009). We find Creel’s approach the most useful for making a step forward in the debate. Her view is that we do need transparency, and for two reasons. One reason hinges on practical arguments: transparency appears to be important for communication purposes with the groups involved in the design and use of algorithmic procedures. Another reason has to do with normative considerations, putting forward the idea that this interest in transparency is justified. Creel distinguishes three types of transparency, covering different aspects of the process rather than the output. In this way, we can make specific inquiries about the process, at different levels, and depending on the actor(s) involved. Interestingly for our purposes, Creel frames the problem in terms of improving knowledge (of the algorithm), rather than establishing trust in the output. Why transparency is key will become fully clear in the next section, about argument assessment. But first, let us present Creel’s account.

The question of what transparency is cannot be given one monolithic answer, because transparency can be different things. Creel distinguishes three types of transparency:

1. functional transparency;

2. structural transparency;

3. run transparency.

The first type of transparency helps us improve “knowledge of the algorithmic functioning of the whole” (Creel 2020, 573). Typically, this type of transparency is achieved when humans program the algorithm, and it is clearly more difficult in cases of ‘kludges’, i.e., the nesting of models and algorithms, as happens in climate modeling. The second type of transparency helps us improve “knowledge of how the algorithm was realized in code” (Creel 2020, 573). The problem to address is whether the same algorithm may be realized through different codes. As Creel puts it, the question is "not just to be able to read the code; it is to understand how the code as written brings about the result of the program." (2020, 575). Clearly, in an ideal situation, to achieve structural transparency, we should decompose the algorithm line by line. In practice, this can be highly time-consuming, the interrelations between different parts of the algorithm can be difficult to know, and, in some cases, the exercise is prone to misuse. Creel then concludes: "[…] although we know how the learning algorithm works and what formal guarantees (if any) we have about its performance, we do not know how the learned "algorithm" brings about the classification result. Thus, we lack functional transparency." (2020, 580). It is reasonable to say that this type of transparency is the most difficult to achieve. Finally, the third type of transparency helps us improve "knowledge of the program as it was actually run in a particular instance, including the hardware and input data used" (2020, 569). Any considerations about material aspects of the design process, of the software, machines, or data that have been used, will be relevant to establishing this type of transparency.
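
A trivial sketch (our own, not Creel's example) may help fix the distinction between knowing what an algorithm does and knowing how a particular piece of code brings it about: the two functions below realize the same algorithmic function through structurally different code.

```python
# Two code realizations of the same function (returning the maximum).
def maximum_by_scanning(values):
    """Walk through the list once, remembering the largest element seen so far."""
    best = values[0]
    for v in values[1:]:
        if v > best:
            best = v
    return best

def maximum_by_sorting(values):
    """Sort a copy of the list and return its last element."""
    return sorted(values)[-1]

data = [3, 7, 2, 9, 4]
assert maximum_by_scanning(data) == maximum_by_sorting(data) == 9
```

Functional transparency concerns the first kind of knowledge, which the two realizations share; structural transparency concerns the second, and it clearly comes apart for them.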

Recall that, in Creel’s approach, each type of transparency, individually, does not improve trustworthiness as such but rather knowledge of the algorithm. This is important for us because we are developing an epistemology for glass-box AI. So the question is not just whether transparency makes the process trustworthy but how we can know it is trustworthy. Two remarks are in order. First, a relevant analogy can be made with explanation. The scholarly literature on mechanistic explanation is instructive here because the decomposition and reconstruction of the mechanism, the identification of relevant entities and activities, and generally all these epistemic practices are part of what it means to explain, including explaining the falsehood of an explanandum (Glennan and Illari 2018). Following Creel's approach, we need the transparencies to explain why and how knowledge of the algorithm is improved, with respect to different aspects of the AI system and/or at different levels of expertise of the actors. This brings us to a second remark: it is important to specify who wants to know about the process. This is where the first axis of the paper, epistemology—ethics, intersects with the second axis, namely expert—non-expert.

Creel also makes the point, though she does not develop it further in the paper, that transparency is a form of accountability towards non-experts, e.g., the public; in Sect. 3.2 we give some indications of how the public can engage with specialized algorithmic procedures in a meaningful way. In her paper, Creel focuses on questions of transparency for "skilled and knowledgeable creators and users of computational systems" (2020, 572), and these are one type of actor.

Now we are in a position to explicate in detail and expand on the approach of Durán and Formanek, which explicitly includes only one actor, namely the cognitive agent 'S', in the definition of CR. The explanation of the four indicators implicitly refers to different cognitive agents engaging in different epistemic activities, but we want to make this aspect more visible and explicit. And, as we have suggested, transparency can generate more reliable participation in the design process. Also, CR seems to assume expert knowledge in evaluating the different indicators. But in this way, the approach is of use to peer experts, while non-experts will have little or no basis for trusting the process. Combining adapted versions of the approaches of Durán and Formanek and of Creel, we aim to explain how questions about the epistemology of AI can be asked (and answered) differently, depending on the level of expertise of the actors involved.

To do so, in the next section, we draw an analogy with the evaluation of arguments from expert opinion: although non-experts can never assess the acceptability of an expert opinion directly (because this would require expert knowledge), they can assess its acceptability indirectly, namely by asking so-called 'critical questions', a mechanism by which grounds for trust (or not) are revealed (Wagemans 2011b). In our epistemology for glass-box AI, we operationalize the CR indicators and the three types of transparency by turning them into critical questions for assessing arguments from expert opinion, and we shall see that adapted critical questions can also be used differently by experts (e.g., the "skilled and knowledgeable creators and users of computational systems") and by non-experts (e.g., the general public) to inquire into epistemic and ethical aspects of an AI system.

3.2 How can non-experts inquire into the reliability of an AI system?

In this section, we explain how experts and non-experts can assess the reliability and transparency of AI-assisted decision-making by asking 'critical questions' associated with argumentation from expert opinion.

3.2.1 Arguments from expert opinion and AI

To begin with, we say that whenever there is expert-to-expert communication, we are in a situation of (relative) epistemic symmetry, for instance, when a software engineer interacts with another software engineer with comparable expertise. Otherwise, if one of the parties does not hold relevant expertise, for instance, a patient interacting with a physician, we say that actors are in a situation of epistemic asymmetry (Snoeck Henkemans and Wagemans 2012).Footnote 8 This is a very common situation in many communicative domains, and it also applies to the use of AI systems in domains such as medicine and diagnosis, or finance.Footnote 9

We now present the characteristics of this argument type and indicate how it is assessed. Based on this analysis, we then develop a procedure for assessing the reliability and transparency of algorithmic decision-making. To establish or increase the acceptability of a certain claim or point of view, individuals may refer to various types of authority. One of these types is 'epistemic' authority, usually denoting a scientific expert or engineer who is viewed as a specialist in a certain domain. An argument that appeals to such authority is called an 'argument from expert opinion', and its general structure is the following (Goodwin 2011; Wagemans 2015; Walton and Koszowy 2017):

Claim: q

Reason: q is said/endorsed by expert E

To assess the acceptability of an argument from expert opinion, one determines whether to accept a claim based on the fact that it is said/endorsed by expert E. To do so, one can ask specific 'critical questions' (Wagemans 2011b; Walton et al. 2008). We introduce argumentation theory at this point because 'trusting the output or process of an AI system' is very much like 'trusting an expert opinion'. Put differently, an argument for trust in AI can be assessed just like an argument from expert opinion, via an adaptation or specification of the critical questions involved.

Because in cases of epistemic asymmetry the addressee is not able to assess the argument from expert opinion in a direct way, some scholars conclude it is always unreasonable or fallacious. They label such arguments as a fallacy, particularly as an argumentum ad verecundiam (Goodwin 1998; Hinton 2015; Wagemans 2011b, 2015). What these scholars ignore, however, is that the reasonableness of arguments from expert opinion can also be determined in an indirect way. In general, arguments of any type can be assessed by determining (1) the acceptability of the reason given in support of q and (2) the solidity of the argument lever (i.e., the support relationship between the reason and the claim) (Wagemans 2020). In the case of the argument from expert opinion, these two points of assessment can be specified as follows:

1. The acceptability of "q is said/endorsed by expert E"

2. The solidity of the relationship between "being said/endorsed by expert E" and "being acceptable"

For each of these two points of assessment, specific evaluation procedures apply. Regarding the first point of assessment, "q is said/endorsed by expert E" is a complex statement that is assessed in two parts. First, it should be checked whether q is really said/endorsed by expert E. It might well be the case that q was not asserted by E at all, or that the version quoted in the argument is somehow distorted or adapted to the strategic purposes of the arguer. This can be checked by looking at a source where the original statement is mentioned. Second, it should be checked whether E is really an expert in the relevant field. For sometimes the expert quoted in the argument is not a real expert, for instance, because they are merely a celebrity or someone with expertise in a different domain than the one in which the specific claim is situated.

The second point of assessment pertains to the argument lever, i.e., the relationship between "being said/endorsed by expert E" and "being acceptable". The reason for having this second point of assessment is that even if q was really said or endorsed by E and E is a real expert in the relevant field—in other words, even if the propositional content of the reason is acceptable—it does not mean that the reason renders the claim acceptable. To provide a full-fledged assessment of the argument, it should also be checked whether the claim is acceptable based on the reason. This aspect of the assessment is related to our suggestion to shift from justifying the outcome to justifying the process. In this case, such an assessment would entail considering whether there are any other factors that made the expert say/endorse the claim, such as a personal interest or gain, whether the expert can provide reasons in support of the claim, and whether other experts agree with the one quoted in the argument—aspects like these will be further discussed in Sect. 4, as they are distinctively about axiology and deontology. In cases of epistemic asymmetry, the burden of acceptability is shifted from the epistemic to the normative elements (axiological and deontological aspects), which are discussed further in Sect. 4.

3.2.2 Critical questions and the acceptability of an argument from expert opinion

The following non-exhaustive list of critical questions (CQ) can be used to indirectly assess the acceptability of a claim that is supported by an argument from expert opinion (Wagemans 2011b). While CQ1 and CQ2 pertain to the content of the premise "q is said/endorsed by expert E", CQ3, CQ4, and CQ5 can be used to assess the solidity of the lever "being said/endorsed by expert E is authoritative for being acceptable".

(CQ1) Is q really said/endorsed by expert E?

(CQ2) Is E really an expert in the relevant field?

These two questions aim to establish whether the supported claim corresponds to the claim endorsed by the expert and whether the latter has the relevant and appropriate expertise, based on which a non-expert can trust their claims.

(CQ3) Does E have a personal interest in saying/endorsing q?

This question serves to exclude major problems at the deontic level; we do not develop this further on this occasion, although, of course, it is an important and pressing issue in many situations.

(CQ4) Is E able to provide reasons in support of q?

(CQ5) Do other experts agree with E?

These two questions are the most relevant for inquiring (indirectly) into the epistemology of XAI. They do not tackle technical aspects directly, but indirectly try to establish whether what an expert says is to be trusted. Finding an answer to CQ4 requires a certain expertise or can be dealt with in an institutionalized way (i.e., by appealing to institutional safeguards, as we discuss in Sect. 4). CQ5 can be checked by comparing the outcomes of different AI systems for the same problem or by comparing experts' views on the same output or problem.

The answers to these critical questions are related to the outcome of the assessment in the following way. If one or more of the answers are negative (or, in the case of CQ3, affirmative), the argument from expert opinion is unreasonable and the claim unacceptable. As Goodwin (2011) has observed, all criteria for judging argumentation from expert opinion are indirect in the sense that there is no possibility of verifying directly what the expert actually claims to know. The truth or acceptability of q can only be critically tested in an indirect way, namely by asking critical questions pertaining to the premise content and the argument lever, as explained above. This characteristic can also be ascribed to algorithms, the workings of which are sometimes opaque even to the people who designed them (Fig. 1).

Fig. 1 Argument from expert opinion and critical questions
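
For illustration only, the critical questions can be rendered as a simple checklist; the sketch below is our own schematic encoding (the answers are supplied by hand, whereas in practice they result from the indirect inquiry described above).

```python
# Schematic rendering of the critical questions as a checklist (illustrative).
CRITICAL_QUESTIONS = {
    "CQ1": "Is q really said/endorsed by expert E?",
    "CQ2": "Is E really an expert in the relevant field?",
    "CQ3": "Does E have a personal interest in saying/endorsing q?",
    "CQ4": "Is E able to provide reasons in support of q?",
    "CQ5": "Do other experts agree with E?",
}

def argument_is_acceptable(answers):
    """True only if no answer undermines the argument from expert opinion."""
    undermined = (
        answers["CQ1"] is False
        or answers["CQ2"] is False
        or answers["CQ3"] is True     # a personal interest undermines the argument
        or answers["CQ4"] is False
        or answers["CQ5"] is False
    )
    return not undermined

# Hypothetical assessment of a claim produced with the help of an AI system:
print(argument_is_acceptable(
    {"CQ1": True, "CQ2": True, "CQ3": False, "CQ4": True, "CQ5": True}))  # True
```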

In expert-to-expert communication, we are in a situation of epistemic symmetry, and the acceptability of the answer does not so much hinge on authority as on the technical details provided. In practice: expert A inquires into the explainability of AI system X, and expert B can reply by mentioning (aspects of) transparency and of CR, as discussed in Sect. 3.1. This strategy for expert-to-expert communication about the epistemology of AI works unless we have reasons to doubt the reliability or integrity of the actors involved—but that is beyond the scope of the present discussion. However, the situation is very different if a non-expert or, as is also common, a partial expert asks an expert about XAI. In this case, expert and non-expert are in a situation of epistemic asymmetry. For non-experts it is difficult, if not impossible, to determine in a direct way whether an expert opinion is acceptable or not. In such cases, trust in the process can be secured by inquiring with critical questions, and ultimately it will be ensured by the authority of the expert, or by some institutionalization of their expertise, reliability, or integrity. Thus, in cases of non-expert–expert communication, there is an ineliminable normative component, which is also present in the epistemology of an AI system. One may understand it as a species of risk that we must learn to manage.

4 Epistemology cum ethics

4.1 From epistemic to axiological (a)symmetries

Let us recap the argument thus far. In Sect. 3, we sketched the main lines of a glass-box epistemology for AI. We argued that such epistemology opens the door to ethics. In particular, it prepares the ground for internalizing values in the design and implementation process, which can then be subject to specific ethical inquiry and assessment.

We distinguished two scenarios. The first scenario is that of epistemic symmetry. Here, there is an 'expert—expert' inquiry into whether and to what extent one could trust the outcome of an AI system. According to our epistemology for glass-box AI, we trust the outcome because we trust the process; in an expert-expert exchange, technical details are addressed directly, both for epistemic (e.g., explainability) and normative aspects (e.g., fairness), which is in line with the approach of Kearns and Roth (2020): we can introduce ethical compliance in the technical development of the algorithm.

The second scenario is that of epistemic asymmetry. Here, the inquiry is from non-experts to experts, a widespread situation: patient and physician, or mortgage applicant and bank, are but two prominent examples of epistemic asymmetry. We have seen that, in cases like this, an inquiry into epistemological aspects cannot be 'direct' but uses the critical questions associated with the 'argument from expert opinion'. The question of trust ("we trust the outcome because we trust the process") then turns into a question of the reliability of the expertise, which already introduces key axiological elements into the epistemology. When a question of ethical compliance is posed, these axiological aspects become even more prominent, and institutionalization will be fundamental (Fig. 2).

Fig. 2 Scenarios of epistemic (a)symmetry between experts and non-experts

In this section, we further articulate our view on the axiological aspects of assessing the outcomes of AI systems. We first explain how to include values in the process of design and implementation of AI systems (Sect. 4.2). We then introduce the idea that AI systems are not just value-laden but also value-promoting, and that to properly take values into account, we need a holistic approach to model validation, one that is broader than CR. In particular, we reject the common assumption that accuracy is the given, baseline aim and that other aims have to be modeled as trade-offs with it.

We illustrate and articulate this by offering a framework for incorporating attention to harms that affect intersectionally vulnerable populations into the design process. Finally, we address how non-experts can inquire into the ethical compliance of an AI system (Sect. 4.3). We develop an account of axiological reliability that is complementary to epistemic reliability and address the issue of how institutionalization plays a role in guaranteeing axiological reliability.

4.2 Internalizing values and holistic model validation

This is a good moment to return to the work of Kearns and Roth (2020). They explain how values can be incorporated into an algorithmic procedure: the code can reflect specific ethical principles and values if and only if these can be axiomatized or formalized (which does not mean there is always only one way to do so). For instance, the privacy of users whose data are processed by a given algorithm can be operationalized, and this may take more resources, for instance, in terms of time, money, or energy. According to Kearns and Roth, integrating values to make algorithms fair and unbiased will lead to a system that is epistemically less efficient but axiologically better. For Kearns and Roth, this is a trade-off.
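
One way in which the operationalization of privacy and its resource cost can be made concrete is differential privacy, which Kearns and Roth discuss; the sketch below is our own simplified illustration (made-up data, a basic Laplace mechanism), not their code.

```python
# Sketch: operationalizing privacy via the Laplace mechanism. The privacy
# parameter epsilon is a design choice; smaller epsilon means stronger
# privacy and noisier (less accurate) released statistics.
import numpy as np

rng = np.random.default_rng(1)
incomes = rng.normal(loc=50_000, scale=10_000, size=1_000)   # hypothetical sensitive data

def private_mean(values, epsilon, lower=0.0, upper=200_000.0):
    """Release the mean with Laplace noise calibrated to the query's sensitivity."""
    clipped = np.clip(values, lower, upper)
    sensitivity = (upper - lower) / len(clipped)   # max influence of any single record
    noise = rng.laplace(scale=sensitivity / epsilon)
    return clipped.mean() + noise

for eps in (0.01, 0.1, 1.0):
    error = abs(private_mean(incomes, eps) - incomes.mean())
    print(f"epsilon={eps}: absolute error of released mean ~ {error:,.0f}")
```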

We argue, instead, that making an AI system ethical should not be modeled as a trade-off. It should be a conscious and deliberate choice to internalize some values rather than others. It is in this sense that we speak of AI systems as value-promoting, a term that we borrow from Russo (2021). The moment we decide to promote fairness and unbiasedness, we are not trading them off against efficiency; we are proactively internalizing and promoting these values rather than others. Bezuidenhout and Ratti (2021) talk about 'embedding' values into the process, which we take to be an approach very close to ours. In other words: we take a normative stance, according to which designers, engineers, and any other stakeholders involved ought to explicitly consider the values (epistemic and non-epistemic) that play a role at each stage. That is, we can understand predictive accuracy as one of the competing goals/values.Footnote 10

This is the core idea of internalizing values in the process. We do not claim that this aim is original to us. Our insistence on the importance of the intended use brings the approach very close to the very origins of AI, rooted in the contribution of Wiener (1988), who thought of cybernetics as a kind of moral engineering and applied ethics.

As we have suggested above, without some such internalization, the 'ethics of AI' risks remaining a form of window-dressing, relegated to post-hoc assessment. In addition, as we will argue, the value-ladenness of the algorithm is inevitable, so it is better to be explicit about it. So where and how, exactly, do the values come in?

In Sect. 3, we saw that the critical questions associated with the argument from expert opinion can help us assess the whole process, and not just aspects of it in isolation. From now on, we shall not use the term 'model validation' in the restricted sense that is customary in computer science, where it refers to the adequacy of the model with respect to empirical data. As is common in general philosophy of science and in social science methodology (see, e.g., Jiménez-Buedo and Russo 2021; Morgan and Grüne-Yanoff 2013; Russo 2022), we shall instead use 'model validation' in a broader sense, so as to encompass the whole process, from beginning to end, as we explain next. This broader sense of model validation entails that validation is not merely a technical characteristic of the model or algorithmic procedure, or, even worse, a property of some models but not others.Footnote 11 Model validation must include any reflective practice on the side of the designers or scientists that provides reasons why the whole process is valid (Fig. 3).

Fig. 3 Holistic model validation

With critical questions we can identify key stages in the whole process that need to be assessed. Note—and this is an extension with respect to CR—that the subject of evaluation is not just the output or outcome of the AI system. In line with the epistemology of glass-box AI outlined in Sect. 3, the relevant question is: how can we trust the whole process? The 'whole process' is not reducible to the output or the algorithmic procedure per se; it refers to everything from establishing the need for, and goal of, an algorithm up to the evaluation of its use and outputs. In 'normal' scientific contexts, the 'whole process' involves the formulation of the research hypothesis and the selection of background knowledge and literature, up to the interpretation of results and the discussion of possible use in policy (when relevant and applicable). Technical jargon varies across disciplinary contexts, and in computer science we may rather talk about the initial process of designing the algorithm, which involves studying and considering the users' needs as well as technical possibilities, constraints related to implementation, costs, the use of the algorithm, and so on. Thus, while assessing experts' reasons to support some modeling choice, we may simultaneously inquire into the values that have (not) been internalized (CQ4). Or, when inquiring about expert disagreement, we may as well ask whether this is due to different value judgments (CQ5).

In our view, it is crucial, both for epistemological and ethical reasons, that the process under consideration starts with motivating the design and development of an AI system, and it includes technical, epistemological, and ethical considerations (as well as other resources) for its development and use. In this way, we make an important shift towards a procedure-based justification of the outcome. We make sense of the emphasis given in CR to the 'trust in the outcome' because what is at stake is the commitment of agents towards the outcome. But this is precisely the reason to make this shift: we can trust an outcome if we can get a grip on the procedure that justifies it.Footnote 12 This is well in line with Creel's approach to transparency, because her three types of transparency capture different aspects of the whole process.

4.3 The normative character of holistic model validation

To stress that a procedure-based justification must encompass not only the 'technical moments' of the process but also considerations about design and use, we dub our approach ‘holistic’. In our terminology, a holistic approach to model validation is a procedure-based justification of the outcome, one that includes the design, implementation, assessment, and use of an AI, or of any other techno-scientific object. In this process, the role of designers, scientists, or whoever is in charge of establishing the validity of the model is fundamental, and it is so for both epistemic and ethical reasons. In line with feminist epistemology (Anderson 2020), our view is that model validation is very much a matter of situating claims about validity. In their presentation, Durán and Formanek alerted the reader that they were unable, at the time of writing, "to offer a measurement of the degree of reliability" (2018, 656). Under our account, however, such a measure may not even be needed. We think that having such a measure would not solve the problem of trust in the outcome of an AI system. Our holistic approach to model validation is ultimately an argument for a more qualitatively oriented assessment, in which we need to weigh the pros and cons at all stages, but which does not necessarily map onto a final 'magic' number. As mentioned earlier, we see validity more in terms of a reflexive practice that is internalized in the process of design, implementation, and use of an AI. These two ideas—internalizing values and holistic model validation—have at least the following four consequences.

First, model validation is to be done with specific purposes in mind. This means shifting jargon from the 'correct' model to the 'useful' model. This shift is far from innocent, because in this usefulness we can immediately embed, for instance, helping disadvantaged groups or addressing other intersectional vulnerabilities.Footnote 13 Likewise, we can qualify alleged epistemic values; for instance, 'accuracy' is not an absolute epistemic value but carries important axiological and pragmatic components—for example: accuracy for whom? This is not breaking news, as the argument that epistemic values carry some axiological content was already made in the influential work of Douglas (2009).
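
The question 'accuracy for whom?' can be made operational simply by disaggregating the usual metric; the toy sketch below (ours, with made-up data) shows how an aggregate figure can conceal whom the model is accurate for.

```python
# Toy illustration: aggregate accuracy versus accuracy per group.
import numpy as np

y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 1])
group  = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

overall = (y_true == y_pred).mean()   # 0.5 on this toy data
per_group = {g: (y_true[group == g] == y_pred[group == g]).mean() for g in ("A", "B")}
print(overall, per_group)             # group A: 1.0, group B: 0.0
```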

We thus shift from trusting the outcome to trusting the process, where trusting the process means that it leads to 'well-established knowledge' and 'intended use'. On the one hand, in epistemic terms, knowledge is well established when the epistemic choices and constraints made during the whole process support the outcome; critical questions about the technicalities of the model, for instance about verification and validation in CR terms, are key in this respect. On the other hand, we can check whether the use was indeed the intended one, provided that the intended use was made explicit at the beginning of the design process. Notice, again, that this is a procedure-based justification of what counts as well established and intended, not a content-based justification delivered 'after the fact'.

But there is more. What is 'intended' need not coincide with what is 'foreseeable' (see below), and it is precisely for this reason that we plead for a procedure-based justification of the outcome, in which at least the following stages can be identified (a purely illustrative sketch of how such stages might be recorded follows the list):

- process of design (engineering level);

- process of control (e.g., computational reliabilism);

- process of design (intended use of a technology);

- process of control (any mechanism in place to ensure that the intended use is preserved).
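As announced above, here is a minimal, purely illustrative sketch (in Python; the class names, fields, and example entries are our own hypothetical choices, not part of any existing standard) of how the four stages could be recorded so that design choices, control mechanisms, and the values internalized at each stage remain inspectable later on:

```python
# Hypothetical record of the stages of a procedure-based justification.
from dataclasses import dataclass, field


@dataclass
class Stage:
    name: str                     # e.g., "design (engineering level)"
    rationale: str                # why this choice or control mechanism was adopted
    values_internalized: list[str] = field(default_factory=list)
    evidence: list[str] = field(default_factory=list)  # e.g., test reports, audit logs


@dataclass
class ProcessRecord:
    system: str
    stages: list[Stage] = field(default_factory=list)

    def add(self, stage: Stage) -> None:
        self.stages.append(stage)


# Example entries mirroring the four stages listed above (all content hypothetical).
record = ProcessRecord(system="mortgage-scoring model")
record.add(Stage("design (engineering level)",
                 "model family chosen with auditability in mind",
                 values_internalized=["transparency"]))
record.add(Stage("control (computational reliabilism)",
                 "verification and validation runs",
                 evidence=["validation report"]))
record.add(Stage("design (intended use)",
                 "decision support only, not automated rejection",
                 values_internalized=["fairness", "human oversight"]))
record.add(Stage("control (intended use preserved)",
                 "periodic audit of how the scores are actually used",
                 evidence=["audit log"]))
```

Such a record is not a validation method in itself; it merely makes the procedure-based justification traceable for later expert and non-expert queries.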

Second, ethical assessment stands on a continuum with model validation, and both begin very early, at the design stage; this is precisely the meaning of ethics-cum-epistemology. In our view, ethical assessment should be ex ante, carried out in combination with all sorts of epistemological and methodological considerations; in fieri, carried out continuously through all stages, from design to implementation, use, monitoring of use, and maintenance of the system; and ex post.

When looking at the application of a reliable process in a social context, one is not interested only in its reliability for a specific task. One would also like to know what the effects of failure are. In particular, one would like to know something about the distribution of possible or likely harms across different kinds of populations, especially if these populations have different kinds of vulnerabilities (and appetites for risk). So, for example, something may function reliably as designed, with industry-beating low failure rates; yet when it breaks, however rarely, the artifact may still be especially dangerous for children. Or, some safety gear works well for average male subjects but less so for average female subjects (e.g., some medicines interact badly with pre-existing conditions in subsets of the population). Now, in many cases the harms that follow from such selective or asymmetric vulnerabilities can be internalized in the design, implementation, and testing process (and often this is mandated legally, by in-house risk assessment, or by exploration with stakeholders).
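The point can be illustrated with a deliberately simple back-of-the-envelope sketch (hypothetical numbers, not an empirical claim): even with a uniform and very low failure rate, the expected harm of failures can fall disproportionately on a more vulnerable group.

```python
# Hypothetical illustration: rare failures, asymmetric expected harm.
failure_rate = 0.001                                 # failures per use, same for every group (assumption)
exposure = {"adults": 0.9, "children": 0.1}          # share of uses by each group
harm_if_failure = {"adults": 1.0, "children": 8.0}   # harm per failure, arbitrary units

expected_harm = {
    group: failure_rate * exposure[group] * harm_if_failure[group]
    for group in exposure
}
total = sum(expected_harm.values())
harm_share = {group: expected_harm[group] / total for group in expected_harm}

print(harm_share)
# roughly {'adults': 0.53, 'children': 0.47}: children account for 10% of uses
# but bear almost half of the expected harm.
```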

How to characterize what counts as an asymmetric vulnerability is not easy, especially because many of the ethically or politically most salient harms may only become asymmetric due to causally intersectional effects (Bright et al. 2016). In addition, some asymmetric harms may be due to the fact that a truthful P reinforces or entrenches a socially bad status quo Q. For many purposes one may wish to distinguish among such selective vulnerabilities, but here we lump them together as an especially important set of unfair outcomes (Mittelstadt et al. 2016). So we can now enhance Durán and Formanek's (2018) framework as follows:

Ethical Computational Reliabilism (ECR): if S's believing p at t results from m, then S's belief in p at t is justified, where S is a cognitive agent, p is any truth-valued proposition related to the results of an AI, t is any given time, and m is a reliable algorithmic mediation that does not generate asymmetric harms to vulnerable populations.
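Schematically, and purely as our own notational gloss (the predicate names below are shorthand of our choosing, not Durán and Formanek's formalism), the structure of (ECR) can be displayed as a conditional whose antecedent bundles the reliability and no-asymmetric-harm conditions on the mediation m:

```latex
% B_{S,t}(p) abbreviates S's believing p at t; the predicates are informal shorthand.
\[
\bigl(\mathrm{Reliable}(m) \,\wedge\, \mathrm{NoAsymHarm}(m) \,\wedge\,
\mathrm{ResultsFrom}\bigl(B_{S,t}(p), m\bigr)\bigr)
\;\rightarrow\; \mathrm{Justified}\bigl(B_{S,t}(p)\bigr)
\]
```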

Here 'reliable' already presupposes an ordinary use of the reliable algorithm in an assigned task. One thing that follows from this is that, for (ECR) to be satisfied, its sources must also be made to seek out and track asymmetric vulnerabilities. While this clearly makes initial research and development (R&D) more expensive, it may also reduce litigation costs and social harms (including withdrawal of the product) downstream. So, again, this should not be seen as a trade-off, but as a conscious choice about the values guiding investments, design, and implementation.

Third, algorithmic mediation may generate both unintended and unforeseeable outcomes. Here, too, there are many subtleties. Some unintended consequences may just be a matter of negligence, and these can simply be assimilated to (ECR). Morally, legally, and politically, one may be held accountable for them if harms arise in use.

Other consequences may be unforeseeable in detail, or their tokens unknown, even though the outcome pattern (or outcome type) may be quite predictable after a while. For example, algorithmic mediation has made financial markets move at much higher speeds and has also increased the likelihood of mini and maxi flash crashes (Draus and van Achter 2012). The first development was entirely predictable (and desired), but the (evolution of the) exact speeds and volume of market transactions may have been unknowable in advance. The new kinds of financial transactions were also known, even if the exact strategies were not. By contrast, it is possible that the likelihood of flash crashes was initially unexpected. Yet, by now, any given mini-crash may still be surprising or unpredictable, even though the occurrence of such crashes is foreseeable and has become a 'new normal' (Kirilenko et al. 2017).

That is to say, unforeseeable tokens or individual events can occur within foreseeable outcome patterns or types. If an outcome pattern has possible tokens with asymmetric vulnerabilities, such patterns should, other things being equal, be avoided, and this ought to be internalized in (ECR) (of course, sometimes one can compensate for downside risks, etc.). So, we propose the following modification to our framework:

(ECR1): if S's believing p at t results from m, then S's belief in p at t is justified, where S is a cognitive agent, p is any truth-valued proposition related to the results of an AI, t is any given time, and m is a reliable algorithmic mediation that does not (intentionally) generate foreseeable asymmetric harm patterns for vulnerable populations.
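In the same informal shorthand used above for (ECR), the only clause that changes in (ECR1) is the harm condition, which is now restricted to foreseeable asymmetric harm patterns:

```latex
% Only the harm clause differs from the rendering of (ECR) above;
% the "(intentionally)" qualifier from the prose is left implicit for readability.
\[
\bigl(\mathrm{Reliable}(m) \,\wedge\, \neg\mathrm{ForeseeableAsymHarmPattern}(m) \,\wedge\,
\mathrm{ResultsFrom}\bigl(B_{S,t}(p), m\bigr)\bigr)
\;\rightarrow\; \mathrm{Justified}\bigl(B_{S,t}(p)\bigr)
\]
```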

Obviously, this leaves the prevention, accountability, and remedy of some unforeseeable asymmetric harm patterns outside (ECR), so we do not view this as the last word.

Finally, a crucial feature of algorithmic mediation is, as Mittelstadt et al. (2016) note, that it can affect how our social reality is conceptualized and becomes actionable, in ways that are utterly unexpected (including the reinforcement of a bad status quo). So, algorithmic mediation can generate consequences that are not just unintended but also unforeseeable in principle, because they are transformative (Mittelstadt et al. 2019); this terminology goes back to Floridi (2016). Here, too, the fact that an algorithmic mediation is transformative may be intended and foreseeable. In any case, the transformative character of digital technologies has long been discussed, and for this reason the design and implementation of AI systems should be given extra care. It is possible that some of the higher-order outcome patterns, including the asymmetric vulnerabilities, can be predicted, and this is exactly the idea behind (ECR1). The glass-box epistemology presented in Sect. 3 should help ensure that the system can be inspected at any time, and it should allow for both expert–expert and non-expert–expert queries about epistemological and ethical aspects alike. Nevertheless, we also think that institutionalization has an important role to play in all this, as we explain in the next section.

4.4 Axiological authority and the role of institutionalization

As we explained in Sect. 3.2, the critical questions associated with the argument from expert opinion are meant to facilitate the non-expert–expert exchange about epistemological aspects of an AI system. Regarding this exchange, it is important to note that having an answer to the critical questions does not take away the epistemic asymmetry but leaves it intact. The asymmetry can never be levelled out: transparency of the process alone cannot remove it. It can only be handled, namely by targeting axiological rather than epistemic aspects.

The sheer fact that non-experts can ask critical questions to evaluate AI systems is not enough. Given their status as non-experts, they should be supported in doing so. In other words, what is needed are institutions and social practices that guarantee the solidity of the process from both an epistemological and an ethico-political perspective. Even if non-experts cannot check the process themselves, they can meaningfully ask (and expect) that institutionalization makes criteria and motivations explicit and transparent, and that this form of institutionalization also safeguards them (for an explanation of how various types of 'institutional safeguards' can enable non-experts to find answers to critical questions related to expert opinion in the medical domain, see Snoeck Henkemans and Wagemans 2012). On these aspects, the asymmetry between expert and non-expert remains, and we can handle it by referring to the relevant institutional authority. Within the medical domain, for instance, an official register of medical doctors provides an answer to CQ2 (Is E really an expert in the relevant field?), and granting the right to obtain a second opinion enables people to find an answer to CQ5 (Do other experts agree with E?). Or, to give another example, as a non-expert one does not hold epistemic symmetry with respect to the designers of a mortgage algorithm. But one can ask the relevant authority (i.e., the bank or its officers) about the underlying values used in the design process, and one ought to be able to ask regulatory agencies about their compliance practices. If 'ethical banking' has any meaning, this is what it should include. Obviously, this is not about mere words but also about generating the conditions for legal and social accountability. Ideally, and this is a measure of the degree of institutionalization, all the critical questions should be institutionalized so as to correct for the epistemic asymmetry.
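Merely to fix ideas, the following sketch (Python; the mapping and the crude 'degree of institutionalization' measure are our own hypothetical illustration, built only from the examples just given) shows what it could look like to pair critical questions with the institutional safeguards that answer them:

```python
# Hypothetical pairing of critical questions with institutional safeguards.
institutional_safeguards = {
    "CQ2: Is E really an expert in the relevant field?": [
        "official register of medical doctors",
    ],
    "CQ5: Do other experts agree with E?": [
        "legally guaranteed right to a second opinion",
    ],
    "Which values were internalized in the design of the mortgage algorithm?": [
        "bank or bank officers disclosing the values used in the design process",
        "regulatory agencies reporting on compliance practices",
    ],
}

# A crude proxy for the degree of institutionalization: the share of critical
# questions for which at least one safeguard is in place.
answered = sum(1 for safeguards in institutional_safeguards.values() if safeguards)
degree = answered / len(institutional_safeguards)
print(f"{degree:.0%} of the listed critical questions have an institutional answer")
```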

Introducing references to institutional authorities may seem like a cop-out: we had promised to internalize values into the design process, and now we are introducing institutions beyond it. But this is a feature of our approach, not a bug. It also fits the practice of engineering more broadly: nearly all fields of engineering have professional associations, codes of ethics, certification authorities (including for ongoing training and recertification), and public or private institutions that design and promote quality metrics and standards (see, e.g., DeMartino 2011; Barry and Herkert 2014).

In sum, the two axes of investigation that we identified (ethics—epistemology and expert—non-expert) intersect at several points. Most importantly, and this is the sense of ethics-cum-epistemology, axiological elements are at work throughout the whole process of design, implementation, use, and assessment of an AI system.

5 Discussion and conclusion

We started this paper by noticing that debates about the epistemology and the ethics of AI are largely disconnected. We offered an overview of current approaches to the ethical aspects of AI and observed that these approaches focus on developing criteria for determining the trustworthiness of the outcome of algorithms; in doing so, they can be characterized as 'post hoc'. Moreover, they assume that the assessor is a high-level expert who understands the workings of algorithms, thus making the ethics of AI heavily dependent on the epistemology of AI.

In this paper, we have set out to complement ethical approaches to AI by developing a normative framework that establishes a connection between the ethical and epistemological aspects of AI. The framework has two main characteristics that contrast with existing approaches. First, it focuses on the whole process of design, implementation, and use of AI systems. Second, it enables experts and non-experts alike to act as assessors of such processes. The framework combines insights from argumentation theory and holistic model validation and brings together the epistemological and axiological aspects of assessing an AI system. For this reason, we characterize it as an epistemology-cum-ethics.

We do not claim originality in identifying the missing links between ethics and epistemology. Yet our approach differs significantly from some available accounts, and so it complements them in important ways. Within epistemology, there is a trend to take a more explicit stance on ethics and to contribute to the discussion of the ethics of science and engineering. Bezuidenhout and Ratti (2021), for instance, focus on teaching data ethics; our approach is broader because it focuses on research and development (R&D). Our approach is also distinct from Bezuidenhout and Ratti's because they explicitly adopt a microethics approach based on virtue theory, while ours is committed neither to a microethics approach nor to a virtue-theoretical one.

Furthermore, our approach can work across both the micro and the macro level. A general approach to model validation, in which epistemic and non-epistemic values are internalized, should be able to explain, in each specific case, both the micro- and the macro-dimension not just of ethics but also of epistemology. Our approach is also about how modeling practices, including AI, can be value-promoting through an epistemology-cum-ethics approach. Our idea of value-promotion is not necessarily based on virtue ethics (see the helpful discussion in Bezuidenhout and Ratti (2021) in the context of data ethics). We regard this as an advantage: part of the problem is the pretension, in some scientific camps, that scientific methods, algorithms, and the like are value-neutral. They never are, and the point of epistemology-cum-ethics is not to plead for promoting particular moral virtues (which are in any case debatable across cultures), but to raise awareness that such tools will always, even if only implicitly, promote some values. This needs to be addressed before any discussion about which values ought to be promoted.

The idea of value-promotion also marks an important extension with respect to 'value-sensitive design', because that approach is largely confined to the design of technical artifacts and to how values come to be embodied in such artifacts, and it remains a deontic approach (Friedman and Hendry 2019; Nair 2018; van de Poel 2020). While clearly in line with value-sensitive design, our approach is broader in scope.

Our approach of internalizing values, performing ex ante ethical evaluation, and inquiring into relevant axiological authorities complements recent approaches to ethics-based auditing, which are, however, ex post. For instance, our approach is very much in line with, and complements, that of Mökander and Floridi (2021), who propose the main lines of an ethics-based auditing. We agree with Mökander and Floridi that ethics is not about the result but about the process, and that the process of monitoring must be continuous. We find their roadmap for guiding an ethics-based auditing particularly valuable. They list several constraints at different stages of the process, distinguishing notably between conceptual, technical, economic and social, and organizational and institutional constraints. Auditing mechanisms clearly vary depending on which level is tackled. Our approach is complementary to Mökander and Floridi's because, while they focus on auditing, we are interested in the perspective of the designer, and thus in how, from the developer's side, one can follow a process that internalizes values. Similarly, the guidelines of the European Commission's High-Level Expert Group push in the direction of more compliance, which is at times difficult because of a mismatch between the ethical principles one might wish to promote and the legal bases that would enforce them (Pupillo et al. 2021).

With respect to approaches in the epistemology of science and technology, our approach is broader than existing ones in yet another sense: much of the discussion about data science and ethics (see again Bezuidenhout and Ratti (2021), references therein, and especially Floridi and Cowls (2019)) seems to treat the question of ethics in digital techno-scientific environments as a special case. In our view, however, AI and digital technologies are not special with respect to 'normal' scientific contexts; rather, they should be seen as a case in point. For this reason, we embedded the question of trust in the outcome of AI systems within a broader framework of argumentation theory and holistic model validation, one which also applies to 'normal' techno-scientific contexts.

Moreover, although the topic of our research is similar to that of the field of 'critical technical practice', our method has an epistemological and normative thrust rather than a socio-cultural one. We share with the literature within this field the idea that planning includes, beyond technical aspects, a vernacular aspect; for us, this vernacular aspect is connected to argumentation theory and holistic model validation. But unlike that literature, we believe we can offer a framework in which practices can be normatively improved, and not 'just' assessed based on socio-cultural considerations.

Finally, our approach entails a further extension of the application of indirect assessment methods for arguments from expert opinion in situations of epistemic asymmetry. While such methods have been applied for this purpose to, for instance, politicians' references to economic expertise (Wagemans 2015) and to medical expert opinion in doctor-patient communication and its institutional guarantees (Snoeck Henkemans and Wagemans 2012), the application of insights about the assessment of arguments from expert opinion to AI has been alluded to (Wagemans 2011a) but never worked out in as much detail as in this paper.