1 Introduction

Artificial Intelligence (“AI”) is swiftly becoming a relevant component in judicial decision-making processes around the globe (see, e.g., Reiling 2020). In China, “internet courts” already provide an online dispute resolution mechanism, also involving AI components (Fang 2018; Shi et al. 2021). In the US state of Wisconsin, judges utilize algorithms to derive recommended criminal sentences (Beriain 2018). Assessments of the defendant’s risk of engaging in violent acts are increasingly used in many countries (Singh et al. 2014) with varying degrees of accuracy (Tolan et al. 2019; Greenstein 2021).Footnote 1 Such technologies are typically referred to as Algorithmic Decision Making (“ADM”; see, e.g., Newell and Marabelli 2015; Araujo et al. 2020).Footnote 2

Judges can plainly benefit from the presence of ADM in judicial proceedings (Reichman et al. 2020; Winmill 2020).Footnote 3 First, advanced automation can potentially reduce the effort cost of searching through documents, seeking out the relevant legal provisions, or applying the law to the facts of the case.Footnote 4 Second, a well-established benefit of ADM lies in its ability to provide predictions that humans find difficult to generate, e.g., because the human capacity required to detect patterns in complex cases is limited (Alarie et al. 2018; De Mulder et al. 2022). Hence, ADM can potentially increase judicial accuracy by providing new information that cannot be detected by the naked eye or by improving the analysis process.

Nevertheless, it is currently somewhat difficult to study how ADM affects judicial decision-making, as the inner workings of judicial decision-making can be opaque, so it is not easy to observe which technology has been used. More importantly, the use of technology is not binary. Instead, judges may turn to ADM in different stages of adjudication, albeit the exact features vary across judicial systems. For instance, a highly technological court may take the following form: in the early stages of the judicial process, parties may be asked to upload their statements (e.g., the lawsuit and the statement of defense) onto a computerized system, while entering some general details about the lawsuit (sum, type of lawsuit, details on the parties). The judge can then observe the documents and verify whether they are consistent with the relevant procedural rules. After the initial submission, judges can use a computerized system to keep track of the process, send out automated reminders on deadlines, or run any number of preliminary analyses. Algorithms can then play a continuous role in the adjudication process, e.g., by generating predictions, providing assessments, and issuing electronic decisions. As the technology develops, additional automated functions might be added to such a process. In the extreme case, a digitized dispute resolution mechanism might eventually replace the judge and thereby circumvent the courts altogether (see, e.g., Ortolani 2019). However, we will restrict attention to a human judge who delegates some (but not all) of the judicial functions to ADM.

There have been extensive discussions on the implications of using automated procedures in the judicial process. These include aspects such as transparency and accountability, judicial independence, equality before the law, diversity, the right to a fair trial, and efficiency (Matacic 2018; Zalnieriute and Bell 2019; Morison and Harkens 2019; Wang 2020; English et al. 2021). In particular, a salient concern about judges’ use of such technologies is unfairness, e.g., algorithms can generate discriminatory outcomes based on race, ethnicity, or age (e.g., Jordan and Bowman 2022; Köchling and Wehner 2020). Moreover, some studies suggest that judges might use technology selectively, as they tend to rely more on extralegal factors in severe cases (Cassidy and Rydberg 2020).Footnote 5 At the same time, existing studies from behavioral economics suggest that judges may be susceptible to various cognitive biases (e.g., Guthrie et al. 2000, 2007; Winter 2020),Footnote 6 raising the question of whether the use of ADM can de-bias judges (see, e.g., Chen 2019).

Nonetheless, concerns about unfairness may well harm public trust in the judicial system. As one recent example, the use of software known as “COMPAS” to assess an individual’s risk of reoffending led to public outrage following the discovery that the algorithm produced racially discriminatory outcomes (Zhang and Han 2022).Footnote 7 Another famous example is the fraud detection system “SyRI” in the Netherlands, which collected large amounts of personal data. SyRI was challenged by civil rights organizations in the District Court of The Hague, which ruled that the technology violated the right to privacy (van Bekkum and Borgesius 2021; Buijsman and Veluwenkamp 2022).

These examples could be viewed as a special case of a more general issue: the relationship between AI and trust. In a recent review, Glikson and Woolley (2020) survey over 200 papers published in the last 20 years and identify different dimensions that determine whether individuals trust AI,Footnote 8 such as tangibility, transparency, reliability, and the degree of the tasks’ technicality. These determinants are then found to have different effects, depending on how the AI manifests itself (as a physical robot, a virtual agent, or an embedded component). For instance, low reliability seems to decrease trust when the AI is embedded but may or may not decrease trust when the AI manifests itself as a robot. Such complexity makes it difficult to speculate on how judges respond to advice generated by ADM and, by extension, how this affects the public’s trust in those judges.Footnote 9

There is extensive writing on the importance of public trust in the courts (see, e.g., Burke and Leben 2007; Gutmann et al. 2022; Jamieson and Hennessy 2006), on the one hand, and on the importance of trust in technology (Madhavan and Wiegmann 2007; Lee 2018; da Silva et al. 2018; Felzmann et al. 2019), on the other hand. However, the intersection is (at least empirically) under-explored,Footnote 10 with a few exceptions. Hermstrüwer and Langenbach (2022) use a vignette study to elicit perceptions of fairness in three contexts (predictive policing, school admissions, and refugees) on a scale ranging from “fully human” to “fully automated”. They find that purely algorithmic analysis is considered the least fair, but that purely human decision-making is also considered somewhat unfair. Conversely, their study finds that combining automated processes with high human involvement yields a higher fairness perception. Yalcin et al. (2022) conducted a vignette experiment on MTurk and found that subjects care whether the judge is a human or an algorithm, with evidence of higher trust in human judges.

Our study asks a related question: do individuals care about the stage in which technology is used by judges (rather than the overall degree of automation)? This question is crucial because it allows us to refine the conclusion as to what individuals (dis)like about the combination of human and machine adjudication. Specifically, we utilize a taxonomy by Parasuraman et al. (2000) that differentiates between four stages of decision-making: (i) information acquisition, (ii) information analysis, (iii) decision selection, and (iv) decision implementation.

For each of these, we elicit beliefs about the Level of Automation (“LOA”) most likely to ensure the fairest outcome using an online exploratory survey of 269 participants. Our analysis yields two main findings. First, we find that individuals believe that an intermediate LOA generates the fairest results in the information acquisition stage, which is consistent with the study by Hermstrüwer and Langenbach (2022). However, we also find that lower levels of automation are believed to generate fairer outcomes in the remaining stages. This result suggests that individuals’ preference for combining human decision-making and algorithms is driven by AI’s relative advantage in acquiring information rather than by its advantage in analyzing it. In other words, people trust judges to apply their legal expertise but less so to gather the relevant information. This conclusion seems particularly relevant for the distinction between inquisitorial and adversarial systems, as the judge’s role in evidence collection is more passive in the latter than in the former.

Second, we find evidence that individuals with a legal profession believe that lower levels of automation are fairer in the implementation stage. This suggests that lawyers are even more skeptical than laypeople toward AI’s ability to execute judicial decisions, i.e., lawyers tend to trust human judges more strongly when it comes to implementation.Footnote 11

Our results seem important both for institutional design (e.g., how much technology to allow in judicial decision-making) and for judges who operate within those institutions.

The remainder of the paper is organized as follows: Sect. 2 situates our study within the existing literature. Section 3 describes our study’s design, with results reported in Sect. 4. Section 5 discusses the results, highlights some limitations, and concludes.

2 Related literature

Our paper is related to several streams of literature, including existing attempts to classify legal technologies, perceived procedural fairness of algorithms (in particular, in judicial decision-making), and work on the relative advantage of technology versus humans in judicial processes. We summarize the relevant points in turn.

2.1 A taxonomy of judicial decision-making automation

There are several existing attempts to create some classification for legal technologies. The Stanford University Codex TechindexFootnote 12 (see, e.g., McMaster 2019) sorts existing legal technologies into nine categoriesFootnote 13 but does not distinguish between technologies intended for laypeople and those intended for experts. These categories also do not easily lend themselves to researching decision-making or automation. A different attempt is contained in a report by “The Engine Room” (Walker and Verhaert 2019), which focuses on legal-empowerment technologies. Unfortunately, this attempt mainly revolves around technological applications and does not strive to provide a comprehensive categorization. The Law Society of England and Wales launched another attempt (Sandefur 2019), distinguishing between two “waves of AI in law”: a first (“rules-based”) wave, comprised of document automation, legal diagnostics, and legislative analysis tools, and a second wave, which embodies attempts to predict outcomes of disputes, analyze documents, and perform risk assessments. Much like the others, this attempt aims to describe current technological solutions rather than provide a clear taxonomy. More recently, Whalen (2022) proposed to classify legal technologies according to their “legal directness” and “legal specificity”,Footnote 14 whereas Guitton et al. (2022a, b) suggest mapping regulatory technologies along three different dimensionsFootnote 15: the project’s aim, the divergence of interests between stakeholders, and the degree of human mediation. A different approach was taken by Tamò-Larrieux et al. (2022), who propose the concept of Machine Capacity of Judgment (MCOJ). According to this concept, classification should be derived from the artificial agent’s autonomy (i.e., freedom from outside influence),Footnote 16 decision-making abilities (including understanding the impact of decisions and balancing different options), and degree of rationality. Tamò-Larrieux et al. (2022) propose assigning a score to these parameters and leveraging those scores to determine how much to rely on the AI in question.

While these recent proposals seem useful for identifying what constitutes LegalTech and who it influences, they are less suitable for capturing how and when the technology is used by judges in their decision-making process.

We, therefore, take a different approach: combining elements from existing taxonomies and adjusting them to classify legal technologies. The starting point follows Parasuraman et al. (2000),Footnote 17 who break down the process of decision-making into four stages: (i) information acquisition—gathering, filtering, prioritizing, and understanding the data; (ii) information analysis—analyzing, interpreting, and making inferences and predictions; (iii) decision selection—prioritizing/ranking decision alternatives; and (iv) decision implementation—executing the choice (e.g., writing up and submitting the relevant document). While the categorization is not specific to legal decisions, it applies to judicial decision-making. For example, to make a ruling in a case, a judge must acquire relevant case law, analyze the information, generate alternatives, choose the best one, and implement the decision (e.g., write up a verdict). Notably, the stages are mutually inclusive: to select the most relevant argument, one must first identify and analyze it. Nevertheless, this taxonomy seems helpful for analyzing legal decision-making (see, e.g., Petkevičiūtė-Barysienė 2021).

This decision-making categorization is also closely related to a paper by Proud et al. (2003), which describes the stages slightly differently (as “observe”, “orient”, “decide”, and “act”) but is nonetheless helpful for our purposes. They propose an LOA scale along the four stages of decision-making, with the underlying assumption that the preferred level may differ for each stage. We follow this assumption and elicit our survey respondents’ beliefs about the fairness generated by the LOA in each stage of the judicial decision-making process on a 5-point (Likert) scale ranging from “Manual” (i.e., no automation) to “Full” (fully automated).Footnote 18 We describe the precise definitions in Table 1.

Table 1 Taxonomy of levels of automation and decision-making stages
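To make the taxonomy concrete, the minimal Python sketch below encodes the four decision-making stages and the five-point LOA scale as enumerations. It is only an illustrative rendering: the one-line descriptions of the levels are paraphrases for exposition, not the exact definitions of Table 1 or the wording shown to respondents.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Four stages of decision-making (Parasuraman et al. 2000)."""
    INFORMATION_ACQUISITION = 1   # gathering, filtering, prioritizing, understanding data
    INFORMATION_ANALYSIS = 2      # interpreting, making inferences and predictions
    DECISION_SELECTION = 3        # prioritizing/ranking decision alternatives
    DECISION_IMPLEMENTATION = 4   # executing the choice, e.g., writing up the verdict


class LOA(IntEnum):
    """Five-point scale for the level of automation (descriptions paraphrased)."""
    MANUAL = 1        # no automation: the judge performs the stage alone
    LOW = 2           # technology assists, but the judge does most of the work
    INTERMEDIATE = 3  # the work is shared between the judge and the system
    HIGH = 4          # the system does most of the work, with judicial oversight
    FULL = 5          # fully automated: the system performs the stage


# One hypothetical survey response: the LOA believed to yield the fairest
# outcome, chosen separately for each stage.
example_response = {
    Stage.INFORMATION_ACQUISITION: LOA.INTERMEDIATE,
    Stage.INFORMATION_ANALYSIS: LOA.LOW,
    Stage.DECISION_SELECTION: LOA.LOW,
    Stage.DECISION_IMPLEMENTATION: LOA.MANUAL,
}
```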

2.2 Perceived (procedural) fairness of ADM

Procedural fairness has long been applauded as a means to keep litigants satisfied, cooperative, and trusting in the courts (see, e.g., Burke 2020; Burke and Leben 2007; MacCoun 2005). The presence of ADM in court procedures may, intuitively, affect both procedural fairness and the public perception of fairness. Studies on the perception of procedural fairness when AI is involved (not necessarily in the context of courts; see, e.g., Woodruff et al. 2018; Lee 2018; Lee et al. 2019; Saxena et al. 2019)Footnote 19 yield mixed results. Some studies find that algorithms are seen as less fair than humans (Newman et al. 2020; Hobson et al. 2021), e.g., because they lack intuition and subjective judgment capabilities. Other studies, however, find that the difference in perceived procedural fairness of human versus algorithmic decision-makers is task-dependent (e.g., Lee 2018).

In the specific context of judicial decision making, recent studies show that people tend to trust human judges more than algorithms (see Yalcin et al. 2022) or that they, at least, do not trust a fully-automated judicial process (see Hermstrüwer and Langenbach 2022). Kim and Phillips (2021) further argue that “a robot would need to earn its legitimacy as a moral regulator by demonstrating its capacities to make fair decisions”.Footnote 20

Legal scholars seem to be divided in their attitude toward the use of technology in the courts. While some seem supportive of such technologies (e.g., Reiling 2020; Winmill 2020), others take a more conservative approach (see, e.g., Sourdin and Cornes 2018; Ulenaers 2020), often expressing concern about algorithmic bias (for a recent discussion, see Kim 2022). Such conservatism seems consistent with procedural fairness concerns, but it may also be driven by other reasons, e.g., dissatisfactory experiences with technologies (Barak 2021), a fear of becoming redundant (Sourdin 2022), concern about being pressured into using more technologies (Brooks et al. 2020), or simply “Automation Bias” (Cofone 2021).

2.3 Relative advantage, compatibility, and personal innovativeness in information technology

Individuals might prefer different LOAs for various reasons, including their attitudes towards technology. However, the existing literature on the determinants of technology acceptance suggests that these depend on social context and are subject to heterogeneity. For example, “innovators”, as Rogers calls them (Rogers 2003; for a summary of the theory, see, e.g., Kaminski 2011), are willing to take risks, are the first to develop new ideas, and are easy to persuade to accept new technologies. Other groups follow different patterns. “Laggards”, for example, remain conservative and skeptical even after the implementation of the technology. Most people, however, fall into two other (and more moderate) groups—the “early majority” and the “late majority”. The early majority demands evidence about the usefulness of the technology before they are willing to adopt it. The late majority needs more than that—they demand information on the technology’s success among other people. Accordingly, personal innovativeness in information technology is an often-used construct in the context of technology acceptance (see Ciftci et al. 2021; Patil et al. 2020; Turan et al. 2015).

These studies identify several factors that influence whether a person will belong to a group that is quicker to accept technology. We focus on two such factors—Relative Advantage and Compatibility—which seem the most closely related to judicial decision-making.Footnote 21 The first factor, Relative Advantage, refers to the degree to which an innovation is seen as better than the idea, program, or product it replaces. The second factor, Compatibility, refers to whether the technology is consistent with the potential adopters' values, experiences, and needs. In other words, the first factor deals with whether the technology used by judges is a proper substitute for human decision-making, whereas the second factor deals with personal preferences.

Our study controls for these factors, alongside a few others (e.g., general trust in and knowledge of legal technology), in order to isolate the question of interest—whether or not individuals care about the stage in which the judge turns to automation for assistance.

3 Study design

The following sections describe our study design. Section 3.1 describes our participants. Section 3.2 explains the method and procedure.

3.1 Recruitment of participants

We designed an online survey to elicit people’s beliefs about the fairness of using varying LOAs for the different stages of the judicial process. Lithuania was chosen for the study,Footnote 22 for two main reasons. First, Lithuania has a relatively average level of legal technology: no AI is used directly by judges in courts, but there is a history (long predating COVID-19) of using technology in the courts more generally, e.g., an e-services portal for courts,Footnote 23 judicial information systems,Footnote 24 and audio recordings of court hearings (Bartkus 2021).Footnote 25 The generally increasing level of automation in Lithuania is beneficial, as it makes it more likely that individuals will have varying degrees of awareness of at least some automation in courts. Second, Lithuania was also chosen for reasons of convenience, given that we had a logistical comparative advantage, which allowed us to recruit participants from several relevant groups (e.g., lawyers and other people with court experience) in this country with greater ease.

Participants were recruited using several methods. Given the different levels of knowledge and experience within legal systems, we wanted to recruit people both within and outside of the legal profession, and both with and without court experience (e.g., litigants, observers, defense attorneys, judges). Individuals from the legal community were invited to participate in the study by posting an invitation on a popular legal news site,Footnote 26 emailing scholars from several universities,Footnote 27 and “snowballing”, i.e., reaching out to attorneys, judges, and legal scholars and asking them to share the invitation to participate with their colleagues. Snowball sampling was also used to recruit people who have been to court in any role (e.g., litigant, judge, observer), as court experience may greatly influence how people comprehend court work. The non-lawyer portion of the sample was approached by posting in various other Facebook groups.Footnote 28 In order to reach a wider range of ages among participants, emails with invitations to participate in the study were shared with elderly people attending Medard Čobot’s Third Century University (distributed by the university’s administration).Footnote 29 Given the variety of methods used, we cannot guarantee that our sample is representative of the entire population of Lithuania (or of the legal community in Lithuania). To mitigate this issue, we added several control variables (see the following section), which allow us to account for the heterogeneity among the participants. Overall, a convenience sample of 269 Lithuanian respondents participated in the study from May to June of 2021.

3.2 Method and procedure

The survey consists of several steps. First, we elicited information used to generate control variables. These include some demographics (age and gender) but, more importantly, measures for specific attributes that may influence the respondents’ attitude toward the use of ADM in the legal sphere. Specifically, respondents were presented with sets of 7 statements about various legal technologies in courts alongside Likert scales to measure the relevant feature (for the sources used to derive the statements, see Table 9 in Appendix B)Footnote 30:

  • Knowledge about Legal Technologies (“Knowledge in Tech”)—statements concerning the respondent’s general knowledge of legal technology. For instance, one statement was “In some countries, judges have access to a program that provides the judge with a detailed analysis of the case, evaluates arguments, and identifies possible outcomes of the case”. Respondents were then asked to indicate their level of knowledge on a Likert scale from 1 (“I know absolutely nothing about this”) to 5 (“I have tried this or a similar technology”).Footnote 31

  • Trust in legal technologies (“Trust in Legal Tech”)—a scale consisting of three revised/adopted items, e.g., “Overall, I could trust legal technologies in courts”.

  • Relative advantage of legal technologies in courts (“Relative Advantage”)—a scale consisting of five revised/adopted items, e.g., “I think legal technologies would help save time for court clients and staff compared to how courts operate now”.

  • Compatibility—a scale consisting of four revised/adopted items from existing papers, e.g., “I think legal technologies would be well in line with my beliefs about how courts should operate”.

  • Personal innovativeness in information technology (“Personal Innovativeness”)—a scale consisting of four revised/adopted items, e.g., “Among my peers, I am usually the first to explore new information technologies”.Footnote 32

We pooled each set of statements using a simple mean, so that each feature is captured by one variable in the analysis.
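As a minimal illustration of this pooling step, the sketch below averages each set of hypothetical item columns into a single scale score per respondent; the column names and ratings are placeholders rather than our actual data.

```python
import pandas as pd

# Hypothetical responses: each column is one Likert item (1-5), each row one respondent.
items = pd.DataFrame({
    "trust_1": [4, 5, 3], "trust_2": [4, 4, 2], "trust_3": [5, 4, 3],
    "compatibility_1": [3, 4, 2], "compatibility_2": [3, 5, 2],
})

# Pool each set of statements with a simple (unweighted) mean, so that each
# construct enters the analysis as a single variable.
scales = pd.DataFrame({
    "trust_in_legal_tech": items[["trust_1", "trust_2", "trust_3"]].mean(axis=1),
    "compatibility": items[["compatibility_1", "compatibility_2"]].mean(axis=1),
})

print(scales)
```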

The second stage of the survey involved the elicitation of the LOAs, i.e., the belief about which level of automation in each of the four stages (information acquisition, information analysis, decision selection, and decision implementation) is most likely to generate the fairest outcome. Respondents first read a few general sentences (e.g., “judges need a wide range of information to decide a case—legislation, decisions in similar cases, and legal arguments. The alternatives for outcome and arguments depend on the information found.”). Then, they were asked to “Choose the option you think would best ensure the fairest verdict in most cases.” As mentioned above, the answer was elicited on a 5-point Likert scale, ranging from a “Manual” to a “Full” automation level.Footnote 33 This was done for each of the four stages separately—i.e., each decision-making stage was described on a separate page, and the participant chose a level of automation for each stage. Tables 9 and 10 in Appendix B provide a complete translation of the questions given to the respondents.

3.3 Descriptive statistics

Descriptive statistics for our independent variables are presented in Table 2 (see also Table 4 in Appendix A for bivariate correlations). The sample has a mean age of approximately 41 years but also includes younger and older participants, which is essential given the possible generational gap regarding technology acceptance. There are slightly more females (60.7%) in the sample, and there are both participants with and without court experience.Footnote 34 Given the importance of awareness about how courts operate, our sample contains a substantial number of people with court experience: over 50 percent of the participants (N = 143, see Table 2) had court experience with a variety of roles during the proceedings, i.e., 16 people observed the process, 38 were witnesses, 23 were legal representatives, 24 were litigants, 5 were experts, 4 were defendants in criminal proceedings, 13 were victims of a crime, 5 were prosecutors, and 2 were judges.

Table 2 Descriptive Statistics

Respondents reported relatively low levels of knowledge in legal technologies but rather high trust in such technologies. Furthermore, the descriptive statistics show a relatively strong belief in the relative advantage of legal technologies.

Table 5 in Appendix A compares the descriptive statistics between those with and without a legal profession, showing that lawyers in our sample have more knowledge about legal technologies (p < 0.001), but do not differ on trust, personal innovativeness, or compatibility.

4 Results

4.1 Perceived fairness generated by levels of automation

We begin our analysis by presenting descriptive results for the LOA compared across the four stages of the judicial process in our taxonomy. The descriptive results are presented in Fig. 1 (a more detailed version is provided as Table 6 in Appendix A). The figure displays the percentage of respondents who indicated that a particular LOA is most likely to produce a fair outcome.Footnote 35 The figure clearly shows that the share of subjects choosing an intermediate level is high in the information acquisition stage (40.15%) but lower in the other stages (ranging from 16.85 to 21.19%). Correspondingly, the share of respondents choosing “low” is highest in the other stages.

Fig. 1 Perceived Fairness for each Level of Automation by stage of judicial decision-making

A Pearson chi-squared test reveals that LOAs differ across the stages (p < 0.001). As we elicited multiple LOAs from each participant, we also checked for within-subject differences using a Repeated Measures ANOVA, confirming that there are statistically significant differences between the beliefs regarding the fairness generated by LOAs across the stages of decision-making (p < 0.001). Overall, this check reveals two key insights. First, the LOA for the information acquisition stage differs from the LOA for all other decision-making stages (p < 0.001 in all cases). Second, the LOAs for the other three stages do not differ from each other (p > 0.05 in all cases).
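The sketch below illustrates how checks of this kind can be run, assuming a long-format table with placeholder columns respondent, stage, and loa (LOA coded 1-5) and complete responses from every participant; the pairwise post-hoc comparison is shown as a paired t-test, which may differ from the exact routine used.

```python
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM


def loa_checks(df: pd.DataFrame) -> None:
    """df: columns 'respondent', 'stage', 'loa' (one row per respondent-stage choice)."""
    # Share of respondents choosing each LOA within each stage (cf. Fig. 1).
    print((pd.crosstab(df["stage"], df["loa"], normalize="index") * 100).round(2))

    # Pearson chi-squared test of independence between stage and chosen LOA.
    chi2, p, dof, _ = stats.chi2_contingency(pd.crosstab(df["stage"], df["loa"]))
    print(f"chi2({dof}) = {chi2:.2f}, p = {p:.4f}")

    # Repeated-measures ANOVA: LOA as the outcome, stage as the within-subject
    # factor (assumes each respondent answered all four stages).
    print(AnovaRM(df, depvar="loa", subject="respondent", within=["stage"]).fit())

    # Pairwise within-subject comparison, e.g., acquisition vs. analysis.
    wide = df.pivot(index="respondent", columns="stage", values="loa")
    t, p_pair = stats.ttest_rel(wide["acquisition"], wide["analysis"])
    print(f"acquisition vs. analysis: t = {t:.2f}, p = {p_pair:.4f}")
```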

Next, Fig. 2 breaks down the data by different characteristics: Age (comparing older and younger respondents), gender, legal profession, and court experience (exact numbers are provided in Table 7 in Appendix A). The figure demonstrates that LOAs seem to be quite similar across different characteristics on a descriptive level (the only significant differences are between age groups). However, this does not yet constitute a full-blown analysis, as the variables capturing the characteristics are correlated (see Table 4 in Appendix A, which presents bivariate correlations). Hence, we proceed by using a regression model and control for these features simultaneously.

Fig. 2 LOA by different characteristics

4.2 Linear regressions

Linear regressions (OLS) were conducted to predict preferred levels of automation in judicial decision-making stages. The regression model is

$$LOA = \beta_{0} + \beta_{1} Analysis + \beta_{2} Selection + \beta_{3} Implementation + \beta_{4}^{\prime } X + \epsilon ,$$

where the first three variables are dummies for the stage of the judicial decision-making (so that information acquisition is the baseline category), X is a vector of varying controls, and \(\epsilon\) is the error term. Results are reported in Table 3 (a full table, including the coefficients of the controls, is provided as Table 8 in Appendix A). As each observation represents one decision of one respondent (each respondent provided an answer for the four stages, so that there is a maximum of 1076 observations),Footnote 36 we cluster the standard errors by respondent. Column (1) excludes controls. Column (2) adds demographics (gender, age, age-squared, legal profession, court experience). Column (3) adds the elicited Knowledge in LegalTech, Trust in LegalTech, Personal Innovativeness, and Compatibility. Column (4) adds an interaction term between the stages and legal profession. Column (5) replaces the controls with respondent fixed effects, in order to control for any feature that varies by subject but is, for whatever reason, unaccounted for by our controls.
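The sketch below illustrates how a column (3)-style specification with respondent-clustered standard errors might be estimated in statsmodels; all variable names are placeholders, and the exact control set follows Table 8 in Appendix A. The final Wald test corresponds to the comparison of the three stage coefficients discussed below.

```python
import statsmodels.formula.api as smf


def estimate_loa_model(df):
    """df: long format, one row per respondent-stage decision (up to 4 per respondent)."""
    # Explicit stage dummies, with information acquisition as the omitted baseline.
    for s in ("analysis", "selection", "implementation"):
        df[s] = (df["stage"] == s).astype(int)

    formula = (
        "loa ~ analysis + selection + implementation"
        " + female + age + I(age**2) + legal_profession + court_experience"
        " + knowledge_tech + trust_legal_tech + personal_innovativeness + compatibility"
    )

    # Standard errors clustered by respondent, since each respondent contributes
    # several observations. Column (5) would instead add respondent fixed effects,
    # e.g., via "+ C(respondent)", dropping the subject-level controls.
    result = smf.ols(formula, data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["respondent"]}
    )
    print(result.summary())

    # Wald test that the three stage coefficients are equal (cf. Sect. 4.2).
    print(result.wald_test("analysis = selection, selection = implementation"))
    return result
```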

Table 3 OLS Results

Table 3 reveals several key insights. First, the coefficients for the later stages of the judicial decision-making process (analysis, selection, implementation) are all negative and significant (p < 0.001). This suggests that the initial stage of information acquisition is believed to generate fairer results with a higher level of automation. In other words, individuals believe that the use of technology in later stages of the process is more likely to lead to unfair outcomes.

Second, respondents with a legal profession only differ from others in their perception of using automation in the implementation stage, with a negative coefficient for the interaction term (− 0.29, p < 0.005), suggesting they perceive automated implementation as relatively unfair.

Third, when comparing the coefficients of the three stages listed in the table (analysis, selection, and implementation), the coefficients are of similar size (and in fact, they are not significantly different from one another, as confirmed by a Wald test). This reaffirms that the results are driven by a distinction between the information acquisition and other stages (and not the differences across the three other stages).

5 Conclusion

Our study of beliefs about the level of automation in different stages of the judicial decision-making process reveals several interesting findings.

Firstly, people seem to believe that low levels of automation would ensure the fairest outcomes in judicial decision-making. An intermediate level of automation was preferred only in the first stage, information acquisition. These results are consistent with the emerging literature on perceived algorithmic fairness within the law, which indicates that people might generally trust judges more than algorithms (Hermstrüwer and Langenbach 2022; Yalcin et al. 2022). However, they also suggest that a binary view of judges vs. algorithms might be insufficient for capturing how people perceive automation in the courts, as perceptions change depending on the stage of the decision-making process. This finding might also be explained by the more general determinants of trust in AI, as captured by the aforementioned paper by Glikson and Woolley (2020). Namely, ADM used in courts neither manifests itself to the public in a tangible way nor is it transparent, both of which tend to lead to lower trust.

Secondly, the evidence suggests that automation is perceived as most likely to generate fair outcomes in the first decision-making stage (information acquisition), which seems counter-intuitive: usually, one might expect individuals to trust algorithms with the analysis of information more than with its acquisition.Footnote 37 At the same time, people might simply be more familiar with the concept of automatic information retrieval due to, e.g., their day-to-day use of online search engines. As search engines have arguably gotten better over time, individuals may anticipate a similar process in ADM, which increases trust in information acquisition. However, other explanations for our finding may be found by turning to concepts from behavioral law and economics. For instance, suppose that judges fall prey to the so-called “confirmation bias” (see, e.g., Jones and Sugden 2001; for experimental evidence on adjudication, see Eerland and Rassin 2012), whereby they first form an opinion and then collect only the information that is consistent with that opinion. Individuals who anticipate the bias might then prefer to let an algorithm collect the evidence. Moreover, the presence of biases in the first stage might spill over to the following stages, e.g., because judges may rely on heuristics from the very beginning, and this will form the basis for the subsequent stages.Footnote 38 An alternative explanation would be distrust in current algorithms’ ability to perform analysis, selection, and implementation, e.g., due to the aforementioned concerns about fairness and potential bias of ADM (for a discussion, see Kim 2022). In particular, implementation, unlike analysis, might be perceived as an inherently human process, requiring capabilities that are simply irreplaceable by a computer (see, e.g., Kasy and Abebe 2021). This is particularly true if implementation involves emotions (Yalcin et al. 2022; Xu 2022; Ranchordas 2022), e.g., allowing a human judge to incorporate equity concerns or compassion. Recall, however, that individuals with a legal profession in our study hold an even stronger belief that automation in implementation is likely to yield unfair outcomes. This might be driven either by a genuine belief (e.g., due to conservatism or to personal experience in representing clients before human judges) or by political economy: if lawyers believe that their added value lies in influencing implementation (e.g., by submitting written arguments to the judge before verdicts are written), they might object to automation in order to protect their stream of income.

Although our study is exploratory, the findings may potentially hold several important policy implications. First, our finding that the perceived fairness of automation is higher in the information collection stage implies that judges who are interested in maintaining their public support might take more liberty in using technology in the earlier stages of the process but avoid technology in later stages. From the perspective of judicial administration, one might even consider actively restricting judges from using automation for some actions, if judges prefer to save on effort costs and neglect the cost of lowering public support. Second, the stronger perception of lawyers regarding the unfairness of automated implementation may be especially important, as discontentment with automation might lead lawyers to communicate their criticism of the court to the clients. In other words, one might need to assign more weight to the preferences of lawyers because they might spill over to the clients.

Thirdly, the different stages we consider might be more relevant in inquisitorial systems—where the judge actively collects data—than in adversarial systems. The same is true for appeals in civil law systems, where new evidence can be introduced more easily than in common law systems (see, e.g., Feess and Sarel 2018). Of course, one should take this distinction with a grain of salt, as there may be second-order effects (e.g., if judges use automation to collect data, litigants may anticipate this and respond by hiding some information).

Lastly, we asked respondents to specify what they believe would yield a fairer outcome, i.e., we elicited their beliefs about fairness. This means that we did not directly ask whether they also prefer to have a fair process. As it is hard to imagine that people dislike fairness, it is plausible that respondents who believe that a certain process will yield a fair outcome also prefer to have that process in place. Therefore, our findings may well reflect the public’s preferences and not merely its beliefs. Nonetheless, further research is needed to clarify this point, as it is possible that some specific sub-groups actually prefer unfair outcomes (e.g., guilty criminal defendants who would rather be unfairly exonerated).

Our study is subject to a few limitations. First, it is an exploratory study and, as such, uses simplified questions that aim at general opinions toward automation rather than specific opinions regarding particular fields of adjudication. Nonetheless, it seems sufficient to illustrate the general point that differences exist between perceptions of technology at different stages. We leave the exploration of differences between legal fields for future studies. Second, our study builds on the existing literature, in particular the highly-cited paper by Parasuraman et al. (2000), and applies discrete levels of automation, which has the advantage of keeping things simple for the subjects. However, future studies may well benefit from considering a more intricate distinction, such as the one proposed by Tamò-Larrieux et al. (2022). Third, as our pool of participants also includes non-professionals, one may question whether their views are essential. However, attitudes of the general public about court processes are no less critical to court legitimacy than experts’ opinions. Fourth, our use of Lithuanian respondents means that we can only capture opinions formed against the background of judicial processes in Lithuania. A cross-country follow-up study may help establish whether our results generalize to other countries (in particular, given the possibility that inquisitorial systems differ from adversarial ones). Fifth, as discussed above, our convenience sampling is subject to limitations as well. Finally, our choice to restrict attention to fairly general questions might overlook the fine-grained details of the technology. Consequently, a different design, in which further details are provided on the technology’s capacity or functioning, might yield different results. Future work would benefit from such attempts and shed further light on the ever-evolving issues discussed in this paper.