Public entities around the world are increasingly deploying AI and algorithmic decision-making systems to support public services or to exercise their enforcement powers. The rationale for the public sector to use these systems is similar to that of the private sector: to increase the efficiency and speed of transactions and to lower costs (UK Government 2016; GSA 2020). However, public entities are first and foremost established to meet the needs of the members of society and to protect the safety, fundamental rights, and wellbeing of those they serve. Their existence is justified by the promise of such service and protection. People agree to abide by rules knowing they will be served in return, that decisions will not be arbitrary, and that there are means of redress and of assigning responsibility when a harm occurs. Therefore, public entities are held to a higher level of accountability and transparency than private ones, which are profit-driven and might not necessarily treat the interests of the public as a priority.

Currently, AI systems are deployed by the public sector at various administrative levels without robust due diligence, monitoring, or transparency. This results in a growing entanglement between private vendors and public actors, and a blurring of the lines of accountability and responsibility. Public sector actors are also keenly aware of the gap between their existing internal capability and capacity and what is needed to properly procure and manage these systems (Executive Order 2020; OECD 2020). This paper critically maps out the challenges in the procurement of AI systems by public entities and the long-term implications that necessitate AI-specific procurement guidelines and processes. This dual-pronged exploration covers both the new complexities and risks introduced by AI systems and the institutional capabilities impacting the decision-making process. AI-specific public procurement guidelines are urgently needed to protect fundamental rights and due process.

1 Literature review

When a public entity deploys an AI system to provide a public service or to enforce its powers, individual members of the public have little choice to opt out of the use of the system or of being subjected to it. An individual in an unbalanced power relationship with a government entity cannot easily challenge the procurement and implementation of a system. Individualizing the harms and impact of a system can also make it difficult to distinguish between personal experience and group-level collective harms. To a certain extent, this power imbalance is corrected by the transparency and accountability mechanisms available in the public procurement process, which oblige the public actor to provide access to information. The entity may be required to conduct assessments, disclose the details and findings, be ready to share further information if requested, and answer to the public. The public and civil society, on the other hand, can use this information to understand the impact of the system on certain groups, society, or the environment. Such insight can also help the public challenge the system’s fairness and request its modification or termination. This evaluation may also result in consequences for the public actor. However, the ability of the public entity to effectively share information, and of society to benefit from the process and hold the entity accountable, can be significantly impacted by the introduction of complex algorithmic systems. This impact is compounded when the AI systems are proprietary.

In the United States, after civil society and legislators voiced concerns over privacy and bias in facial recognition technology (Buolamwini and Gebru 2018; NAACP 2022), the Internal Revenue Service (IRS) limited the use of ID.me, a biometric identity verification software. News headlines report algorithms found to be biased against African American defendants in recidivism predictions used in sentencing and bail processes (Angwin et al. 2016), algorithmic systems leading to false arrests (Hill 2020), and the downgrading of student results from underperforming schools (BBC 2020).

As these examples continue to grow, accountability concerns grow in parallel. It is now customary to list algorithmic bias cases at the beginning of each research paper to draw attention to how ubiquitous algorithmic systems have become and how these systems might be biased. However, despite the implications for fundamental rights and due process, the literature covering the nuanced challenges of AI in public systems is still growing slowly. This paper highlights the current research and practice gap, focusing on public procurement guidelines for AI systems.

The literature review for this paper covers policy documents, academic research, and civil society reports. Several policy and regulatory initiatives are envisaged to govern public and private use of AI systems, such as the European Commission’s draft AI Act, which proposes bans on certain AI systems. The draft bill requires providers developing, and public entities using, high-risk AI systems to assess those systems, engage in ongoing risk management, and register their assessments and documentation in a public database (European Commission 2021). The Council of Europe is also working on a legally binding transversal instrument, which proposes that certain AI systems and practices used by public actors be banned. The Council’s Ad Hoc Committee on AI recommends that human rights impact assessments be conducted for AI systems which might have a negative impact on health, safety, and fundamental rights (Council of Europe 2021). The Government of Canada requires public entities to conduct impact assessments prior to the production of an algorithmic system (Government of Canada 2020), while the UK regulator provides guidance to organizations on how to explain AI practices (Information Commissioner’s Office 2020). The French parliament requires all algorithms used by the government to be made open and accessible to the public (L’Assemblée nationale 2016). The United States executive branch has established principles for the use of AI in the Federal government (Executive Order 2019, 2020), while the National Institute of Standards and Technology is drafting an AI Risk Management Framework (NIST 2022).

In addition to these regulatory discussions, academic researchers surface the impact of algorithmic systems in the public sector and call for algorithmic accountability (Barocas and Selbst 2016; Calo and Citron 2021; Cooper et al. 2022; Crump 2016; Diakopoulos 2014; Eubanks 2018; Kroll et al. 2017; O’Neil 2016; Pasquale 2015; Richardson et al. 2019; Schwartz 1992; Veale et al. 2018; Young et al. 2019), and for impact assessments to be made mandatory (A Civil Society Statement 2021; Ada Lovelace Institute 2021; Kaminski and Malgieri 2019; Reisman et al. 2018). A robust literature identifies the need for transparency and public disclosures. Such disclosures can take the form of transparent procurement documentation, mandated human rights impact assessments, registries, and specification documents detailing the qualities of the datasets used and the design decisions embedded in the AI systems (Bender and Friedman 2018; Gebru et al. 2021; Hind et al. 2019; Holland et al. 2018; Metcalf et al. 2021; Shin 2020).

Some civil society organizations directly call on governments to take action. The Center for AI and Digital Policy’s “AI and Democratic Values Index 2020” report (CAIDP 2022) provides an analysis of national AI strategies and practices across 30 countries. One of the Index’s five recommendations for national governments is that “Countries must commit to the principles of fairness, accountability, and transparency in the development, procurement, and implementation of AI systems for public services.” The second edition of the AI and Democratic Values Index (CAIDP 2021) extends the analysis to 50 countries. The results show that some countries are deploying responsible practices in their use of AI in the public sector. However, an outsized number of governments still use AI systems which were not developed, procured, and implemented transparently, or managed in a way that puts fundamental rights and society first. The AI Now Institute provides an analysis of algorithmic decision-making (ADM) systems used by government in the US (AI Now 2018). AlgorithmWatch, a European civil society organization, provides a similar analysis of use cases by European governments in its Automating Society Report. The report warns: “Without the ability to know precisely how, why, and to what end ADM systems are deployed, all other efforts for the reconciliation of fundamental rights and ADM systems are doomed to fail” (AlgorithmWatch 2020).

2 Methodology

Public entity systems and decisions are subject to various requirements for fairness, transparency, and accountability. However, the ability to meet these requirements might change with the introduction of AI systems. The complex nature of algorithmic systems introduces new and emerging risks when applied in different social, political, or economic contexts. This paper uses a dual-pronged exploration of the new complexities introduced by AI systems and the institutional capabilities impacting the decision-making process, together with their long-term implications. Such an approach is necessitated by the intertwined nature of risks and implications in general, and of the new complexities added by AI systems in particular. The exploration brings together academic research and the broader policy discussions. Analysis of emerging risks and challenges creates pathways to identify the impact on fundamental rights and due process. When the complexities and implications are mapped together, researchers and policymakers can use this framework to critically interrogate existing public agency practices and develop new guidelines and processes. The methodology includes analysis of policy and advocacy documents, investigative journalism articles, and a literature review of the concepts of responsibility, fairness, accountability, and transparency in the context of both algorithmic systems and public entities. The author also draws from discussions with numerous public sector practitioners and advocates globally about institutional challenges and the complexities of governing AI systems in the public sector.

3 Procurement at a glance

In a typical public procurement scenario, a public actor might announce a Request for Information or a Request for Proposal (RFI/RFP) to gather information on vendors, products, services, technical specifications, or pricing from private entities. The procurement team(s) might then complete due diligence as per the needs of the entity and award a contract to a vendor. So far, this is a standard process which repeats itself across many countries. In a better scenario, there might be a requirement to conduct an impact assessment (such as an environmental or human rights impact assessment) or to consider how the product fulfils public policy objectives. The results of the assessments are shared publicly, so a discussion can take place before a determination to deploy a system is made. These publicly available impact assessments allow interested parties and impacted communities to engage with the process, raise concerns, provide feedback, and, at times, stop a system from being implemented if its impact is unacceptable. The transparency in this process helps the parties to question and verify information and to hold organizations accountable. In an optimum scenario, the public is also engaged in the oversight of some of the systems, so that practitioners can be checked for conformity with the rules of engagement. For example, a civilian oversight council, or a citizen oversight board, might be a governing body assessing the engagements of a department of public safety or law enforcement at the state or city level.

4 Challenges regarding responsibility

The challenge of the distributed state of responsibility in the case of AI systems can emerge across three different layers. The first is due to the nature of different administrative levels within a country, the second is due to the multiple actors involved in the design, development, and use of AI systems, and the third is the increasingly complex and opaque nature of AI and big data systems. This draws from Nissenbaum’s concept of ‘many hands’, one of the barriers to accountability. Complexity can refer to datasets and models; organizational and institutional layers without clear-cut responsibilities; different systems and datasets interacting with each other; and, finally, the nature of the operating systems that contribute to the functioning of the whole. Nissenbaum suggests that any and all of these levels of the ‘many hands’ problem can operate simultaneously, further obscuring the source of an issue (Nissenbaum 1996).

The US, for example, is a country with a federal system. It has distributed levels of responsibility and engagement across its federal, state, city, and even town-level administrative structures. In non-federal systems, the levels are different, but responsibility can still be distributed across national systems and city or town municipalities. Each entity at each level has its own policy agenda, budget, and procurement priorities. As of 2020, at the US federal level alone, 157 AI tools across 64 different national government entities were documented (Engstrom et al. 2020; Coglianese and Lehr 2017; Coglianese and Lampmann 2021). This number might not be accurate since there is no consolidated public registry with a single definition of AI systems. The total number of AI tools across different administrative levels is unknown. Even at a horizontal level, law enforcement agencies in different cities, for example, might use a variety of AI products. Variances in the implementation and operationalization of AI systems add a further layer of complexity to assigning responsibility.

The development and maintenance of AI systems themselves also necessitate the involvement of different stakeholders. From collecting data, to developing AI models, to securing infrastructure, to maintaining the systems, multiple actors make decisions throughout the lifecycle of an AI system. So even setting aside the distributed responsibilities within public entities, the AI development process itself makes it complicated to pinpoint a particular decision which might have caused a harmful outcome.

AI systems are used to analyze very large sets of data to make predictions, classifications, recommendations, decisions, and so on. The complexity of the datasets and some of the more advanced techniques used for AI models make these systems more opaque, at times to the point where neither the developers nor the users understand how a certain outcome was produced. In addition, some techniques allow AI models to continuously learn from new data and user interaction. This means that even if the initial model was understood, the situation might change over time due to new learning or adversarial attacks on the system. Paraphrasing Weizenbaum, complexity distributes responsibility (Weizenbaum 1976).

In the context of AI systems used by the public sector, this multi-layered complexity can also mean that the public actor itself does not understand the system it is procuring and deploying. Institutional capacity limitations, at both the procurement and implementation phases, may result in discriminatory or faulty systems embedded in the core functions of the entity. A great number of current regulatory efforts, as well as technical research, focus on a requirement of explainability for AI systems (Adadi and Berrada 2018; Dwork et al. 2012; Forsythe 1995; Haijan and Domingo-Ferrer 2013; Ribeiro et al. 2016). Explainability usually focuses on the technical transparency of the components of AI systems. The assumption is that if the behavior of the model and its outcomes can be explained to different parties, then the system can be scrutinized for accuracy, mathematical definitions of fairness, and model behavior. Other studies analyze the effect of explainability in AI on user trust and attitudes toward AI (Shin 2021). However, technical transparency might not always be available. The US Federal Acquisition Regulation (FAR), the primary document alongside agency acquisition regulations, gives the government unlimited rights in data except for copyrighted works. The FAR “specifically excludes the source code, algorithms, processes, formulas, and flow charts of the software” from the Form, Fit, Function data (US FAR 2022). Even if all information were available, as Busuioc remarks, “significant technical expertise asymmetries run to the detriment of [public sector] users, further compounded in the public sector by resource shortages and cut‐back pressures on public services, often driving the adoption of algorithms in the public sector” (Busuioc 2021). In short, the ability of public procurement teams to understand the accurate functioning of algorithmic systems is constrained by informational asymmetries, multiple sources of bias (Hickok et al. 2022; Brown et al. 2021), current procurement guidelines, human biases in perception (Shin 2022), and the multi-layered complexities detailed above. These constraints then create a butterfly effect on how algorithmic systems impact society.
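
To make concrete what such technical transparency typically involves, the following sketch (illustrative only, using an invented toy model and data rather than any system discussed in this paper) applies permutation importance, a common model-agnostic explanation technique: shuffling one input feature at a time and measuring the drop in accuracy indicates how heavily the model relies on that feature.

```python
import numpy as np

# Toy data: three features, but only feature 0 actually determines the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (X[:, 0] > 0).astype(int)

def predict(X):
    # Stand-in for a procured model; in practice this would be an opaque vendor system.
    return (X[:, 0] + 0.05 * X[:, 1] > 0).astype(int)

def accuracy(y_true, y_pred):
    return float((y_true == y_pred).mean())

baseline = accuracy(y, predict(X))
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])  # break the link between feature j and the outcome
    drop = baseline - accuracy(y, predict(Xp))
    print(f"feature {j}: accuracy drop when shuffled = {drop:.3f}")
```

Running this shows a large accuracy drop only for the feature the toy model actually uses; the point is that even this minimal kind of probing presupposes query access to the model, which, as noted above, procurement teams frequently do not have for proprietary systems.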

5 Challenges regarding fairness

Fairness in algorithms has been discussed in different public system use cases such as welfare eligibility (Eubanks 2018; Lecher 2018), immigration detention (Koulish 2016), and recidivism (Angwin 2016; Larson et al. 2016). In the absence of AI-specific public procurement guidelines and with lower levels of institutional capability (Dunleavy et al. 2007), public actors may implement AI systems which result in unintentional, negative impacts on individuals or society. Public actors interact with society (Sloane et al. 2021). They might procure a proprietary system developed without consideration for existing policy motivations, values, regulatory rules, or fundamental rights. A four-part formula can help explain how AI systems may magnify or deepen existing inequities and biases within society.

$${\text{Values + Data + Algorithmic Models = Outcomes}}$$

Humans encode their values within all the systems and structures they build. Value encoding which does not consider the diversity of perspectives and experiences results in empowering and privileging one group’s values and perspectives over others’. Value misalignment, on the other hand, means that what we want an AI system to do and what the AI system does may be very different, leading to serious unintended consequences (Birhane et al. 2022). So even when we intentionally try to encode certain values, we might get it wrong.

The data that train AI models are collected by humans, shaped by humans, and are about humans. “Every data set involving people implies subjects and objects, those who collect and those who make up the collected. It is imperative to remember that on both sides we have human beings” (Onuoha 2016). Every such dataset reflects historical and structural inequities.

Algorithmic models work on mathematical definitions and functions. They optimize the given functions. There are multiple definitions of algorithmic fairness (Verma and Rubin 2018). Shin points out that the meaning of algorithmic fairness is context dependent and that there is no widely accepted definition (Shin 2020). Sometimes different definitions of fairness cannot be simultaneously achieved (Berk et al. 2018; Chouldechova 2017; Friedler et al. 2021; Kleinberg et al. 2017; Mitchell et al. 2021). The issue is compounded by the dependency on ‘only’ mathematical formulations of fairness: what cannot be formalized to ensure fairness or equity cannot be part of an automated system. A corporate vendor developing technological solutions will end up simplifying the problems. Public policies will be translated into what can be quantified and what can be coded.
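
As a brief, hedged illustration of this incompatibility (the records below are invented for the example and do not come from any cited system), the following sketch computes two widely discussed group fairness criteria on the same set of predictions and shows that equal selection rates across groups do not imply equal error rates.

```python
# Hypothetical records: (group, true outcome, model prediction)
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 1), ("A", 0, 0),
    ("B", 1, 1), ("B", 0, 1), ("B", 0, 1), ("B", 0, 0),
]

def rate(values):
    return sum(values) / len(values) if values else 0.0

for group in ("A", "B"):
    selected = [p for g, y, p in records if g == group]              # all predictions for the group
    false_pos = [p for g, y, p in records if g == group and y == 0]  # predictions where the true outcome was negative
    print(group,
          "selection rate:", round(rate(selected), 2),        # demographic parity compares these
          "false positive rate:", round(rate(false_pos), 2))  # error-rate parity compares these
```

Both groups end up with a selection rate of 0.75, so demographic parity holds, yet their false positive rates differ (0.5 versus roughly 0.67), so error-rate parity is violated. Choosing which definition to enforce is a policy decision, not a purely technical one.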

A public entity must have the means to interrogate an AI system and understand the consequences of its deployment. These risks must be examined at the procurement stage. However, AI systems are sociotechnical systems; in other words, they are made up of both social and technical elements. They interact with their environments. Their behavior and outcomes are shaped by their interactions with humans and the environment. In return, they shape their environments and change the behavior of those around them (Dobbe et al. 2018; Eckhouse et al. 2019; Sculley et al. 2015). These systems are used to reduce the complexity of human nature and interactions to collectible data points. These data points are expected to represent humans and society through abstractions and constructed labels. However, value encoding through abstraction may ‘render technical interventions ineffective, inaccurate, and sometimes dangerously misguided when they enter the societal context’ (Passi and Barocas 2019; Selbst et al. 2018). The development of AI systems requires interactions to be datafied via proxies. It also requires legal and philosophical concepts to be converted into mathematical formulations. All of these actions require developers to make choices and design decisions (Sculley et al. 2015). A corporate vendor might claim its AI system is a solution to a societal problem or a public service. However, such claims are easily made without an understanding of the context of a public service, the rationale and history of policies, the impacted people and communities, the interactions of these systems with stakeholders, and the implications of the public service for the greater welfare of society.

The public actor must also have the capability and capacity to monitor such embedded AI systems on an ongoing basis. Even a perfectly designed and deployed system (if one exists) shapes the behavior of its institutional users. Users change their expectations according to the data the system can collect and process, and according to the outcomes the system can produce. The decision-making environment changes. This issue is compounded in machine learning models, where AI models change their behaviors and outcomes due to new data and interactions. With such changes in the environment, the corporate product slowly fades into the background and becomes part of the institution without being questioned. If proper training is not provided and institutional capability is not present, the results of AI systems may be taken as objective truths which do not necessitate further review or questioning. Separately, the institutional priorities and agenda of a public entity might change over time. Policy goals may impact how a system is used. For example, the spirit of a policy might be to provide access to resources, in the most efficient way, to everyone who is in need or eligible. An algorithmic system is deployed for that goal. However, with changing agendas and priorities, the system might eventually be used to catch fraud, to limit the number of people accessing resources, or to criminalize individuals from different backgrounds with lower income or education.

In most cases, public entities are interested in AI systems due to resource and skills constraints. The entity determines a need for a solution to use resources more efficiently and respond to changes more quickly. Big data collection and processing capabilities offered by AI systems become attractive. However, the stakes become higher when such systems make determinations, or even recommendations, impacting people’s life and liberty, their access to opportunities and benefits, their rights such as privacy, expression, or association, and due process. In those situations, a more robust process is necessary to determine whether a system is the needed solution, whether it has an impact on human rights and due process, and who gets to make those decisions. The same skill constraint which created the need in the first place can mean there are not enough internal resources and skills to critically interrogate vendor solutions.

One extreme example of the above-mentioned complexities is predictive policing. Such a system depends on individual or location risk profiling and combines multiple data sources to make predictions about who might commit a crime (person-based) or where a crime will be committed (location-based). The approach is not only an affront to the presumption of innocence and due process, it also lacks scientific validity. The predictions depend on historical policing data, which is racially and socioeconomically biased in many parts of the world (Barocas and Selbst 2016; Ensign et al. 2018; Kroll et al. 2017; Richardson et al. 2019). The tool offers the possibility of unconsciously privileging quantifiable metrics (Selbst et al. 2018). The predictions and heatmaps from these systems change policing practices. They change how resources are allocated, how police interact with people, what data they collect, how police are incentivized by the collected data, and which crimes are prioritized (Brayne 2020). The outcomes become the starting point for certain policing practices. The private system changes the way a public service is rendered. Predictive policing systems are dubious and discriminatory but are nevertheless implemented by police departments across many states and countries. The lines become blurred regarding who is accountable for the consequences when an AI system harms a person or a group. Behind a false veil of objectivity, these systems can have an irreversible impact on members of society (Ananny and Crawford 2018). Such systems might harm a person physically, emotionally, mentally, or financially. Either by intent or through unintended consequences, AI systems might end up discriminating against people who share similar characteristics (Buolamwini and Gebru 2018; Obermeyer et al. 2019; O’Neil 2016). Calo and Citron write that “agencies are turning to systems in which they hold no expertise, and which foreclose discretion, individuation, and reason-giving almost entirely.” The authors also suggest that “the systems the U.S. government is increasingly procuring yield results no human can justify” (Calo and Citron 2021).

6 Challenges due to scale, speed and connectedness of algorithmic systems

Public actors have been procuring private software and technology for decades. What is different and so concerning about AI systems and big data infrastructures? Over the last couple of decades, the technologies to collect, store, process, and connect data have improved exponentially. Such changes in hardware and software have made it easier, faster, and cheaper to build and use these systems, and have made it easier for both public and private actors to acquire and use data at an unprecedented scale. When AI systems process data and produce algorithmic outcomes, those decisions are made at a scale which cannot be matched by humans. On the one hand, this means more public service requests can be handled, at a scale and speed which was not previously possible. On the other hand, it also means that if an AI system is biased, produces harmful outcomes, or is intentionally used for discriminatory purposes, the results will impact a larger portion of society. When an application or a request is received, a human public employee reviews the case and decides one by one, whereas an AI system can review the data points from thousands of cases and apply the rules encoded in the system to thousands of applications in a matter of seconds or minutes. Therefore, if there is a value misalignment in the code, or if a case is nuanced and requires contextual information, the speed and scale of harm will also be far greater than that of human review. For example, Michigan’s MiDAS system (Michigan Integrated Data Automated System) was introduced in 2013 to detect fraudulent applications for unemployment benefits and thus reduce the state’s benefits spending. The system, built by private vendors on request, wrongfully accused more than 40,000 individuals of unemployment fraud; the Michigan auditor general’s investigation later found that 93 percent of the fraud determinations were false. Not only was the public not informed about how the system worked, but the state provided minimal resources to answer questions about the false accusations. Thousands lost their houses, filed for bankruptcy, or had their credit scores ruined. Across the US, litigation further shows how little is known about these algorithmic systems and how arbitrary algorithmic outcomes can be (Barry v. Lyon 2016; Cahoo v. SAS Analytics 2019; Arkansas DHHS v. Ledgerwood 2016; K.W. v. Armstrong 2015; Matter of Lederman v. King 2016; Loomis v. Wisconsin 2017).

As these systems are deployed by one public actor after another, they will eventually be connected too. The outputs of one algorithmic system will become the inputs to another. Records across health, education, labor, credit, and justice systems, for example, will all act as one giant public database. One wrong case decision, or one piece of wrong information in an AI system, will cause a domino effect of harms for those impacted by these decisions.

7 Challenges regarding transparency

One of the consequences of private vendors collecting data through online behavior, sensors, personal devices, applications, and other technologies is that the data becomes available to other actors too. Such data can be used to provide more personalized services in some cases. However, it can also be utilized by public entities which could not have collected such information directly, for example in civil and criminal enforcement, monitoring, or adjudication domains. Data collected with publicly owned cameras, sensors, and drones can be combined with data collected by private actors. Such aggregation of previously unconnected databases can generate inferences (at times spurious correlations). Inferences from these systems can then be used to allocate resources to certain geographical areas or neighborhoods, or to concentrate on certain criminal activity. For example, U.S. Immigration and Customs Enforcement (ICE) purchases datasets from data brokers (Biddle 2021), car telematics data (including location, speed, and idle time) from car data application vendors (Brewster 2021; Newman 2019; Talla 2019), and subscription data from utility companies (Aleazis and Haskins 2020; Faife 2021). Law enforcement can request video surveillance footage from the doorbell cameras of private citizens (Lyons 2021; Priest 2021), or use a facial recognition system like Clearview even though the system has been deemed illegal in multiple countries (Ryan-Mosley 2021). There are fewer regulatory protections on the acquisition of data through these means since the public actor is not collecting the data itself. The actor either purchases the already collected data or has a private vendor purchase and process the data on its behalf. Fewer reporting and transparency requirements and less respect for purpose limitations mean that individuals and the public have no insight into who is using which data for what purposes. Freedom of Information Act (FOIA) exemptions protect “trade secrets and commercial or financial information obtained from a person [that is] privileged or confidential” (Department of Justice 2004). This exemption opens the possibility for datasets within proprietary AI systems to also be exempted from FOIA requests. A recent report from the Georgetown Law Center on Privacy & Technology shows how ICE accesses the personal information of hundreds of millions of Americans through private data brokers (Wang et al. 2022). AI systems which can process such large volumes of data thus become part of crucial infrastructure. In response to the requirements of Executive Order 13,960, federal agencies in the US recently published inventories of their AI use cases. While this is a great step toward transparency, the inventories vary significantly because each agency was free to decide how to complete its inventory and what level of detail to disclose. For example, in contrast to the Georgetown Law Center’s report, ICE discloses only four AI systems in use in its inventory. The Department of Justice’s (DOJ) public AI use case inventory is limited to a single page, also with only four use cases, providing single-word details. Neither the DOJ nor the ICE disclosures include the above-mentioned examples of agency practices or vendors, nor do they provide any actionable information to the public (NAII 2022).

For a private AI vendor, the priority is not usually the alignment of values and policies within its system or the protection of fundamental rights and the rule of law. The objectives are usually to create demand for the product, to increase profit and market share, and to avoid liability (a risk to the business) as much as possible in the process. Even when the public actor has an established need and requests vendor responses to that need through a formal procurement process, a lack of institutional capability to critically analyze AI systems and the use of non-AI-specific procurement guidelines will result in liability for public entities. AI-specific procurement guidelines which drive in-depth due diligence and robust processes to understand the organizational, societal, and individual impacts can reduce the knowledge and capability gap. Otherwise, a regular procurement analysis will fall short of addressing all the implications specific to big data and AI systems, and the multistakeholder engagement necessary. Additionally, subcontracting arrangements can obscure the real actors involved and diminish the usefulness of transparency. In 2012, Europol, the EU’s law enforcement agency, signed an agreement with the French multinational Capgemini to create the Europol Analysis System. Capgemini subcontracted the work to Palantir (European Parliament 2020). Even when Europol had issues with Palantir’s Gotham software and considered litigation, a full disentanglement was not possible.

Increasingly, public entities are engaging with private data or AI system vendors in more direct and non-transparent ways. When a private vendor interacts with a public entity, the quickest and easiest point of entry is preferred over a public discussion of what its AI system might mean for the community or society. For example, in times of crisis the need for quick action and solutions might be used as an excuse to skip the regular procurement process and its obligations. In the UK, the NHS onboarded Palantir in March 2020 to help develop the NHS COVID-19 Data Store, with a no-bid contract valued at £1 between the NHS and Palantir. The contract was awarded using the G-Cloud 11 Framework, an accelerated procurement system for minor contracts which does not require a tender to be published. The contract was only revealed after questions from data privacy activists. It is still not clear whether impact assessments were conducted. The cost of continuing with Palantir, however, became clear when the contract was extended at £23.5 million at the end of the trial period. In Greece, another zero-cost agreement between Palantir and the Greek government was revealed. The agreement, which gave Palantir access to vast amounts of health data to help manage the COVID-19 crisis, was not registered in the public procurement system, nor was a mandated data impact assessment conducted (Black 2021; Howden et al. 2021).

Alternatively, a vendor might engage directly with senior decision-makers or supply its products for free or at discounted prices to the public entity to slowly build the need for the product. Instead of providing a solution for the public actor’s established need, such engagements mean that a need is created for a product. A direct engagement and entry point means skipping several layers of stakeholders, internal and external, who should have been involved in the process. One such case was Palantir gifting its predictive policing system to New Orleans. A secretive arrangement between the company and the mayor, combined with some of the executive’s unilateral powers, allowed the system to be implemented and used without public knowledge. The agreement was also unknown to many officials, and it never had to pass through a public procurement process, which would have required public debate and the sign-off of the city council (Winston 2018).

The incentive for the AI vendor, in such cases, is the ability to collect data, train its AI models, use the organization as a reference for further sales, or simply establish itself within the organization (Laperruque 2017) for a prolonged contract. The more connected an AI system becomes within the organization, the harder it becomes to decouple it later. Palantir is used as an example for multiple angles of the same question in this paper. The vendor is transparent in suggesting that ‘The systemic failures of government institutions to provide for the public will continue to require both the public and private sectors to transform themselves’ and that it wants to become ‘the central operating system not only for individual institutions but for entire industries’ (Palantir 2020). However, the company is by no means the only example where theoretical concerns about AI systems in the public domain turn into reality. As Marietje Schaake, the director of Stanford’s Cyber Policy Centre, warns: “We’re building a software house of cards which is sold as a service to the public but can be a liability to society. There’s an asymmetry of knowledge and power and accountability, a question of what we’re able to know in the public interest. Private power over public processes is growing exponentially with access to data and talent.” (Howden et al. 2021).

Public disclosures are recommended to enhance transparency. However, as detailed above, they are not always available. Effective oversight and enforcement mechanisms are crucial to enforce transparency and shed light on the actions of public actors. However, we also need to treat such transparency as a means to an end. While very useful in its own right, a focus on technical parts and outcomes misses an understanding of the social elements (Wieringa 2020). We also need to deliberate values and choices and to enforce responsibility and accountability.

8 Challenges for accountability

When a private vendor is engaged to provide public services or access to public services, two issues emerge. First, such contracting means that public entities, intentionally or unintentionally, transfer some of their responsibility to a private company. A public entity, which should carry a higher duty of care, outsources its services to a profit-driven entity through AI systems, except that some of the obligations to the public disappear in the transfer. Mulligan and Bamberger refer to “procurement as policy”, whereby algorithmic systems “frequently displace discretion previously held by either policymakers charged with ordering that discretion, or individual front-end government employees on whose judgment governments previously relied…When the adoption of those systems is governed by procurement, the policies they embed receive little or no agency or outside expertise beyond that provided by the vendor: no public participation, no reasoned deliberation, and no factual record. Design decisions are left to private third-party developers. Government responsibility for policymaking is abdicated.” (Mulligan and Bamberger 2019). Through procurement conditions and contractual arrangements, a public entity can ensure the vendor carries responsibility and liability for system outcomes or malfunctions. However, this still leaves the vendor answering only to the public entity and leaves the affected individuals and communities having to deal with private entities. Where a vendor does not have competition in the market, it can also use its power to deflect any accountability and liability if a harm occurs. In 2021, the Internal Revenue Service (IRS) signed an $86 million contract with ID.me to provide biometric identity verification services. The arrangement required taxpayers to submit their biometrics in the form of a selfie to authenticate their identity. ID.me claims to already serve 27 states and multiple federal agencies (Rappeport and Hill 2022). If the service does not perform equally and equitably across different demographics due to skin tone, age, or gender, a taxpayer might be penalized for the error. The National Institute of Standards and Technology observes that rates of false positives for Asian and African American faces relative to images of Caucasians can range from a factor of 10 to 100 times (NIST 2019). Alternatively, if ID.me databases are breached, the taxpayer might be subject to identity theft of the highest order, since one cannot change one’s biometric identifiers (Buolamwini 2022). Although this arrangement between the IRS and ID.me was put on hold after advocacy groups pushed back, the vendor still has contracts across multiple jurisdictions as a public entity partner verifying unemployment insurance applications, impacting millions of individuals (ACLU 2022a; Metz 2021).

The second emerging issue is the ability of private vendors to hide behind IP protections. In the absence of a regulation or a contractual requirement which mandates disclosure, a private company has no incentive to share its design decisions or code with any actor. This makes it impossible to analyze how these systems work, to audit their validity, reliability, or accuracy, or to have an ongoing debate about whether they should be in use. Busuioc, analyzing the limitations of algorithmic systems and the implications such limitations pose for public accountability, calls attention to the emerging accountability gap. Referring to Pasquale’s work, Busuioc highlights how he traces ‘a shift in this context from “legitimacy‐via‐transparency” to “reassurance‐via‐secrecy”’ (Busuioc 2021; Pasquale 2011).

As Moss et al. argue, even “voluntary commitments to auditing and transparency do not constitute accountability… [as] they do not meet the standard of accountability to an external forum” (Moss et al. 2021). Currently, most public entities are not subject to any governance mechanism which requires transparent internal and external management of all the AI systems they use. Although several policy examples are emerging globally (Central Digital and Data Office 2021; City of Amsterdam 2020; City of New York 2020; Executive Order 2020; Government of Canada 2020; Government of New Zealand 2020; L’Assemblée nationale 2016; Seattle 2017; UK Office for AI 2020), in most cases even the public entity itself does not have a full picture of its entanglement with a private AI vendor. For example, even a city-level law enforcement entity may not know exactly which systems are used across its different departments, how data is integrated, or how the outcomes are shaping its practices and policy. This makes it hard for the public and civil society to engage with the right partners, find information, and hold anyone accountable. In his definition of accountability, Bovens requires five integral parts: (1) an actor, (2) a forum, (3) a relationship between the two, in which (4) the actor is obliged to explain and justify its conduct, the forum can pose questions and pass judgement, and the actor might face (5) consequences (Bovens 2007). In a situation where the actor(s) cannot be properly identified due to distributed and transferred responsibilities, and where vendors are not obliged to explain the behavior of their AI systems, it becomes extremely hard to assign any accountability or consequences when AI systems harm individuals or groups, or infringe upon human rights. In their 2017 article, Kroll et al. write that “accountability mechanisms and legal standards that govern decision processes have not kept pace with technology” (Kroll et al. 2017).

9 Limitations and future research directions

There are several limitations to this research. The first concerns information availability and asymmetry. The research is limited to publicly available documents such as academic literature, government reports and registries, investigative journalism, and litigation texts, alongside private discussions with practitioners. Both the public actor procuring an AI system and the vendor developing it currently contribute to the unavailability of easily accessible information. The public actor may have an interest in keeping the details of its intelligence or enforcement systems behind a wall of protections. This interest might derive from legitimate concerns about counter-actions and the possibility of malicious actors gaming the system (Veale et al. 2018). Alternatively, the agency itself might not have access to proprietary algorithms due to prior contractual commitments or current exemptions in procurement regulation. The vendor, on the other hand, contributes to the information asymmetry by benefiting from legal protections for trade secrets (Katyal 2019). The vendor might also be concerned about liability or about an employee backlash it might receive if the details of its cooperation with government were to become public (Campbell 2018; Shane and Wakabayashi 2018).

A different set of limitations relates to the reproducibility and replicability of the outcomes of AI systems. Even with full access to these machine-learning systems and with technical literacy, it might still be impossible to trace back a particular decision of an AI system and reproduce the exact same result. This creates a situation where an individual whose rights are infringed (or an entity acting on behalf of the individual) may not be able to trace back or replicate the discriminatory decisions.
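
A minimal sketch of this limitation, under the assumption of a simple logistic model trained with stochastic gradient descent on invented data (real deployed systems add further sources of irreproducibility, such as retraining on changing data and infrastructure differences, that this toy example does not capture): running the identical training routine twice on identical data, without pinning random seeds, can produce models that disagree on individual cases.

```python
import numpy as np

def train(X, y, epochs=15, lr=0.1):
    # Deliberately leaves the random seed unpinned: the initial weights and the
    # order in which examples are visited differ from run to run.
    rng = np.random.default_rng()
    w = rng.normal(size=X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            p = 1.0 / (1.0 + np.exp(-X[i] @ w))  # logistic prediction
            w -= lr * (p - y[i]) * X[i]          # stochastic gradient step
    return w

# Toy data with a noisy decision boundary (hypothetical, for illustration only).
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 5))
y = (X[:, 0] + data_rng.normal(size=200) > 0).astype(float)

w1, w2 = train(X, y), train(X, y)  # identical code, identical data
disagreements = int(((X @ w1 > 0) != (X @ w2 > 0)).sum())
print("cases where the two runs disagree:", disagreements)  # typically a small, nonzero count near the boundary
```

An individual whose case sits near such a boundary could plausibly receive different outcomes from nominally identical systems, which is precisely what makes tracing or replicating a specific decision so difficult.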

Another limitation concerns the incentives embedded in the design, development, procurement, and implementation of AI systems. Both the public actor and the vendor can state what problem(s) they are solving with an AI system. However, such public statements or disclosures usually do not include the organizational incentives impacting decisions. Developers or procurement officials may be incentivized to complete due diligence in less time, spending fewer resources, or in ways that possibly contradict responsible design and development or in-depth due diligence.

This research mapped out the challenges in the procurement of AI systems by public entities and the long-term implications that necessitate AI-specific procurement guidelines and processes. Future research can provide an analysis of the benefits and limitations of transparency, especially in the form of public disclosures. How do public disclosures contribute to the governance of AI systems? What are the limitations of disclosures? Can emerging technology be used in new ways to contribute to meaningful participation of society in the debates impacting fundamental rights and due process? This kind of inquiry and in-depth analysis can be replicated across many jurisdictions globally, as every country has different procurement regulations, infrastructures, and governance mechanisms.

In any procurement environment, humans ultimately conduct the due diligence, review the available documents, and make judgements. Any transparency method, whether in the form of disclosures, datasheets, explainability reports, or notices, needs to be understood accurately by its audience. This means the capacity, capability, and perceptions of public procurement officials play a crucial role. A robust literature analyzes human biases, communication approaches, and the requirements for different types of explanations of AI systems. However, future research can also focus on the sense-making and perceptions of public officials operating within the confines of government bureaucracy and politics, and on how they judge current AI principles and the social value of the systems they are involved in procuring.

10 Conclusion: recommendations for public agencies

As this paper has mapped out the challenges and risks in the procurement of AI systems by public entities and their long-term implications, recommendations for what AI-specific public procurement guidelines and processes should include are also necessary.

The issue at hand is not a single AI system or device, but the whole ecosystem. AI-specific procurement guidelines and governance must be applicable to the devices used by the public entity and its agents; to externally collected and acquired data; to the AI software fed by these datasets; to the AI platforms which connect disparate software; and to the cloud-based hosting infrastructures. Public agencies are already moving their data and communications infrastructures to cloud-based hosting systems. These systems are owned by a handful of major technology companies. This creates an inevitable dependency on private vendors, as the public sector will not be able to sustain its own infrastructure. If not governed with the public interest and fundamental rights in mind, this eventual entanglement will mean some vendors become too big to fail. They will be too powerful and will set the terms of the engagement. The situation becomes more concerning when vendors are involved in very high-stakes decisions like law enforcement, border management, intelligence, or health and benefit systems. Even if the public entity is interested in severing its relationship, as exemplified in the Europol Analysis System case (European Parliament 2020), or if the system is not working as expected, the entity might not be able to easily terminate its contract and disentangle itself from the relationship.

If private AI systems are deployed within the public sector, human rights, the rule of law, and a commitment to the principles of fairness, accountability, and transparency must be required. Otherwise, public actors will have embedded systems without the independent capability to maintain them or the skills to monitor their performance. Alternative oversight and accountability mechanisms will also be unavailable due to the initial lack of transparency or to subcontracting arrangements. A note of caution is necessary here. Most of the issues explained above about corporate AI systems also apply to systems built in-house by public entities. These systems are still sociotechnical systems. The motives and values of the developers and the institution will still be embedded in these AI systems. The need for governance mechanisms and accountability structures remains. Therefore, the solution to corporate entanglement and dependency is not simply building these systems internally. Obligations and documentation within an AI-specific procurement process must apply to both external and in-house development. A recent case in point was a lawsuit claiming Immigration and Customs Enforcement (ICE) created a “secret no-release policy” and manipulated its risk assessment algorithm to recommend only one decision. The Velesaca v. Decker case challenged the automatic and indefinite incarceration of virtually all of the thousands of people ICE arrested between 2017 and 2020 for alleged immigration offenses. The algorithm used to recommend whether an arrestee be released or detained until a hearing was changed in 2015 and again in 2017, removing its ability to recommend release, even for arrestees who posed no threat (Robertson 2020). The detainees were not subject to due process and never had any chance at recourse. The settlement in the case in March 2022 secures the right to a fair release assessment for everyone arrested by ICE in New York (ACLU 2022b; Velesaca v. Decker 2020). The example of how a risk-profiling system forced the Dutch government to resign should be a reminder for all public entities. Systeem Risico Indicatie (SyRI), an algorithm used by the Dutch government to detect possible social welfare fraud, was found to be discriminatory against people with dual nationality and low income. The authorities started claiming back benefits from families who were flagged by the system, without proof that they had committed such fraud. The claims pushed tens of thousands of families into poverty and separated more than a thousand children from their families into foster care. Some victims committed suicide (Heikkila 2022). The District Court of The Hague found that, under article 8 of the ECHR, the Netherlands “did not strike a fair balance between privacy and the benefits of the use of new technologies to prevent and combat fraud” because SyRI was “insufficiently clear and verifiable” (Court of Hague 2020). A parliamentary report into the childcare benefits scandal found institutional bias and authorities hiding information or misleading the Parliament about the facts (Dutch Parliament 2020). In response, the Dutch parliament adopted a motion in April 2022 making it mandatory to conduct a human rights impact assessment before algorithms are used to make evaluations or decisions about people and, where possible, to make the impact assessments public (Dutch Parliament 2022). In May 2022, the Netherlands Court of Audit found that six out of nine algorithms it audited did not meet basic requirements and exposed the government to various risks, from inadequate control over the algorithms’ performance and impact to bias, data leaks, and unauthorized access (Netherlands Court of Audit 2022).

Another requirement in the public procurement process is to ensure that an AI system is the right solution to a need or problem. We need to be aware of techno-solutionism and focus on the structural causes of an issue, not just the parts of the issue we can collect data about and patch algorithmic systems over. Such a determination must be made by engaging, internally and externally, multidisciplinary public officials and impacted communities in the decisions (Hickok 2021). The voices of the impacted communities must be heard and respected. The obstacles preventing them from participating in such engagements must be removed. Especially in cases where a system makes determinations about a person’s life and liberty, ability to exercise fundamental rights, or access to resources, impact assessments and documentation must be mandated. In parallel, public entities must engage with impacted communities and civil society in a transparent, multi-stakeholder manner which respects participation parity, to agree on AI-specific procurement guidelines and on reporting and disclosure requirements.

The public must have access to relevant information in a way that facilitates meaningful engagement. In October 2021, Eric Lander and Dr. Alondra Nelson, then White House Office of Science and Technology Policy Director and Deputy Director, stated: ‘Powerful technologies should be required to respect our democratic values and abide by the central tenet that everyone should be treated fairly…country [US] should clarify the rights and freedoms we expect data-driven technologies to respect…enumerating the rights is just a first step. Possibilities include the federal government refusing to buy software or technology products that fail to respect these rights, requiring federal contractors to use technologies that adhere to this “bill of rights,” or adopting new laws and regulations to fill gaps. States might choose to adopt similar practices.’ (Lander and Nelson 2021). In the same way, where decisions have serious implications for individuals, algorithms can be neither secret (proprietary) nor uninterpretable (Busuioc 2021; Rudin 2019). AI systems are developed by humans; however, these systems are often mistakenly perceived as independent, objective, unquestionable technologies. Therefore, the outcomes of these systems should not be used as substitutes for other steps in due process. Both public and private actors must be held accountable for decisions and outcomes. Procurement, development, and implementation must be subject to robust governance and enforcement mechanisms. These mechanisms necessitate both initial internal capacity building and ongoing capacity enhancement as the science and technology advance. Data generated, collected, processed, and used by humans can never be bias-free. An appreciation of this fact, together with an understanding of the risks specific to AI systems and their sociotechnical aspects, should make public actors pay even more attention to due diligence. Procurement regulations should be updated to include obligations for developers to share details of data qualities, model design decisions, optimization techniques, and processes when required by the public entity. Additionally, procurement guidelines should require a capable internal workforce to be in place before a procurement decision is made for an algorithmic system.

For a functioning democracy in which both fundamental rights and the rule of law are prioritized, society first needs an agreement, a social contract, on what kinds of systems should be allowed or banned. As Gabriela Ramos, Assistant Director-General for the Social and Human Sciences of UNESCO, suggests, ‘AI technologies can be used to strengthen government accountability and can produce many benefits for democratic action, participation, and pluralism, making democracy more direct and responsive. However, [such technologies] can also be used to strengthen repressive capabilities and for manipulation purposes’ (Ramos 2022). An engaging public debate and discourse should result in a basic agreement about which systems should be prioritized and which systems should never be implemented.