
Political Machines: Ethical Governance in the Age of AI

Fiona J. McEvoy

Abstract

Policymakers are responsible for key decisions about political governance. Usually, they are selected or elected based on experience and then supported in their decision-making by the additional counsel of subject experts. Those satisfied with this system believe these individuals – generally speaking – will have the right intuitions about the best types of action. This is important because political decisions have ethical implications; they affect how we all live in society. Nevertheless, there is a wealth of research that cautions against trusting human judgment as it can be severely flawed. This paper will look at the root causes of the most common errors of human judgment before arguing – contra the instincts of many – that future AI systems could take a range of political decisions more reliably. I will argue that, if/when engineers establish ethically robust systems, governments will have a moral obligation to refer to them as a part of decision-making.

1 Moral Obligation

There isn’t the space in this brief paper, nor is there a need, to give an overview of all systems of governance. Let us agree that they vary enormously by jurisdiction, but that there is one common unifier: all governments make important decisions that affect their populations (I deliberately do not use the term ‘electorate’ as this is not always the case). Therefore, regardless of the shape of a government, the decisions taken by politicians and policymakers should be carefully considered, given how much is at stake for the general population.

I want to begin this discussion with the idea that all political decision-takers have a baseline moral obligation to make the best decisions they can within the parameters of their abilities and resources. This obligation could be generated by the intrinsic promissory duty any public official assumes upon taking up their position, or issue from a broader utilitarian standpoint that demands governments consistently promote the greater good.

Making the case for a form of epistocracy, a knowledge-based rule that privileges the best decisions and outcomes over how they come about (Linares Lejarraga 2017, p. 248), Jason Brennan makes a useful argument that it is ‘presumptively unjust’ to accept incompetent decision-making when better ways are available. He forwards what he calls the ‘competence principle’, and its corollary:

The competence principle: It is presumed to be unjust and to violate a citizen’s rights to forcibly deprive them of life, liberty, or property, or significantly harm their life prospects, as a result of decisions made in an incompetent way or in bad faith. Political decisions are presumed legitimate and authoritative only when produced by competent political bodies in a competent way or in good faith… or presumptively, we ought to replace an incompetent political decision-making method with a competent one. (Brennan 2016, p. 156)

Note that there is no requirement for governments to divert time, energy, or large budgets in pursuit of decision-making perfection. Rather, the obligation calls for trusted agents to choose the most effective available ways of taking political decisions without expending unreasonable effort.

We might recognize the intrinsic rightness of this not just from its consequences, but in its own nature (Ross 1930, p. 47), and yet even political decision-makers who acknowledge an obligation continue to act on uncertain judgments that result in poor outcomes. In varying degrees, bad decisions have caused wars, marginalized whole communities, persecuted individuals, allowed nepotism to flourish, and left many people below the poverty line. For the most part, these destructive decisions have issued from well-intentioned motivations (acknowledging that some citizens are badly affected by systemic corruption within their governments). This is why I purposefully extend this moral obligation argument to ensure that making the best decision is not limited to ‘making best efforts’, but also actively includes choosing the ‘best methods.’ By this, I mean that where provably better ways of taking decisions are available, politicians and policymakers have a moral obligation to adopt those ways, unless there is an exceptional or overriding reason not to.

In practice, this means that if there were some deciding mechanism – such as an artificial intelligence (AI) system – which could provide proof of its superiority, then we should refer (if not defer) to its worthy suggestions. Of course, this is an extremely unpalatable idea to many who feel that important, nuanced judgments affecting humans should remain the domain of humans (e. g. Dawes 1979; Meehl 1986; Klein 1999; Norman 2014). Machines, it is often argued, are only for the ‘dull, dirty and dangerous’ tasks that people wish to avoid.

Lastly, I wish to stress that my focus on political governance should not imply a belief that political decision-makers are alone in making morally salient decisions affecting broad populations. Indeed, it is clear that the actions of a great many private businesses should be subject to comparable scrutiny. Rather, I have chosen political decision-making as my focus for three other reasons: first, because the examination could become unwieldy if not confined to a closed domain; second, political decisions tend to be more complex than business decisions (Guthrie Weissman 2017) [1]; third, though I regard it as axiomatic that large corporations affect and sometimes control the fortunes of mass audiences, their latitude is still less than that of governments taken as a whole, and therefore the stakes of political decision-making appear to be demonstrably higher.

2 Flaws of intuition

Many psychologists have endeavored to expose the flaws of human intuition. Here I borrow some that are relevant to this discussion of political decision-making. Critically, my objective is not to provide a critique of this work. Nor do I wish to present each example as stand-alone evidence of human unfitness in decision-making. Instead, I hope to demonstrate simpliciter, through compiling a small selection from a huge canon of corroborative work, that there is reasonable cause to mistrust judgments based on human intuition.

2.1 Ego depletion

Experiments have shown conclusively that all variants of voluntary effort – cognitive, emotional, or physical – draw at least in part on a shared pool of mental energy (Kahneman 2011, p. 41). Typically, after exerting ourselves in one task, ego depletion means that we do not feel like making an effort in the next one (although this can be overridden with a strong incentive).

One well-known report (Danziger et al. 2011, pp. 6889ff.) demonstrated worrying depletion effects amongst a group of eight Israeli judges reviewing applications for parole. Despite the default decision for such an application being denial – only 35 % are generally approved – experimenters found that after meal breaks the number of requests granted leapt to 65 %. Approval rates also dropped steadily during the two hours before the judges’ next meal. The implication is that tired and hungry judges may fall back on default positions.

2.2 The halo effect

The sequence in which we observe people is determined by chance, and yet the ‘halo effect’ (Thorndike 1920, pp. 25ff.) weights first impressions so heavily that later information can be completely wasted. Asch (1946, p. 272) asks what the reader thinks of Alan and Ben:

Alan: intelligent – industrious – impulsive – critical – stubborn – envious

Ben: envious – stubborn – critical – impulsive – industrious – intelligent

Most view Alan more favorably, though they both have identical traits. The stubbornness of an intelligent person may be seen as justified and educe respect, but intelligence in an envious and stubborn person makes him dangerous (Kahneman 2011, p. 82). We are naturally attracted to coherence and convincing stories, and thus interpret each later trait in the context of the earlier ones.

Researchers have also found that the same halo effect can compel us to find physically attractive people more likeable (William Lucker, Beane, and Helmreich 1981, pp. 69ff.), as well as more politically knowledgeable and persuasive (Palmer and Peterson 2016, pp. 353ff.), because of the increased weight of first impressions.

2.3 The affect heuristic

Humans can sometimes give primacy to conclusions when emotions are involved. Psychologist Paul Slovic termed this the affect heuristic, in which our existing likes and preferences (e. g. political allegiance) determine our beliefs about everything else. Moreover, if we dislike something passionately, then we are likely to believe that its risks are high and its benefits negligible (Finucane 2000).

Slovic ran an experiment (1999, pp. 689ff.) which surveyed opinions on a number of technologies, asking participants to list the pros and cons of each. He found that when someone approved of a technology they perceived minimal risks and large benefits. When they disliked it, they struggled to find any advantages.

However, it should be noted that when respondents were shown positive passages about the technologies they initially disliked – either describing ample benefits or low risks – the emotional appeal also changed. Those who learned about the benefits also changed their minds about the risks, despite receiving no further information about the latter. Similarly, those who learned that the risks of a technology were mild, also developed a more favorable view of its benefits, simply as a byproduct.

2.4 Belief overkill

Baron (2009)[2] demonstrated a similar cognitive phenomenon with online opinion polling data about hypothetical political candidates.

Upon making the initial observation that many US Republican Party voters were originally attracted to the Party for its stance on social issues (e. g. abortion) but came to espouse the party’s economic positions too, Baron set out to prove what he termed ‘belief overkill’ – the idea that when some decision is salient, once a voter moves towards an option they bring all of their other beliefs into line so that none of their beliefs oppose the favored option.

In his study, Baron presented subjects with two separate political positions which were attributed to the same political candidate and asked for an overall judgment of that candidate as well as a judgment about how each position affected the candidate’s attractiveness. Confirming Baron’s hypothesis, the subjects’ appraisal of one issue predicted their evaluation of the other and of the candidate. He concluded that our political beliefs seem to come in ‘pre-packaged’ bundles.

2.5 The law of small numbers

We are prone to exaggerate the consistency and coherence of what we see, even on limited evidence. This means we tend to concentrate on the content of any given message, rather than its reliability (Kahneman 2011, p. 118). Such a flaw is particularly problematic when it comes to small samples of information, as they often give more extreme and misleading results than larger samples.

Wainer and Zwerling (2006, pp. 300ff.) give the example of a large investment made by the Gates Foundation on the basis of research into the most successful schools. The study, which looked at 1,662 schools in Pennsylvania, found that the best schools were usually small in size. This prompted the foundation to plunge $1.7 billion into the creation of more small schools (even splitting some existing schools in the process). They were joined in their enthusiasm by at least half a dozen other agencies and a US government department.

Though this move seems reasonable – small schools should be able to give children more time, attention and encouragement – the causal analysis doesn’t quite work. Had researchers also looked at the characteristics of poor-performing schools, they would have found that those also tended to be small. Small schools are not better, they are just more variable, but cognitively we wish to simplify the findings and make the world more intelligible than the data justify.
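
The statistical point is easy to check with a quick simulation. In the sketch below (illustrative Python only; the school sizes, score distribution, and counts are all invented), every student draws from exactly the same score distribution, yet the smallest schools still crowd both the top and the bottom of the resulting league table, simply because their averages are noisier.

```python
import random

random.seed(0)

def school_mean(n_students):
    # Every student draws from the same score distribution, so no
    # school is genuinely better than any other.
    scores = [random.gauss(500, 100) for _ in range(n_students)]
    return sum(scores) / len(scores)

# 1,000 hypothetical small schools and 1,000 hypothetical large ones.
small = [("small", school_mean(50)) for _ in range(1000)]
large = [("large", school_mean(2000)) for _ in range(1000)]

ranked = sorted(small + large, key=lambda s: s[1], reverse=True)
top_50 = [size for size, _ in ranked[:50]]
bottom_50 = [size for size, _ in ranked[-50:]]

print("small schools among the top 50:   ", top_50.count("small"))
print("small schools among the bottom 50:", bottom_50.count("small"))
# Small schools dominate both extremes of the ranking: their averages
# are more variable, not better.
```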

2.6 Availability cascade

An availability cascade is the mechanism by which policies become infused with biases and the allocation of resources can be distorted (Kuran and Sunstein 1999, p. 683). It refers to a chain of events, often beginning from media reports of a relatively minor event, which escalate into public panic and government action. In these cases, availability provides a heuristic, which leads to judgments based on fluency and/or emotional charge when ideas come to mind.

In the late 1980s the chemical Alar was sprayed on apples to regulate their growth and improve their appearance. Press stories then emerged about tumors occurring in rodents that had consumed huge amounts of the chemical. A frightened public reacted, which in turn fueled more media. Apples became objects of fear, Meryl Streep gave testimony before Congress, the manufacturer withdrew the product and the FDA banned it (Kahneman 2011, p. 143). Though subsequent research confirmed the substance could pose a very small risk as a mild carcinogen, this was still an overreaction to a minor problem. The phenomenon has been called ‘probability neglect’ (Sunstein 2002, p. 63), an over-attendance to a minor risk due to the ease and frequency with which it comes to mind.

3 Political judgment: Other factors driving poor decision-making

In addition to the cognitive flaws outlined above, which are relevant to all types of human decision-maker, there is also strong supporting evidence that subject-specific experts frequently fall short on their judgments, particularly when it comes to political forecasting and appropriate policy-making.

Philip Tetlock tested political judgment-makers on their ability to predict outcomes in a way that was a) empirically accurate in terms of how their judgments mapped onto the publicly observable world, and b) logical and internally coherent (i. e. not just a lucky guess). Tetlock found that as the level of expertise rose, confidence also rose – but not accuracy (2017, ch. 3). Repeatedly, experts attached high probability to low frequency events in error, relying upon intuitive causal reasoning rather than probabilistic reasoning. Their assertions were often no more reliable than the apocryphal ‘dart throwing chimp’ (2017, p. 80).

Shu, Tsay, and Bazerman (2012, pp. 243ff.) similarly present evidence to counter the myth that policymakers are rational actors. They aggregated research to show that ‘when considering issues of high complexity, decision makers are typically constrained by time, imperfect knowledge, and overreliance on general rules of thumb.’ Their supporting argument cites, among other factors, the prevalence of loss aversion which supports a tendency to focus on losses rather than gains when contemplating change. This ultimately translates into a greater concern with the risk of change than the risk of failing to change. Shu et al. say this status quo bias can lead to a dysfunctional desire to maintain a broken system.

Emotion is another factor that holds sway over good political judgment. Gordon and Arian (2001, pp. 196ff.) present data showing that feelings of threat correlate with policy choices regarding the threatening situation or group, and often at very strong levels. Specifically, the more threatened people feel, the more their policy choice tends to maintain or intensify the conflict – that is, the more incendiary the policy choice is – and vice versa – the lower the threat the more subdued the policy choice is. They conclude that emotions can undermine logical or rational considerations in some circumstances.

Lastly, there are also a number of more conscious factors that can derail the good decision-making of political figures and policy experts, chief among them a concern with acceptability and public approval, which distinguishes them from analytical decision-takers whose behavior is driven by the need to maximize utility in general (Farnham 1990, p. 99).

4 Political machines

By now it should be clear that human – and specifically political – decision-making is sometimes an ambiguous undertaking, subject to biasing influences and an overreliance on unhelpful heuristics. I propose that one of the ways to remove these inconsistencies and subjective judgments from the (human-centric) political system, is by employing data-fueled artificial intelligence as a chief consultant in all governmental judgments. As stated, I also want to advocate that where provably competent technology is available to civic decision-takers, they have a moral obligation to use it in order to make better decisions for the good of society.

4.1 Arguments for AI

Writing for Wired magazine, Joshua Davis (2017) made a (tongue-in-cheek) case for an ‘AI president’. As his main support, he cites the superiority of computational judgments over human ones, which – as we’ve seen – can be problematic. Of late, he argues, advanced AI has demonstrated that it can cut through complexity even more easily than some of the most sophisticated human minds:

Over the past 12 months, an AI built by Google has won 60 games in a row against the world’s best Go players. To do this, it had to master a game that is far more complex than chess. (There are more possible Go games than there are atoms in the known universe). The AI faced a huge array of choices and had to think dozens of steps ahead. It needed to make difficult decisions, fashion a strategy that involved risk and operate with incomplete information. It did all this, and it also innovated. “It won by doing things we hadn’t seen before,” says Myungwan Kim, a professional Go player. “We thought it would take 50 years for software to beat the top players in the world but, over five months, this program became the best player in the world.”

Davis pondered further:

Imagine an AI presidency in 2003. The software would have analyzed decades’ worth of reports on Saddam Hussein, absorbed the intelligence about WMDs, and concluded that an invasion of Iraq was obviously a dumb idea and unlikely to spread democracy. Ditto Vietnam. [3]

As Davis acknowledges, the complexities of the world cannot fairly be compared to a game of Go. Nevertheless, clearly there are good reasons to favor AI over human judgment. I believe that these reasons become even stronger when we consider the potential ramifications of political judgments; this is what leads me to insist that AI systems should be an integrated part of governmental decision-making.

The first argument for AI has already been made. Unlike human judgment and cognition, decision-making systems are not at risk of the halo effect, ego depletion, powerful emotion, or the other challenges outlined above. AI systems do not get tired, suffer from low blood sugar, or let subjective personal beliefs warp their perspectives on important issues. In this sense at least, they seem to be objective.

The second reason to prefer AI, which perhaps flows naturally from the first, is because artificial intelligence is often extremely accurate in its predictions and judgments. Where it outperforms humans, it is usually by some considerable margin. This accuracy is not even unique to AI or machine learning. It is simply the case that statistical predictions tend to triumph over those of human experts. This has long been known. The famous comparison between clinical predictions and statistical predictions, first noted by Meehl, demonstrates well the preeminence of the latter (1954). Kahneman writes that even after 50 years, and roughly 200 studies, there are still no convincingly documented examples of human intuitions besting algorithmic forecasts in this area (2011, p. 223).
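
Dawes’s (1979) ‘improper linear models’ show how little machinery such statistical prediction needs: standardize each cue and add the results with equal weights, with no fitted coefficients at all. The sketch below is a minimal illustration of that recipe; the applicants, cue names, and figures are invented for the example and carry no empirical weight.

```python
import statistics

# Invented cue values for three hypothetical applicants
# (higher = more favourable on that cue).
applicants = {
    "A": {"test_score": 0.82, "work_sample": 0.70, "references": 0.60},
    "B": {"test_score": 0.55, "work_sample": 0.90, "references": 0.75},
    "C": {"test_score": 0.70, "work_sample": 0.40, "references": 0.95},
}
cues = ["test_score", "work_sample", "references"]

def standardize(values):
    mu, sigma = statistics.mean(values), statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

# Standardize each cue across applicants, then sum with equal (unit)
# weights: Dawes's 'improper' linear model, with no fitted coefficients.
columns = {c: standardize([applicants[a][c] for a in applicants]) for c in cues}
scores = {a: sum(columns[c][i] for c in cues) for i, a in enumerate(applicants)}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"applicant {name}: unit-weight score {score:+.2f}")
```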

Tetlock’s experiments with political experts also found that it was impossible to find any domain in which humans clearly outperformed even crude extrapolation algorithms, less still sophisticated statistical ones (2017, p. 54). He asserted that – unlike algorithms – human experts dig themselves into intellectual holes, basing the probability of something happening on nothing more than their worldview. Interestingly, Tetlock describes a human dilemma which presupposes an advantage for data-driven AI, explaining with regard to human judgment: ‘if we only accept evidence that conforms to our worldview, we will become prisoners of our preconceptions, but if we subject all evidence, agreeable or disagreeable, to the same scrutiny, we will be overwhelmed’ (2017, p. 19). Of course, our superfast processors can now examine all available evidence, which means that – just like the type of pundits Tetlock finds to have better political judgment (‘foxes’) – their survey of the world could be more balanced. Furthermore, just like foxes, AI is a ‘belief updater’ by its very nature.

Many years on from Meehl’s early discovery about statistics, AI now deploys automated algorithms to supplant human judgment in a growing number of domains. Recently, Stanford University created a machine learning algorithm which can identify lung tissue slides exhibiting a specific type of cancer with far greater accuracy than human pathologists, two of whom will only agree 60 % of the time (Yu et al. 2016, p. 2). Meanwhile, Northwestern University developed a model that performs as well as or better than 75 % of Americans on a standard intelligence test (Lovett and Forbus 2017, p. 77). Even the CIA has begun to employ ‘anticipatory intelligence’, which in some cases allows it to detect social unrest from open datasets as far as three to five days out (Russan 2016). [4]

As I’ve already touched upon, another key advantage of artificial intelligence is the extraordinary speed at which it is capable of processing material. A video-game playing ‘bot’ recently completed 19,000 actions per minute (Kim and Lee 2017). [5] This speed is critical, given that AI systems can now draw upon information that is infinitely broader in range and more complex in detail than a human could ever reasonably make sense of (thanks largely to the increasing ‘datafication’ of our world). The combination of speedy technology and the avalanche of so-called ‘big data’ means that computational, algorithmic systems can usually outpace human analysis of the available information cues. It is precisely this speed and almost unlimited scope that allows AI machine learning systems to become so accurate.

4.2 A caveat

Having noted these ‘pros’, I want to nuance the point I am making. I do not wish to imply that AI systems are always preferable to human decision-makers. It is important (and only fair) to note that there are scenarios where human instincts, based on experience, can be very reliable indeed. In their joint effort, ‘Conditions for Intuitive Expertise: A Failure to Disagree’, Kahneman and Klein (2009, pp. 515ff.), though rivals on the topic, agreed on the conditions under which human judgment is very reliable:

  1. When an environment is sufficiently regular to be predictable; and

  2. When there has been an opportunity to learn these regularities through prolonged practice.

If these are satisfied, intuition is likely to be skilled. Into this category of decision-maker might fall physicians, athletes, nurses, and firefighters, all of whom face what Kahneman calls ‘complex but orderly situations’ (2011, p. 240), also known as ‘high validity environments’ where we can find stable relationships between objectively identifiable cues and subsequent events, or between cues and the outcomes of possible actions. In contrast, in ‘zero validity environments’ outcomes are unpredictable. As an example, Kahneman and Klein offer that ‘predictions of the future value of individual stocks and long-term forecasts of political events are made in a zero-validity environment’ (2009, p. 524).

There are two observations here which are decisive for my argument. First, Kahneman and Klein do not add politicians and policymakers to this list of skilled decision-makers: the unpredictable future scenarios they face (e. g. the fluctuation of economies, or the movements of foreign governments) are not accessible via intuition. Second, these rival scholars agree that people perform significantly more poorly than algorithms in low-validity environments (2009, p. 523). Corroborating this is an analysis of 136 studies comparing the accuracy of clinical and mechanical judgments, mostly in noisy and highly complex environments (Grove et al. 2000, pp. 19ff.). Though the algorithms were often still wrong, in about half of the studies they prevailed. In the other half there was no difference.

Add to this evidence that algorithmic systems also perform extremely well in high-validity environments, and it seems fair to conclude they are as good an option as humans – and frequently better – when it comes to making judgments, including political ones.

5 Two main reasons for resistance

Many people will, of course, disagree that machine decisions should ever be given priority over human political instinct. I think that this resistance mostly takes one of two forms, both of which I’d like to address directly:

5.1 Aversion

Many of those who do not want automated decisions to become an integral part of governance feel this way because of a common, visceral aversion, usually furnished with a causal story about fearing the demotion or displacement of humans as a species – a kind of unqualified moral repugnance (Kass 1997, pp. 17ff.).

This is, in itself, a bad intuition which leads many to vastly overweight the perceived threat of AI. The idea of becoming subject to rule-by-system, or even rule-by-robot (as per recent media hype) could lead us to reject perfectly reasonable technological advancements. It is a kind of ad hominem argument. It is certainly a slippery slope argument.

A study by Dietvorst et al. (2015, p. 114) demonstrated the phenomenon of what has been termed algorithm aversion. Their work found that subjects mistrusted algorithmic predictions even when they witnessed them outperforming humans. Similarly, subjects lost confidence in algorithmic predictions much more quickly than they did in human forecasters.

Meehl points out that this baseless mistrust is not just a misguided position, but also an unethical one. He writes in defense of algorithms:

If I try to forecast something important about a college student, or criminal, or a depressed patient by inefficient rather than efficient means, meanwhile charging this person or the taxpayer 10 times as much money as I would need to achieve greater predictive accuracy, that is not sound ethical practice. That it feels better, warmer and cuddlier to me as a predictor is a shabby excuse indeed. (1986, p. 374)

A preference for what is natural and human is just that: a preference, not an argument. If we have developed better methods (as is happening all the time), we should employ them. In the case of governance, it is a moral obligation, even if this means AI must encroach on our notions of human essentialism as well as our preciously guarded status professions.

More extreme long-term fears about unruly or despotic AI are fairly easily defeated. It is worth pointing out that even the most prominent voices warning against the existential threat of what might be termed true AI (e. g. scientist Stephen Hawking, Microsoft founder Bill Gates, and celebrity tech-speculator-in-chief Elon Musk), do not use this concern as a basis to resist the deployment of all intelligent systems.

In response to the type of AI aversion typical of those who fear future dystopias or the much-talked-of ‘Singularity’, I defer to Floridi (2016) who remarks:

How some nasty ultraintelligent AI will ever evolve autonomously from the computational skills required to park in a tight spot remains unclear. Climbing on top of a tree is not a small step towards the Moon; it is the end of the journey. [6]

5.2 Concern

Much worthier of close consideration is the concern that, in practice, data-driven AI decisions could be responsible for actively encoding and propagating pernicious biases while continuing to appear objective. If unfair machines were at the center of a political system, it follows that this could result in distinctly unethical action. Any potential skew could wrongfully target or marginalize citizens and communities at scale and in direct contradiction of Brennan’s competence principle.

Our current AI systems issue biased decisions for a number of reasons; sometimes they are trained on datasets that already contain bias, other times datasets are simply incomplete or unrepresentative of the categories they purport to represent. Moreover, these unthinking, goal-oriented systems do not ‘care’ what harms they cause in achieving their end objectives, which are, of course, determined and programmed by necessarily biased humans. Data scientist and social activist Cathy O’Neil gives the full and disturbing detail of how data-driven systems can ruin lives in her epoch-defining book, Weapons of Math Destruction (2016).

This is a legitimate fear, but it is critical that current imperfections do not become an obstacle to any serious consideration of future uses. My ‘moral obligation’ argument is conditional and only obtains if we, as a society, are successful in developing ethically sensitive AI. I choose to look ahead to the idea of ‘political machines’ because the outlook for this condition is improving. AI ethicists have described a burgeoning focus on mitigating troubling effects (like bias) as ‘promising’, adding that there is a growing movement seeking to develop AI that is attuned to underlying issues of fairness and equality (Campolo et al. 2017).

As noted, biased AI judgments often arise from data problems rather than from the system itself, and at the time of writing measures are being taken to try to rectify this. Examples include the practice of supplementing narrow datasets with synthetic data to ensure full representation of all ‘types’ in a spectrum (Zemel et al. 2013), and the creation of new de-biasing algorithms that can remove embedded gender stereotypes from collections of language data (Bolukbasi et al. 2016, pp. 4349ff.). While such methods are still largely untested, practitioners are also trying to find new ways of soliciting error feedback through algorithmic audits and even citizen engagement where systems are used in the public domain.
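
To give a concrete sense of the kind of de-biasing Bolukbasi et al. describe, the sketch below applies the core geometric idea (removing the component of a word vector that lies along an estimated gender direction) to a handful of invented toy vectors. It is a simplified illustration of one step of their approach, not their full method.

```python
import numpy as np

# Toy 3-dimensional 'embeddings', invented purely for illustration;
# real word vectors have hundreds of dimensions.
vectors = {
    "he":       np.array([ 1.0, 0.2, 0.1]),
    "she":      np.array([-1.0, 0.2, 0.1]),
    "engineer": np.array([ 0.6, 0.8, 0.3]),   # leans towards 'he'
    "nurse":    np.array([-0.6, 0.7, 0.4]),   # leans towards 'she'
}

# 1. Estimate a gender direction from a definitional pair.
gender_dir = vectors["he"] - vectors["she"]
gender_dir = gender_dir / np.linalg.norm(gender_dir)

def neutralize(v, direction):
    """Remove the component of v that lies along the bias direction."""
    return v - np.dot(v, direction) * direction

# 2. Neutralize words that should not carry gender information.
for word in ("engineer", "nurse"):
    before = np.dot(vectors[word], gender_dir)
    after = np.dot(neutralize(vectors[word], gender_dir), gender_dir)
    print(f"{word}: gender component {before:+.2f} -> {after:+.2f}")
# Occupation words end up equidistant from 'he' and 'she', so the
# stereotype encoded in the training data is no longer carried forward.
```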

Technical researchers are also in the early stages of thinking about how we might build human morality into AI (Conitzer et al. 2017). This might mean – for example – that a system used to determine whether a new rule will save money will refrain from recommending it if it can also predict a bad social outcome for a minority group. This involves coding for our moral values right at the development stage of AI.

Within this scope, recent research by Vamplew et al. (2018, pp. 27ff.) considered how ethical values (both utilitarian and deontological) might sit among the competing objectives within a human-aligned AI system. Their discussion advocates for a MOMEU (multi-objective maximum expected utility) approach which explicitly allows a system designer to specify desired trade-offs between components of utility. They claim that this approach could also help avoid the risks of unconstrained maximization and the exploitation of goal-oriented systems, either by separating AI behaviors out into distinct utility functions or by developing multiple independent utility functions that share the same aim. This research builds on the multi-objective approach developed by Keeney (1973, 1988, 1996). As far back as 1971, Keeney attempted to ‘systematically attack’ a public project – an airport – in Mexico City, and used his (non-automated) technique to balance competing factors of geography, political context, finances, and issues like traffic, displacement and noise (1973, pp. 101ff.).
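
As a rough illustration of the multi-objective idea (and not of Vamplew et al.’s exact formalism), the sketch below scores some hypothetical policies along invented objectives, collapses them using designer-specified trade-off weights, and treats a rights violation as a hard veto, which is one crude way of keeping maximization constrained. All names and numbers are made up for the example.

```python
from dataclasses import dataclass

@dataclass
class Policy:
    name: str
    # Expected utility on each objective, on a common 0-1 scale.
    economic: float
    fairness: float
    rights_respected: bool   # a deontological side-constraint

# Designer-specified trade-off between the consequentialist objectives.
WEIGHTS = {"economic": 0.4, "fairness": 0.6}

def scalarized_utility(p: Policy) -> float:
    """Collapse the utility vector via explicit, inspectable weights."""
    if not p.rights_respected:
        return float("-inf")   # hard veto: no economic gain trades this off
    return WEIGHTS["economic"] * p.economic + WEIGHTS["fairness"] * p.fairness

candidates = [
    Policy("subsidy A", economic=0.90, fairness=0.30, rights_respected=True),
    Policy("subsidy B", economic=0.60, fairness=0.80, rights_respected=True),
    Policy("subsidy C", economic=0.95, fairness=0.90, rights_respected=False),
]

best = max(candidates, key=scalarized_utility)
print("recommended:", best.name)   # 'subsidy B' under these weights
```

Because the weights sit in one visible place, the trade-off itself can be debated and audited, which is part of the transparency such an approach is meant to provide.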

Beyond mere technical mitigation, there are also calls for greater diversity with regards to precisely who creates AI models, given that those who design, develop and maintain AI systems will shape them in-line with their own understanding of the world (Campolo et al. 2017, p. 37f.). Campaigners and experts are also pushing for a much better understanding of the provenance of datasets – including all embedded socio-political salience – through interdisciplinary collaboration. The same cohort argue that public agencies should not be allowed to use ‘black box’ AI or algorithmic systems, and that any system used should be subject to ongoing tests (including public auditing) and constant monitoring across different contexts and communities (2017, pp. 2ff.).

In sum, many minds – scientific, ethical, sociological, philosophical – are now engaged in developing more moral, fair, diverse, and transparent AI. If they are successful in producing a decision-making system which is at once accurate and ethical, then this concerned resistance will presumably give way to vigilant operational scrutiny.

6 Accountability objection

Though we may improve AI, we are unlikely to perfect it. Consequently, AI governance mechanisms will still occasionally deliver poor outcomes. When they do, it is impossible for those hurt to have the justice that comes from holding someone – a human person – directly accountable for the damage. Thus, a critic may argue that algorithmic governance eliminates a critical part of democracy. I want to deal with this objection in two parts.

Firstly, not all poor outcomes are the result of bad decision-taking (just as not all good outcomes are the fruit of well-taken decisions). Some factors are unforeseeable, and we should not expect even an ethical AI to build a utopia for us. Things will still go awry, and I am merely arguing that we can reasonably expect things to go awry less often, which is a prize worth wanting.

Secondly, we should consider what we wish to gain from accountability. Arguments are usually retributivist (we want to see someone punished for a harmful action) or utilitarian (we want to deter others from committing the same type of harm). The first is, arguably, rather less productive than the second when it comes to the betterment of society. Though we might relish the idea of an unpopular political character being pilloried in the court of public opinion for their bad choices, it does little to improve the lives of citizens.

It feels absurd to argue that there will ever be a time when we can deliver a meaningful punishment to an AI system, but this does not mean that we cannot hold such systems accountable in the second, more utilitarian way. A bad outcome is good error feedback for a system: it allows the system to make adjustments and – critically – all other similar systems to dynamically re-adjust so the same mistake isn’t made again. Is this not more than we can say for human politicians or policymakers, who are known for repeating the mistakes of the past? If, in the future, one error allowed us to recalibrate an entire parliament or government department, then this would surely be an example of accountability on a grand scale.

7 A further difficulty

Before concluding, I would like to deal with one further difficulty that could preclude the use of AI systems for governance; namely, the problem of agreeing on the best outcome. Simply stated, even if political leaders were to run the most sophisticated and ethically flawless systems, the decisions they issued would still be contentious. To a high degree, all political decisions are unprecedented, and whether the output decision designates a foreign policy approach or decides on the distribution of government resources, there are inherent difficulties in proving the eventual outcome was ‘the best’ or better than proposed alternatives.

7.1 Determining ‘the best’ outcome

In his assessment of political judgment, Tetlock acknowledges that if it were easy to set standards for judging judgment that would be honored across the opinion spectrum, someone would have patented the process long ago (2017, p. 1). Political judgments are inarguably among the most controversial of them all, and yet that does not mean we cannot assess their validity. Like Tetlock, I wish to invoke Lewis’s (1973) discussion of counterfactuals and their use in a method that helps remove doubt and establish whether a political decision resulted in ‘the best’ of all possible outcomes. Tetlock argues that we have a warrant to think of a policy as great when we can only think of ways that it would have worked out for the worse, and there is already a substantial corpus of literature dedicated to exploring these kinds of proofs (Tetlock and Belkin 1996, p. 4). I venture that AI developers could use similar techniques to evaluate decisions ex post.

Typical counterfactual arguments look for a causal link between an antecedent and consequent by constructing alternative causal paths and imagining ‘what could have been’. Lars-Erik Cederman considers how this could be applied in hindsight to moments in history and made the subject of computer simulations able to randomize or manipulate many variables to replicate real-world conditions in a series of complex counterfactuals (1996, pp. 247ff.). From there users could winnow out which combination of influencing factors gives rise to ‘the best’ possible outcome by all measures. As Cederman argues, it is tempting but fallacious to regard actual history as the inevitable causal path. Counterfactuals could help both decision-makers and algorithms learn and amend.
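
The sketch below is a toy version of this kind of counterfactual simulation: the background variables a decision-maker faced are randomized around their estimated values, a wholly invented outcome model is rerun many times, and the decision actually taken is then compared against its alternative. Cederman’s simulations are far richer; this only illustrates the mechanics.

```python
import random

random.seed(1)

def outcome(decision, tension, trade):
    # A deliberately crude, invented outcome model (higher is better):
    # 'engage' pays off when trade ties are strong, 'sanction' when
    # tensions are already high.
    if decision == "engage":
        return 0.6 * trade - 0.4 * tension + random.gauss(0, 0.1)
    return 0.5 * tension - 0.3 * trade + random.gauss(0, 0.1)

def simulate(decision, runs=10_000):
    total = 0.0
    for _ in range(runs):
        # Randomize background conditions around their estimated values,
        # standing in for the many variables a real simulation would vary.
        tension = random.uniform(0.4, 0.8)
        trade = random.uniform(0.5, 0.9)
        total += outcome(decision, tension, trade)
    return total / runs

for choice in ("engage", "sanction"):
    print(f"{choice}: mean simulated outcome {simulate(choice):+.3f}")
# Comparing the two distributions is a counterfactual check on whether
# the path actually taken really was the better of the available options.
```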

This type of evaluation is undoubtedly useful, but in the end, it still requires that we collectively agree which of these possible other worlds is objectively better. In many cases, the advantage will be obvious. War or peace. Improving schools or badly performing schools. Higher or lower employment figures. However, others will be less convincing and the counterfactuals themselves will cause contention. For example, imagine an instruction to forge closer ties with a previously hostile foreign government; this could be viewed either a) as a positive move towards peace and good for trade, or b) as a negative and dangerous alliance that could alienate other, closer international allies. Such conundrums make the problem of political contention seem intractable.

In response to this I still reject any suggestion that a counterfactual evaluation system would require a further layer of evaluation. Our requirement was only ever that AI take better decisions than those proposed by small groups of humans, not perfect decisions. If it is not easy to discern whether the system chose the best of all alternative options, then as a corollary it cannot be easily discernible that it did any meaningful damage.

8 Conclusion

I have argued that, as-and-when we see the development of an ethical AI system capable of making governance decisions, our (human) political rulers will be morally obligated to consult with it on all matters of public business. I have offered two main reasons for this:

  1. Human judgments are often compromised by a multitude of cognitive biases that are difficult to identify or counter. This creates problems for political decision-making.

  2. Artificially intelligent systems make reliably accurate judgments in low validity environments like governance.

I have attempted to counter the natural aversion many people have at the thought of AI-driven ‘leadership’, simply by pointing out that this is, in itself, a non-rational judgment. I have also acknowledged important concerns about the incorporation of values and the tempering of algorithmic bias, all of which I account for in the conditional nature of my central moral obligation statement.

I have dealt with what I take to be a reasonable accountability objection, arguing that if the ultimate purpose of accountability is to effect change for the better, then computational systems are by many measures the best instruments for the job. I have also offered a reasonable method for evaluating ‘the best’ outcome.

Finally, I wish to conclude by clarifying what I am not saying. First, I am not arguing (at this stage at least) that AI systems should usurp all human governance and be left to orchestrate countries, cities, counties, and towns. I am merely suggesting that politicians and civil servants should be obliged to consult with it, take its suggestions seriously, and – perhaps most importantly – to use those suggestions as a lens through which to review their own opinions and beliefs on matters of public importance.

Secondly, I want to note that my conditional statement is biconditional, meaning that just as a new ethical AI would prompt a moral obligation on the part of human decision-takers, this moral obligation only exists if the AI is ethical. However, I am not saying that political and governmental leaders need sit and wait for entirely ethical systems to emerge before consulting with artificial intelligence at all. So long as governments are alive to the fact that biases have a tendency to creep in, they should be free to experiment with AI as a consultative tool. I am merely saying that there is no obligation to do so, moral or otherwise, as yet.

For many, the idea of dissolving the chambers of parliaments and government departments is a very tempting one. Perhaps one day level-headed AI may even lead a rational government to acquiesce to a policy that sees this happen. However, for the most part those who live under democracy feel an attachment to it and its machinations. In his final words to the UK parliament, former Prime Minister Tony Blair addressed the chamber:

Some may belittle politics, but we know it is where people stand tall. And although I know it has its many harsh contentions, it is still the arena which sets the heart beating fast. It may sometimes be a place of low skullduggery but it is more often a place for more noble causes. (HC 2007)

It would be hard to detach this nobility from its human protagonists, and perhaps more difficult still to transpose it onto a computational system. Here, I am proposing neither, only that the former might be better guided by the latter in the future.

References

Asch, S.E. (1946). ‘Forming Impressions of Personality’, The Journal of Abnormal and Social Psychology 41 (3): 258. doi:10.1037/h0055756.

Baron, J. (2009). ‘Belief Overkill in Political Judgments’. doi:10.2139/ssrn.1427862.

Bolukbasi, T., Chang, K.W., Zou, J.Y., Saligrama, V., and Kalai, A.T. (2016). ‘Man Is to Computer Programmer as Woman Is to Homemaker? Debiasing Word Embeddings’, in D.D. Lee, M. Sugiyama, U.V. Luxburg, I. Guyon, and R. Garnett (eds.). Advances in Neural Information Processing Systems, pp. 4349–4357.

Brennan, J. (2016). Against Democracy (New Jersey: Princeton University Press). doi:10.1515/9781400888399.

Campolo, A., Sanfilippo, M., Whittaker, M., and Crawford, K. (2017). ‘AI Now 2017 Report’, AI Now.

Cederman, L.E. (1996). ‘Rerunning History: Counterfactual Simulation in World Politics’, in P.E. Tetlock and A. Belkin (eds.). Counterfactual Thought Experiments in World Politics: Logical, Methodological, and Psychological Perspectives (Princeton, NJ: Princeton University Press), pp. 247–267.

Conitzer, V., Sinnott-Armstrong, W., Schaich Borg, J., Deng, Y., and Kramer, M. (2017). ‘Moral Decision Making Frameworks for Artificial Intelligence’, Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. doi:10.1609/aaai.v31i1.11140.

Danziger, S., Levav, J., and Avnaim-Pesso, L. (2011). ‘Extraneous Factors in Judicial Decisions’, Proceedings of the National Academy of Sciences 108 (17): 6889–6892. doi:10.1073/pnas.1018033108.

Davis, J. (2017). ‘Hear Me Out: Let’s Elect an AI President’, Wired, https://www.wired.com/2017/05/hear-lets-elect-ai-president/ (accessed on August 24, 2019).

Dawes, R.M. (1979). ‘The Robust Beauty of Improper Linear Models in Decision Making’, American Psychologist 34 (7): 571. doi:10.1017/CBO9780511809477.029.

Dietvorst, B.J., Simmons, J.P., and Massey, C. (2015). ‘Algorithm Aversion: People Erroneously Avoid Algorithms After Seeing Them Err’, Journal of Experimental Psychology 144 (1): 114–126. doi:10.1037/xge0000033.

Farnham, B. (1990). ‘Political Cognition and Decision-Making’, Political Psychology 11 (1): 83–111. doi:10.2307/3791516.

Finucane, M.L. (2000). ‘The Affect Heuristic in Judgments of Risks and Benefits’, Journal of Behavioral Decision Making 13: 1–17. doi:10.1002/(SICI)1099-0771(200001/03)13:1<1::AID-BDM333>3.0.CO;2-S.

Floridi, L. (2016). ‘Should We Be Afraid of AI?’, Aeon, https://aeon.co/essays/true-ai-is-both-logically-possible-and-utterly-implausible (accessed on July 5, 2019).

Gordon, C. and Arian, A. (2001). ‘Threat and Decision Making’, Journal of Conflict Resolution 45 (2): 196–215. doi:10.1177/0022002701045002003.

Grove, W.M., Zald, D.H., Lebow, B.S., Snitz, B.E., and Nelson, C. (2000). ‘Clinical Versus Mechanical Prediction: A Meta-Analysis’, Psychological Assessment 12 (1): 19–30. doi:10.1037/1040-3590.12.1.19.

Guthrie Weissman, C. (2017). ‘The Huge Difference Between Business And Political Strategies’, Fast Company, https://www.fastcompany.com/3067071/the-huge-difference-between-business-and-political-strategies (accessed on August 24, 2019).

HC Deb 27 June 2007, vol. 462, cols 334–335.

Kahneman, D. (2011). Thinking, Fast and Slow (New York, NY: Farrar, Straus and Giroux).

Kahneman, D. and Klein, G. (2009). ‘Conditions for Intuitive Expertise: A Failure to Disagree’, American Psychologist 64 (6): 515–526. doi:10.1037/a0016755.

Kass, L. (1997). ‘The Wisdom of Repugnance’, The New Republic 216 (22): 17–26.

Keeney, R.L. (1973). ‘A Decision Analysis with Multiple Objectives: the Mexico Airport’, The Bell Journal of Economics and Management Science 4 (1): 101–117. doi:10.2307/3003141.

Keeney, R.L. (1988). ‘Value-driven Expert Systems for Decision Support’, Decision Support Systems 4 (4): 405–412. doi:10.1007/978-3-642-86679-1_9.

Keeney, R.L. (1996). Value-Focused Thinking (Cambridge, MA: Harvard University Press). doi:10.2307/j.ctv322v4g7.

Kim, Y. and Lee, M. (2017). ‘Humans are Still Better than AI at Starcraft – for Now’, MIT Technology Review, https://www.technologyreview.com/s/609242/humans-are-still-better-than-ai-at-starcraftfor-now/ (accessed on July 5, 2019).

Klein, G. (1999). Sources of Power: How People Make Decisions (Cambridge, MA: MIT Press).

Kuran, T. and Sunstein, C.R. (1999). ‘Availability Cascades and Risk Regulation’, Stanford Law Review 51: 683–768. doi:10.2307/1229439.

Lewis, D.K. (1973). Counterfactuals (Oxford: Blackwell).

Linares Lejarraga, S. (2017). ‘Democracy, Epistemic Values, and Equality: A New Model of Epistemic Participatory Democracy’, Ethics & Politics 2: 247–283.

Lovett, A. and Forbus, K. (2017). ‘Modeling Visual Problem Solving as Analogical Reasoning’, Psychological Review 124 (1): 60. doi:10.1037/rev0000039.

Meehl, P. (1954). Clinical versus Statistical Prediction: A Theoretical Analysis and a Review of the Evidence (Minneapolis: University of Minnesota Press). doi:10.1037/11281-000.

Meehl, P. (1986). ‘Causes and Effects of My Disturbing Little Book’, Journal of Personality Assessment 50 (3): 370–375. doi:10.1207/s15327752jpa5003_6.

Norman, D. (2014). Things that Make Us Smart: Defending Human Attributes in the Age of the Machine (New York: Diversion Books).

O’Neil, C. (2016). Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (New York: Crown Publishing Group).

Palmer, C.L. and Peterson, R.D. (2016). ‘Halo Effects and the Attractiveness Premium in Perceptions of Political Expertise’, American Politics Research 44 (2): 353–382. doi:10.1177/1532673X15600517.

Ross, W.D. (1930). The Right and the Good, 2002 edition (Oxford: Oxford University Press).

Russan, M.-A. (2016). ‘CIA Using Deep Learning Neural Networks to Predict Social Unrest Five Days before It Happens’, International Business Times, http://www.ibtimes.co.uk/cia-using-deep-learning-neural-networks-predict-social-unrest-five-days-before-it-happens-1585115 (accessed on July 5, 2019).

Shu, L.L., Tsay, C.-J., and Bazerman, M. (2012). ‘Cognitive, Affective, and Special-Interest Barriers to Wise Policy Making’, in J. Kreuger (ed.). Social Judgment and Decision Making (Oxford: Psychology Press), pp. 243–261.

Slovic, P. (1999). ‘Trust, Emotion, Sex, Politics, and Science: Surveying the Risk-assessment Battlefield’, Risk Analysis 19 (4): 689–701. doi:10.1111/j.1539-6924.1999.tb00439.x.

Sunstein, C.R. (2002). ‘Probability Neglect: Emotions, Worst Cases, and Law’, The Yale Law Journal 112 (1): 61–107. doi:10.2307/1562234.

Tetlock, P.E. (2017). Expert Political Judgment: How Good Is It? How Can We Know? (Princeton: Princeton University Press). doi:10.1515/9781400888818.

Tetlock, P.E. and Belkin, A. eds. (1996). Counterfactual Thought Experiments in World Politics: Logical, Methodological, and Psychological Perspectives (New Jersey: Princeton University Press). doi:10.1515/9780691215075.

Thorndike, E.L. (1920). ‘A Constant Error in Psychological Ratings’, Journal of Applied Psychology 4 (1): 25–29. doi:10.1037/h0071663.

Vamplew, P., Dazeley, R., Foale, C., Firmin, S., and Mummery, J. (2018). ‘Human-aligned Artificial Intelligence Is a Multiobjective Problem’, Ethics and Information Technology 20 (1): 27–40. doi:10.1007/s10676-017-9440-6.

Wainer, H. and Zwerling, H.L. (2006). ‘Evidence That Smaller Schools Do Not Improve Student Achievement’, Phi Delta Kappan 88: 300–303. doi:10.1177/003172170608800411.

William Lucker, G., Beane, W.E., and Helmreich, R.L. (1981). ‘The Strength of the Halo Effect in Physical Attractiveness Research’, The Journal of Psychology 107 (1): 69–75. doi:10.1080/00223980.1981.9915206.

Yu, K.-H., Zhang, C., Berry, G.J., Altman, R.B., Ré, C., Rubin, D.L., and Snyder, M. (2016). ‘Predicting Non-small Cell Lung Cancer Prognosis by Fully Automated Microscopic Pathology Image Features’, Nature Communications 7: 12474. doi:10.1038/ncomms12474.

Zemel, R., Wu, Y., Swersky, K., Pitassi, T., and Dwork, C. (2013). ‘Learning Fair Representations’, Proceedings of 30th International Conference on Machine Learning 28 (3): 325–333.

Published Online: 2019-11-07
Published in Print: 2019-11-18

© 2019 Walter de Gruyter GmbH, Berlin/Boston
