1 AI as a post-colonial thread

As the impact of artificial intelligence (AI) reaches farther than the geographies and people to whom most research in the field is restricted, it is paramount to investigate its most important ethical and political implications more broadly. Technology embodies certain values and shapes the way we perceive the world and each other (Frischmann and Selinger 2018; Friedman and Hendry 2019; van Wynsberghe 2013). AI in particular is likely to reproduce a certain worldview because it explicitly imitates certain kinds of behavior which are based on a value-laden concept: “intelligence”. The basis of this concept originates from dominant societies of the Global North and is reflected in the usage, aims, and design of currently developed AI technologies. The ability to determine this course is a privilege of those in power. For example, even though the African continent, among other parts of the Global South,Footnote 1 plays a crucial role in the AI production process, the role of many of its people is mostly restricted to its early phases, including exploitative practices such as data extraction and clickwork for the training of AI-powered systems. Moreover, these systems are not primarily designed to satisfy the needs of those who endure long, taxing hours in the process of production (Adams 2022; Couldry and Mejias 2018). These and other issues explain the growing interest in the application of decolonial theory in the context of AI (Adams 2021).

Characteristically, AI politics in the Global North addresses only those consequences whose regulation seems necessary for the protection of some of its own citizens. An illustrative example is provided by the EU AI Act. The act is supposed to protect against the most pressing dangers posed by AI technologies, assigning a broad variety of applications to three categories of risk: unacceptable, high, and no risk. Even though there are some efforts to protect vulnerable groups through special provisions, the words “racism,” “colonialism,” “exploitation,” “Global South,” or the name of any continent other than Europe do not appear in the 108 pages of the EU AI Act (European Commission 2021).

Still, the growing importance of AI in the global economy makes the use of such technologies unavoidable. The foreign imposition of norms and values implicit in design has therefore become a pressing issue (Hasselbalch 2022; van Wynsberghe 2021). Decoloniality seeks to disrupt the persisting power structures originating from colonialism and aims to replace them with plural and diversified conceptions of values and knowledge. Decolonial thinking can be described as a theoretical and political movement directed against coloniality, a term that refers to power structures stemming from historical colonialism, which use “race as an organizing principle that not only hierarchized human beings according to racial ontological densities but also sustains asymmetrical global power relations and a singular Euro-North American-centric epistemology that claims to be universal, disembodied, truthful, secular, and scientific” (Ndlovu-Gatsheni 2015). Coloniality may by now have changed in important respects, even though continuities can be traced to the present day, for example, in particular societal structures and/or institutions. To refer to this continuity without conflating the present and the past, we will describe the relevant features as “post-colonial,” as they come after historical coloniality (Young 2016). In recent years, attention has increasingly been paid to the relation between AI and coloniality, and to how to decolonize the technology and its discourse (Chaka 2022; Lambrechts et al. 2022; Madianou 2021; Ricaurte 2022), for example, by putting it in dialogue or contrasting it with knowledge and value systems stemming from the Global South (Bonaldo and Barbosa Pereira 2023; Shedlock and Hudson 2022) or by integrating decolonial strategies for AI in higher education (Zembylas 2023).

In this paper, we aim to contribute to this growing AI discourse by critically examining the conception of intelligence underlying AI. Indeed, the concept of “intelligence” has a long history of creating and preserving (post-)coloniality. An analysis of AI’s underlying conception of intelligence is therefore unavoidable (Adams 2021). Recent advances regarding this specific issue have been made by Stephen Cave (2020), who outlined how the obsession with intelligence in general serves the reproduction of prevailing power structures and poses a threat to those already discriminated against. Additionally, Rachel Adams (2021) discusses the possibility of decolonial AI in general, in which intelligence is depicted as one of many “dividing practices” that serve to keep apart those in power from a constructed other. In line with this reasoning, this paper argues that dominant conceptions of intelligence used in AI are biased by normative assumptions that originate from the Global North and cannot simply be applied uncritically elsewhere. Moreover, it is argued that the problem is not the use of a concept of intelligence in general but the universalization of a singular one. Instead of operating on a globalized matrix of power (Adams 2021; Grosfoguel 2007), AI research should make use of a variety of concepts, being sensitive to variable contexts of application. As it is not possible to revisit the whole field here, we will focus on the Turing Test (TT), which provides a good starting point: it has been of great importance for AI research for a significant amount of time and still remains an important point of orientation for many AI researchers (Natal 2021; Neufeld and Finnestad 2020; Vorobiev and Samsonovich 2018).Footnote 2 Whatever AI is used for, and regardless of how far a specific technology has come in the pursuit of achieving human-level intelligence or above [which is also referred to as “strong AI” (Searle 1980)], there will always be a dependence on an underlying conception of intelligence.

To evaluate the extent to which the idea of AI is based on a specific conception of intelligence originating from the Global North, and to identify associated problems, in Sect. 2 of this paper we provide insight into potential harms arising from false beliefs about intelligence or the use of scientifically flawed concepts in this context, by briefly examining the history of IQ testing and focusing on its discriminatory misuse for classist, sexist, and racist purposes. We define intelligence ontologically and underline its constructed and culturally variable character to develop a better understanding of the described issues. We turn to AI in Sect. 3, focusing on the TT and its underlying conceptions of intelligence. We argue that both the test itself and the way it is exercised in practice promote a conception of intelligence originating from the Global North. This conception needs to be assessed critically because of the variability of contexts in which AI technologies are and will be used. In Sect. 4, we highlight how unequal power relations in AI research are a real threat rather than mere philosophical sophistry, in light of the history of IQ testing and the TT’s practical biases. In the last section, we examine the limits of the account of intelligence developed here as it relates to colonial biases within AI and the TT.

2 The myth of ‘intelligence’ and the history of IQ testing

Throughout history and up to the present day, the concept of intelligence has often been used to justify persisting power structures in terms of ability or qualification. It is mostly seen as a singular concept that is equivalent to a certain IQ score. This view rests on problematic assumptions (Daley and Onwuegbuzie 2020). To highlight the dangers that accompany a narrow understanding of intelligence and to provide a contrasting view that is more in line with current research on intelligence, we first present some findings about implicit theories of intelligence and their cultural and contextual dependencies. We then proceed to critically revisit the historical and current use of IQ testing.

2.1 Implicit theories of intelligence and cultural differences

Intelligence, we contend, should be regarded as a social and scientific construct (Alfano et al. 2017; Sternberg 1985). It is a thick concept, meaning that it is at the same time descriptive and evaluative (Arun 2020; Cave 2020). It consists in a variety of cognitive capacities that are judged to be valuable when expressed in a certain context. Individuals’ cognitive ability in this respect can, though not in all cases, be measured by descriptive tests. In assessing such tests, it is useful to differentiate between explicit and implicit theories of intelligence. While tests are based on explicit theories to develop a scientific and mostly descriptive concept of intelligence, they cannot avoid being influenced by implicit assumptions. Implicit theories of intelligence, according to Sternberg, are systematic theories of intelligence and related concepts (such as wisdom and creativity) that, consciously or otherwise, underlie people’s judgments of whether a person (including themselves) is intelligent or not (Sternberg 1985, 2019). These theories are culturally variable, normative constructs that are not objects of scientific invention; rather, they are shaped by social conventions and include prototypical beliefs about how an intelligent person should be (Sternberg 1985).

Culture, broadly defined, can be regarded as “[…] integrated patterns of learned beliefs and behaviors that are shared among groups and include thoughts, communication styles, ways of interacting, views of roles and relationships, values, practices, and customs” (Vaughn 2019, p. 2).Footnote 3 It is widely accepted in psychological research that the mental processes underlying intelligence, and the mental representations upon which those processes act, transcend culture (Sternberg 2004). Still, implicit conceptions of intelligence vary in several fundamental respects, reflecting longstanding cultural traditions (Niu 2020; Sternberg 2004). Intelligence, according to Greenfield, is defined in terms of adaptation to a cultural environment and can be seen as an expression of cultural ideals (Daley and Onwuegbuzie 2020; Greenfield 1998). Outside its cultural context, intelligence cannot be “[…] fully or even meaningfully understood” (Sternberg 2004, p. 325). Ignoring this means failing “[…] to do justice to the range of skills and knowledge that may constitute intelligence broadly defined and risks drawing false and hasty generalizations” (Sternberg 2004, p. 325).

Based on the literature, there are a variety of implicit normative conceptions of intelligence, including but not limited to:

(1) The crucial capacities that are thought to be an expression of intelligence: Implicit theories of intelligence focus on a variety of dimensions, ranging from abstract reasoning, typical for the history of ‘Western Philosophy,’ to application-centered approaches that define intelligence in terms of smart actions, especially found in India. Further conceptions include social, moral, and creative capacities (Greenfield 1997; Nisbett and Masuda 2003; Niu 2020).

(2) How important certain capacities are thought to be for intelligence: Even if the same elements may appear among different conceptions of intelligence, they can be prioritized in very different ways, placing, for example, more emphasis on mathematical problem-solving abilities than on emotional self-control (Niu 2020).

(3) Beliefs about how intelligence is expressed: It matters not only which capacities are included in the set of intelligence-relevant abilities; there are also different ideas about how such abilities might be expressed. While classical IQ tests (common in North America), for example, place great emphasis on speed in solving tasks, in Uganda, taking time to think is considered a sign of intelligence (Alfano et al. 2017; Greenfield 1997; Nisbett and Masuda 2003; Niu 2020; Sternberg 2004).

Implicit theories of intelligence play an important role in our perceptions of intelligence and in the extent to which our behavior will be regarded as intelligent or not (Alfano et al. 2017). Accordingly, tests of intelligence have to be regarded as products of the culture they originate from. Thus, in diverse contexts, it is often not possible to conduct fully accurate tests. Implicit theories of intelligence are always restricted to a certain cultural group and societal context, which means they change over time and differ between cultures (Greenfield 1997; Sternberg 1985).

Whether a test is applicable or not does not depend solely on its underlying conception of intelligence but also on a variety of additional contextual variables, such as how the tasks of the tests are presented, which kinds of objects they refer to, and under what circumstances they are conducted (e.g., including time constraints or not). Familiarity with the surrounding conditions can often support a better result. Consequently, it is not possible to simply test ‘intelligence’, as there is no single measurable phenomenon that the term refers to. This means that a variety of tests are needed to do justice to the multiple conceptions of intelligence. These tests, however, cannot be easily compared with each other, as they may include the measurement of different capacities that cannot readily be brought into relation with each other. Similarly, a single test can only be used to assess a fixed set of capacities that may be of varying importance to the specific living conditions of people of multiple cultural backgrounds and their conceptions of intelligence (Greenfield 1997).

Thus, to label intelligence a social construct is to point to the fact that the term does not represent any kind of objectively measurable quantity with a culturally invariant correlate in the world. Instead, conceptions of intelligence always assemble a variety of cognitive capacities that are judged to be valuable in a certain social context, on which they ultimately depend. Whereas there might be objectively observable neurological patterns that support the exercise of a certain capacity (Sternberg 2004), it is only possible to describe those patterns as a sign of intelligence by reference to this social context.

2.2 The history of IQ testing

Throughout the history of assessing intelligence through IQ testing, the previously discussed issues have, to date, been mainly disregarded. The debate on whether and how intelligence could be tested began at the end of the nineteenth century. Alfred Binet, together with Theodore Simon, was the first to provide an adequate measurement of what was thought to count as intelligence at that time. Intelligence, according to Binet’s test, consisted in “[…] a variety of ‘higher’ psychological faculties” (Mackintosh 2011, p. 5), the most important of which are “[…] attention, memory, imagination, common sense, judgment, abstraction” (Mackintosh 2011, p. 5; Binet and Simon 1904). Binet aimed to develop a measure to distinguish slow from quicker learners in school, “[…] in order to determine which children were so behind that they should be given special education” (Cave 2020, p. 31 as cited by Adams 2021, p. 188; Binet and Simon 1904). To do so, he introduced the idea of a mental age, whereby the results of a specific child in Binet’s tests were compared to the average results of children of the same age. The faculties of intelligence Binet identified relate closely to academic abilities. Accordingly, the different tasks of the tests were most indicative of abilities that would improve the testee’s success in the French school system at that time (Alfano et al. 2017; Binet and Simon 1904; Mackintosh 2011; Sternberg 2019). Binet’s core assumptions were widely accepted and celebrated within the psychological community and remain of importance today. William Stern revised the scoring, introducing 100 as the baseline score based on the division of mental age by chronological age (Stern 1912). This was the beginning of actual IQ testing. Based on the findings of Stern and Binet, Lewis Terman developed the Stanford-Binet intelligence scale, which was the standard IQ test of the twentieth century and, in a modernized version, still plays a considerable role today (Terman 1916; Fletcher and Hattie 2011). Another major revision after the introduction of IQ scores was made by David Wechsler, who replaced mental age with comparisons of scores within larger populations, since intelligence does not in all cases grow proportionally to age and stops doing so entirely in adulthood. Wechsler’s approach is mostly regarded as differing from Binet’s in the way the scores are calculated. The tests he used, however, focus on abilities similar to the ones Binet already measured (e.g., attention, judgement, abstraction). By now, the Wechsler Adult Intelligence Scales have become the standard IQ test (Alfano et al. 2017; Binet and Simon 1904; Mackintosh 2011; Sternberg 2019; Wechsler 1958).
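As a minimal illustration of the ratio scoring described above (the worked numbers are our own and purely hypothetical), the classical ratio IQ can be written as IQ = (mental age / chronological age) × 100. A ten-year-old whose test performance matches the average of twelve-year-olds would thus receive a mental age of 12 and a ratio IQ of (12 / 10) × 100 = 120; Wechsler’s later revision replaces this ratio with a comparison of an individual’s score to the distribution of scores in a larger reference population.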

Modern IQ tests often claim to measure intelligence in general (Alfano et al. 2017). This idea goes back to Charles Spearman’s introduction of the general factor g, which is supposed to “[…] determine […] Intelligence [sic] in a definite and objective manner” (Spearman 1904, p. 206) and is thus considered to be key to understanding intelligence (Spearman 1904; Sternberg 2019). Even though the existence of such a factor is questionable, g is still present today in the form of the so-called “positive manifold,” which is the “[…] all-positive pattern of correlations that is observed when several intelligence tests, of varying format, are administered to a large sample of subjects” (Alfano et al. 2017, p. 477). Complex interactions between different cognitive processes are believed to result in intelligence in a general sense. The intelligence of a particular person could therefore be rated solely by estimating the value of their g. Accordingly, g is regarded as the single most effective factor for estimating how successful a person will be in higher education or as an employee in certain jobs. This might explain why IQ tests are so widely used today in job applications and as a means of allocating educational resources. Hence, g is based on a construct: intelligence as a singular and ultimately measurable phenomenon (Daley and Onwuegbuzie 2020).
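To make the statistical notion of the “positive manifold” concrete, the following sketch simulates invented subtest scores for a hypothetical sample and shows the all-positive correlation pattern from which a single g factor is usually extracted; the data, the subtest count, and all variable names are purely illustrative assumptions, not a reconstruction of any actual test battery.

```python
import numpy as np

# Invented subtest scores for a hypothetical sample (rows = test-takers,
# columns = subtests such as vocabulary, arithmetic, matrix reasoning, ...).
rng = np.random.default_rng(0)
n_people, n_subtests = 500, 4
latent = rng.normal(size=n_people)  # a shared latent factor built into the simulation
subtests = np.column_stack(
    [0.7 * latent + 0.7 * rng.normal(size=n_people) for _ in range(n_subtests)]
)

# "Positive manifold": every pairwise correlation between subtests is positive.
corr = np.corrcoef(subtests, rowvar=False)
print(np.round(corr, 2))

# A crude 'g' estimate: loadings on the first principal component of the
# correlation matrix (the eigenvector belonging to its largest eigenvalue).
eigvals, eigvecs = np.linalg.eigh(corr)
print(np.round(eigvecs[:, -1], 2))
```

Note that the all-positive pattern here is built in by construction; which subtests enter the battery in the first place, and hence what the extracted factor ends up rewarding, is exactly the culturally loaded choice discussed in this section.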

2.3 Preserving power structures by IQ testing

Soon after scientific methods for the measurement of intelligence were constructed, it transpired that, according to these tests, the most intelligent people in the world were wealthy white men of Central and Northern European and North American origin. This result is neither surprising nor valid considering the underlying assumptions of the tests: testing turned out to be primarily beneficial for those who had a lot in common with those who developed it (Alfano et al. 2017).

That intelligence confers a right to power is an old idea in the ‘western tradition’ of philosophy. Just as Plato (2006) argued for a philosopher king in terms of his possession of reason in the Republic, so did Aristotle (1905) in the sociopolitical hierarchy laid out in his Politics.Footnote 4 According to the latter, a ‘slave by nature’ is one who has ‘lesser’ mental abilities and therefore benefits from being ruled by a master (Cave 2020). As Cave points out, this line of reasoning had not been very important in the European Middle Ages but turned out to be useful for satisfying the “[…] need to provide a moral and intellectual justification for aggressive colonial expansion from Europe, with its associated conquest, pillage, and enslavement” (Cave 2020, p. 30). This was only possible in combination with two further assumptions: that the European way of reasoning is the only adequate instrument for grasping ‘the truth,’ and that, as such, it could be universalized (Grosfoguel 2007).

The most fundamental assumption in this respect is that there exists something like a purely objective epistemic location from which the subject can be decoupled (Grosfoguel 2007). Men from the Global North see themselves as being in possession of such an epistemic location, which allows them, as Grosfoguel (2007, p. 213) puts it, “[…] to represent [their] knowledge as the only one capable of achieving a universal consciousness, and to dismiss non-Western knowledge as particularistic and, thus, unable to achieve universality.” In this line of reasoning, it becomes possible to organize different kinds of thinking and knowing in terms of superiority and inferiority, and likewise to rank people in terms of their abilities based on these criteria (Grosfoguel 2007). Modern IQ testing can be regarded as an expression of this more general problem of epistemic hierarchization.

The philosophy of race in the nineteenth and twentieth centuries frequently asserted the superiority of white people, and a large amount of work was dedicated to giving this claim a pseudo-scientific basis. Intelligence testing can be regarded as the most successful of many attempts in this respect because it is based on a scientific framework that goes beyond obviously racist paradigms (Cave 2020). Different results in IQ tests among people of different races, classes, and genders served as an argument for these groups’ oppression. Stern, for example, was ambivalent about how to explain differences between individuals from different socially important groups in the tests available at his time (Stern 1912). While pointing to the different living conditions of children whose families had less money, concerning, e.g., the time spent with adults or the quality of education they received, he also considered the heritability of lower degrees of intelligence as an explanation for these conditions (Stern 1912).Footnote 5 Some years later, Terman outlined the possibility of demonstrating the assumed inferiority of women, Black people, and poor people by means of IQ tests, thereby justifying the claim that it was appropriate for white educated men to form the societal elite (Cave 2020; Terman 1916). It did not occur to most of the interpreters that the differences in intelligence testing might originate from its inherent biases and, where it did, a lot of effort was spent on hiding this fact. Herrnstein and Murray dedicate their entire book The Bell Curve (1994) to defending the thesis that a hierarchy of cognitive ability can be drawn up, ranging from East Asians at the top to Black and Latino people at the bottom. Intelligence, according to their belief, is mostly genetically determined, which naturalizes racist beliefs about certain dispositions and strengthens the present power structure. Such scientifically unverifiable claims are still present today (Alfano et al. 2017).

IQ testing still plays a considerable role today and is used all over the world to assess people’s suitability for access to higher education and jobs. Even though other tests have been introduced to avoid the erroneous inference that one mental ability necessarily follows from another, and the term ‘intelligence’ is by now used more carefully,Footnote 6 modernized versions of well-established IQ tests such as the Stanford-Binet intelligence test or the Wechsler Adult Intelligence Scale are still in use. Consequently, the aforementioned restrictions and biases remain prevalent. Moreover, these tests continue to elevate the importance of those capacities necessary for academic success and leave little to no space for different ideas about intelligence (Fletcher and Hattie 2011). Accordingly, intelligence is constructed from a powerful position and granted a societal role that is useful for protecting the holder of this position from any intrusion from a ‘lower’ sphere (Daley and Onwuegbuzie 2020).

Equating intelligence with IQ assumes that it is a purely descriptive phenomenon that manifests in different degrees in different persons and can be measured using objective criteria combined in a single test applicable to any human without regard to a broader context (Mackintosh 2011). This rests on the belief that there is only one possible definition of intelligence, namely, the set of abilities gathered in IQ testing. In accordance with the findings on implicit conceptions of intelligence presented above, we may claim that most of these assumptions are false and that intelligence can no longer serve as a justification for persisting power structures.

Regarding AI, we must be aware of the fact that certain technologies will unavoidably be used on a global scale and that these applications will propagate or even impose a certain underlying conception of intelligence. In light of the history of IQ testing and the current scientific consensus on intelligence, it is clear that this should be avoided.

3 The Turing Test and its cultural biases

In this paper, AI is regarded as comprising those technological systems that bear the capacity to imitate any kind of intelligence (Coeckelbergh 2020), depending on computing power and different algorithmic processes. These systems can, “[…] for a given set of human-defined objectives, generate outputs such as content, predictions, recommendations, or decisions influencing the environments they interact with” (European Commission 2021, p. 39). We focus on those applications for which a TT or a restricted TT could be used as an instance of evaluation. The field of AI is commonly understood broadly as the replication, mainly, of human-like intelligence in a machine (Coeckelbergh 2020). To this end, there are many methods through which this can be achieved: symbolic reasoning, machine learning, deep learning, neural networks, and so on (Boden 2016). Among the many ways of testing AI, Alan Turing’s proposal of what he called “the Imitation Game” has proven very influential. Although contested since its inception, it is still of considerable importance today (Danziger 2022; Myong et al. 2023; Natal 2021; Neufeld and Finnestad 2020; Noever and Ciolino 2022; Vorobiev and Samsonovich 2018). As it is beyond the scope of this paper to fully analyze the whole range of conceptions of intelligence used in AI and to uncover their cultural biases, we focus on the TT because it stands at the beginning of AI’s development and not only motivated but also influenced the field to a large extent. Accordingly, it may be regarded as representative of AI research in general in terms of its practices and biases. Although the Imitation Game does not explicitly promote any conception of intelligence, which makes it attractive for deployment in a multicultural context, we must be aware of its implicit assumptions.

3.1 The Imitation game

The Imitation Game is an empirical test to assess whether intelligence has been successfully implemented in a computer (Neapolitan and Jiang 2018). The test has been presented in a variety of forms, but we will focus on the standard interpretation, which is based on Turing’s (1950) paper “Computing Machinery and Intelligence” (Natal 2021; Oppy and Dowe 2021). The test involves three participants, a human interrogator (P1), a second human (P2), and a computer, each in a separate room, who take part in a written conversation (e.g., a chat). This textual medium of conversation excludes any physical factors, such as voice or body shape. It is intended to make it unnecessary to design a perfectly human-looking robot, but it also makes certain modes of physical or visual interaction impossible. P1 must find out which of the entities with whom theyFootnote 7 are communicating is the human and which is the machine by asking questions of all kinds to both (Neapolitan and Jiang 2018; Turing 1950). If P1 cannot guess accurately more than half of the time, the computer passes the test and is thought to be intelligent (Natal 2021; Neapolitan and Jiang 2018; Oppy and Dowe 2021; Turing 1950). Turing’s test does not measure intelligence itself but only how successfully it can be imitated. AI technologies, in this context, are assessed by their credibility to human users, making the latter central to the process of testing (Natal 2021).
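As a rough sketch of the pass criterion just described, the following snippet frames a series of interrogation rounds as a simple threshold check; the round count, the way a round is represented, and all names are our own illustrative assumptions rather than anything specified by Turing.

```python
import random
from typing import Callable

def machine_passes(interrogation_round: Callable[[], bool], n_rounds: int = 100) -> bool:
    """Return True if the interrogator identifies the machine correctly in no
    more than half of the rounds (the 'cannot guess accurately more than half
    of the time' criterion described above)."""
    correct_identifications = sum(interrogation_round() for _ in range(n_rounds))
    return correct_identifications <= n_rounds / 2

# Purely illustrative: an interrogator who does no better than a coin flip
# against a given machine will let that machine pass in roughly half the runs.
coin_flip_round = lambda: random.random() < 0.5
print(machine_passes(coin_flip_round))
```

The casting problem discussed below can be read directly off this sketch: everything hinges on what an interrogation round actually measures, which in turn depends on who plays P1.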

The TT has been considered an effective test for AI because it leaves open a variety of possibilities for testing a certain technology and its ability to imitate a human being. Even though the test can be modified, for example by restricting it in time or to the solving of a certain task (which is more useful for testing modern, often specialized, applications of AI), the original idea is that it is not sufficient to fool just any ordinary observer. Rather, the computer must survive a harsh interrogation, repeated over a number of trials, in which P1 is aware of the fact that one of the entities they are talking to is a machine (Oppy and Dowe 2021). In this regard, the TT allegedly sets high standards for measuring a variety of capacities.

The TT is intended to be open to a broad range of possible conceptions of intelligence. This could be a positive thing, given the large spectrum of possible beliefs about it; however, the test is restricted by at least two presumptions about intelligence. Firstly, intelligence is taken to be primarily and fully expressible in a written conversation. Written language therefore becomes the primary medium of expressing intelligence. Secondly, and in line with implicit theories about it, evaluating whether an entity is intelligent or not is, in Turing’s words, “[…] determined as much by our own state of mind and training as by the properties of the object under consideration” (Proudfoot and Copeland 2012; Turing 1950). Whether a machine passes the test or not is, to a large extent, dependent on the interrogator.

3.2 Potential and actual biases of the Turing Test

In our assessment of the TT, we discuss three main points. Firstly, the TT’s medium of written conversation excludes alternative ways of expressing intelligence. Secondly, the test necessarily involves deception, which is likely to conflict with implicit theories of intelligence that focus on moral integrity. These two problems are conceptual and cannot be resolved without fundamentally altering the TT. The third point, the TT’s casting problem, refers to practices in the field of AI in general and concerns the crucial importance of the interrogator. Given the large range of different implicit theories of intelligence, it makes an important difference who participates in the test.

3.2.1 The restriction on language

The TT prioritizes language as the primary means of expressing intelligence (Neufeld and Finnestad 2020). While the competent use of language might be a sufficient condition for intelligence,Footnote 8 linguistic capacities are by no means a necessary condition for it. Among all possible (and empirically documented) conceptions of intelligence, only some can be displayed in written communication. Especially in collectivistic societies, mostly located in the Global South, much greater emphasis is put on emotional, social, or moral intelligence, none of which can be fully expressed in a purely conversational context (Niu 2020). The same holds for various kinds of creativity.

This exclusion partially reflects the implicit theories of intelligence dominant in the Global North. These mostly include (1) a practical problem-solving ability, namely, the ability to “[…] [reason] logically, [identify] connections among ideas, [seeing] all aspects of a problem” (Sternberg 1985, p. 625), (2) a verbal ability, “[Speaking] clearly and articulately, [being] verbally fluent, [conversing] well” (Sternberg 1985, p. 625), and (3) social competence, “[…] [accepting] others for what they are, [admitting] mistakes, [displaying] interest in the world at large” (Sternberg 1985, p. 625). While (3) is restricted, the TT is adequate for measuring the other two. This may seem obvious for (2), as verbal ability can often be adequately expressed in a verbal test. Importantly, for (1), the ability to think in an abstract way, to form categories based on nameable rules, and to avoid self-contradiction fueled the illegitimate universalization of European reason in historical colonialism. Consequently, the historical assumptions laid out in the previous section are reflected in the TT. It is paradigmatic that the definition of AI given in the EU AI Act also focuses primarily on these abilities (European Commission 2021). As we have shown, these capacities do not necessarily have the same importance in other cultures (Cave 2020; Grosfoguel 2007). This problem cannot be fixed without altering the TT fundamentally: doing so would require overcoming the textual medium and introducing physical interaction, which creates a variety of problems that are not related to intelligence alone. The TT would no longer be the same test, as the machine’s success would also depend on its physical appearance.Footnote 9 Thus, by excluding physical cues, the TT renders some expressions of social intelligence unnecessary.

3.2.2 The Turing Test and deception

Another conceptual point concerns the role of deception in the TT. In most cultures, especially in Asia and Africa, the morality or moral integrity of an individual is considered to be crucially indicative of a person’s intelligence (Niu 2020). Deception is mostly seen as immoral. To pass the TT, the machine has to deceive P1; this is the ultimate condition for passing. In the context of a game, deception might belong to the rules upon which the participants agree and cannot be considered immoral. But at the same time, it frames the ability to deceive a human about the machine’s true nature as somehow valuable for intelligence. Outside the game context, in the real world, deception becomes morally sensitive again. If moral integrity is part of a conception of intelligence, that conception cannot include immoral deception without contradicting itself. Testing AI in the Global North will likely not take this into account, as it is not included in dominant ideas about intelligence. Thus, by excluding ‘honesty,’ the TT renders the capacity to distinguish honest from deceitful behavior an unnecessary criterion for intelligence.

3.2.3 The casting problem

The casting problem, contrary to other issues, does not arise from the test itself but from related practices in AI research. The Imitation Game is played by three individuals, each of whom brings their own implicit assumptions about intelligence to the table. We will analyze the role of each of them, in turn, starting with the interrogator.

The task of the interrogator (P1) is to find out which of the other two participants is a computer and which is a human. Their judgment is based on how appropriate they find the answers of the other participants to the particular conversational context. Judged appropriateness depends on whether the participating entities display a ‘normal’ kind of intelligent behavior from P1’s point of view. Accordingly, whether a machine passes the TT will partially depend on P1’s implicit assumptions about intelligence. As these assumptions vary across cultures and individuals, it is of crucial importance who P1 is and on what conceptions they are operating. We can assume that P1 is more likely to deem the entity most similar to themselves to be human. This means that the entity has to be able to express those kinds of behavior that P1 judges to be most important for ‘human-level’ intelligence.

The second human (P2) in the TT sets another kind of standard for human–human interaction, as their intelligence is also being tested and has to be outperformed by the machine. Their success in proving not to be the computer or AI depends on their credibility to P1. The more familiar P2 is with the context, which is ultimately shaped by the interrogator, the more likely they will be able to meet P1’s expectations. If P2 is very similar to P1, it will be easy for P1 to determine which entity is the machine, as the machine would have to be a perfect replica of P1 or P2. We then have a homogeneous group of two humans and a machine that all operate on the same conception of intelligence. If P2 has a very different cultural background that is reflected in their behavior, the task for P1 becomes more difficult. While we can expect the computer or AI to be programmed to match P1 as well as possible, some of P2’s reactions may seem peculiar and strange to P1. In this scenario, it is likely that P2 will be outperformed by the AI, not as a result of ingenious programming or a genuine portrayal of ‘intelligence,’ but because of a bias underlying P1’s judgment. The TT is then no longer only about the AI’s capacities, but also about P1’s knowledge of intercultural differences.

Finally, the machine is situated between P1 and P2. As mentioned, its best strategy for success is to match P1’s expectations as much as possible. AI that is developed to pass the TT will therefore fully reproduce the implicit theories of intelligence that are likely to be found in the restricted group of possible interrogators.

The crucial dependence of a passed TT on P1 and their relation to P2 can be referred to as the TT’s casting problem. The casting problem is a practical problem rather than a conceptual one, as it results from a homogeneous group of participants in the TT, without further diversified rounds testing the same technology. With the bulk of AI research located in the Global North, it is likely that most interrogators participating in actually performed TTs have a similar origin. The annual Loebner Prize illustrates this point. The prize is a public contest in which computer programs are tested each year by a variety of judges playing the TT (Neufeld and Finnestad 2020; Oppy and Dowe 2021). If we look through the publicly available list of its judges, they form a predominantly homogeneous group, including very few Black, Indigenous, and People of Color (BIPoC) or women, all of whom live in anglophone countries and mostly pursue academic careers (Society for the Study of Artificial Intelligence and Simulation of Behaviour, AISB 2022). This reflects the reality of AI research well and restricts the conception of intelligence tested in the TT to a fraction of the many ways intelligence can be perceived (Adams 2022; Kalluri 2020). In the remainder of this paper, we will delve deeper into the casting problem, as it is illuminating with regard to persisting problems in the field of AI in a more general sense.

4 Problematizing restricted intelligence

The TT’s casting problem brings us back to the general distribution of power in the field of AI. As Rachel Adams shows in a recent report on AI in Africa, the power relations in the field are far from being equally distributed. Instead, they mostly follow in the footsteps of historical colonialism and are likely to reproduce themselves through the introduction of new technologies (Adams 2022). The TT’s casting problem provides a salient example of the different dimensions of this possible reproduction.

As we have seen, in the TT everything is focused on the interrogator, whom we can imagine as a placeholder for any potential customer in the target group for which the technology being tested is developed (Natal 2021). To match these customers, the technology is built to pass a TT in a certain context of application and would do best to reflect the customers’ implicit assumptions about intelligence. If the target group is homogeneous and narrowly restricted to a privileged group of humans, there is little need to implement a variety of different conceptions of intelligence in the machine, as P1 will always belong to this homogeneous group. The resulting technology will then automatically reproduce a particular conception of intelligence. If the target group is more diverse, P1 should be played by very different people, and the majority of TTs will only be passed if the technology is not too restricted, that is, if it has been trained and tested with interrogators from a variety of cultural and geographical backgrounds. Whose interests are considered in defining the target group, however, is influenced by persisting power structures in the field of AI (Whittaker et al. 2019).

That the center of power in this context lies mostly with big tech companies originating from the Global North is not without consequences. On the one hand, the usefulness of the developed technologies will be restricted to the privileged group for which they were originally designed. This means that they are not fully applicable outside their original context, being insensitive to differences and potential threats affecting stakeholders who do not belong to their sphere of interest (Adams 2022; Arun 2020). Given the strong involvement of certain communities within the Global South in the production process of AI, as mentioned in the introduction, this alone might be unjust, as it transfers most of the possible societal gains from such technologies away from the people who may have worked hardest to achieve them (Adams 2022). A given technology, for example, will not be easy to handle for a user who was not considered in its design if they do not share its underlying conception of intelligence and are forced to adapt to it in order to use it effectively (Arun 2020).

On the other hand, technologies often spread all over the world and necessarily have to be used even by those who were not included in the designers’ considerations (Whittaker et al. 2019). Today we must see “[…] AI as a general-purpose technology that affects all levels of society and the economy” (Adams 2022, p. 9), often leading to a pronounced dependency on it. As Frischmann and Selinger (2018) point out, the use of technology can have severe impacts on how people perceive the world, each other, and what they value. AI is likely to transform the way people think, as its embedded values can be regarded as part of the user manual. To make use of a certain technology, users have to become ‘computable,’ meaning that they have to accept the technology’s underlying assumptions and adjust to them; a process that in the end is likely to shape whole societies (Frischmann and Selinger 2018).

In the case of AI, machines are likely to introduce a new standard of intelligence against which individuals will be evaluated, thereby reproducing the same structures inherent in the history of IQ testing. Particular individuals will find themselves unable to live up to standards that previously had no relevance to their social environment; they will suddenly be inferiorized and find themselves in the position of the TT’s second human. They will be outperformed by a machine that is designed to be able to replace them, as it better satisfies the needs of the powerful, is regarded as more intelligent, and appears, according to the logic of the TT, more human.

This dehumanizing process manifests itself in the application of so-called reversed TTs (Nakamura 2019; Whittaker et al. 2019). These are tests in which a human has to prove their identity to a machine, for example by filling in a CAPTCHA. While the latter, as a quite simple cognitive task, already excludes some groups of persons (e.g., it is not a matter of course that everyone can differentiate between a moped and a motorcycle, as both are motorized two-wheeled vehicles of a similar shape, and not all languages draw this distinction), we can think of many other scenarios in which reversed TTs can easily turn into tools of segregation, for example, if machines are used in recruiting. In such recruiting interviews, the TT’s interrogator and the machine swap places, making the success of the former ultimately dependent on the latter. Just as the programmers of the machine in the original TT would do well to make the program match the predicted expectations of P1, in the reversed TT the human would do well to match the programmed ‘expectations’ of the machine. To do so, they have to adjust to the underlying concept of intelligence. In light of the importance some applications have within a certain society, it is likely that such adjustments will become a general development. Given the importance of power structures in AI, this may result in a loss of people’s native ideas about intelligence, accordingly changing their way of thinking and living (Arun 2020).

Decolonial thought demands abandoning imposed ways of thinking and perceiving the world. This is considered one “[…] essential condition for emancipation” (Adams 2021, p. 181). For the Global North, this means that AI researchers must be aware of the restrictions of their own perspective and of the need to include other approaches in their thinking, in order to do justice to the actual pluralism of theories of intelligence. From an ethical point of view, there is no space for the translation of chauvinistic beliefs into AI technologies, as these are likely to become tools for the illegitimate stabilization of power structures.

5 Conclusion

This paper aimed to demonstrate, based on the TT, how AI is likely to promote or even impose one conception of intelligence that originates predominantly from the Global North. It does so not only because of biases implicit in the test itself but mostly through the way the test is conducted, which reflects the persisting power structures of the field in general. The reproduction through AI of the discriminatory and oppressive structures related to the history of IQ testing and historical colonialism should be considered an unacceptable risk. This holds both in the sense of the classification in the EU AI Act and in a mundane sense of the word “unacceptable,” concerning any serious attempt to decolonize AI. The problems arising from a one-sided approach to intelligence consist in (1) an unjust distribution of the societal gains arising from the development of AI technologies (even though this does not reflect the division of labor in the process of developing them) and (2) an increasing dependency on AI-powered technologies that operate on a narrow conception of intelligence, which may lead to forced computability, meaning that people have to give up their native ideas about intelligence and replace them with the one propagated by AI. The latter may result in the vanishing of different human life forms, depriving people of their autonomy and inhumanly measuring them against the standards of a machine. A first step toward solving these problems would be to deconcentrate the power distribution in AI research, which could be done through diversification on several levels, including scientific debates and programming, as well as testing practices like the TT. Additionally, policy documents from the Global North, like the EU AI Act, should not be limited to the protection of their own citizens but should discuss, or at the very least reflect on, post-colonial structures.

Despite being restricted to the TT, the above analysis can be regarded as reflecting the field of AI as a whole, because the most pressing issues seem to be rooted outside of the test itself. It would be interesting to conduct a similar analysis for other approaches to intelligence used in AI research. The TT seems to be among the most promising of these approaches, as it leaves a lot of space to be filled with a variety of different ideas about what should be measured. For attempts to have AI cover a multiplicity of approaches to intelligence, the TT therefore provides a good basis, although how to overcome its conceptual limits must be considered.

In another sense, Rachel Adams, among others, points to the fact that the whole idea of AI itself has its origin in the European compulsion to universalize reason, make it measurable, and put it into an object (Adams 2021; Birhane 2020). This gives us reason to question whether there should be any such thing as AI at all. Given its potential threats and biases, this prospect should be taken seriously. But as long as the expansion of AI is not simply stopped (which is very unlikely, though perhaps reasonable), it is imperative to continuously and critically reassess its most basic claims, in order to guard against its worst implications.