Collective intentionality—roughly, intending something as a group or as a group-member—can exist at various ‘levels’, ranging from simple dyads to very large groups. This scalar notion has been employed influentially in research and theorizing on the evolution of human culture, to understand how we gradually came to collaborate in increasingly large groups. Following Michael Tomasello, many researchers agree that an initial capacity for joint intentionality evolved into more general capacities for full-blown collective intentionality that eventually enabled collaboration in cultural groups (e.g. Angus and Newton 2015; Boehm 2018; Herrmann et al. 2007; Rakoczy 2009; Tomasello 2014, 2019b; Tuomela 2007). This proposal covers human evolution up to and including the emergence of hunter-gatherer groups. It is unclear, however, whether it also covers the emergence of more complex human societies.

In this paper, I argue that it does not. Rather than in terms of collective intentionality, I will argue that the emergence of complex societies can best be understood in terms of the human proclivity for what I shall label ‘conventionality’: a set of non-strategic attitudes that stabilize and standardize human interactions so that role divisions, institutions, norms and conventions can emerge as group-level phenomena. Conventionality is, I will argue, a product of group-level selection rather than a product of collectively intending minds.

The paper is set up as follows. In the next section I will briefly introduce the phenomenon of collective intentionality and the proposal to employ it in the characterization of the evolutionary trajectory that led from uncollaborative apes to highly collaborative ones. In Sect. 2, I will zoom in a bit more on this proposal and show how collective intentionality as an explanation of human collaboration involves a motivational and a coordinative component. In Sect. 3, I argue that although collective intentionality explains coordination in complex societies, it cannot explain the motivations of individuals to collaborate as contributions to a collective effort to secure the surviving and thriving of the group. I argue that this problem cannot be alleviated by appealing to group identity as a motivation, because this move is descriptive rather than explanatory.

In Sects. 4 and 5, I will explore an alternative account of collaboration in complex societies. This account is modelled on examples, taken from the work of Joseph Henrich, of collective practices that have a group-beneficial effect, while individuals who contribute to these practices are not motivated by this effect and have no knowledge of how the effect comes about. I will call these practices ‘blind, smart practices.’ I argue that complex role divisions, in economic division of labour as well as in cultural institutions, are in many respects like such practices: they have a group-beneficial effect, but our motivations to contribute to them do not reflect this effect and require no knowledge of it. We are motivated to play our roles by local concerns. Coordination ensues because we adopt norms, rules and conventions, not as a product of collective intentionality, but as a consequence of psychological tendencies and capacities that are the result of group-level selection. In Sect. 6, I will briefly discuss these tendencies and capacities, zooming in on overimitation as an important case in point, in order to distinguish them from the capacities required for joint and collective intentionality.

1 Collective Intentionality and the Evolution of Human Collaboration

The term ‘collective intentionality’ was coined by John Searle in 1990, but the concept already had a long history by then (Searle 1990). Durkheim’s ‘collective consciousness’ (Durkheim 1893/1984) and Collingwood’s ‘joint enterprise’ (Collingwood 1947), for example, can in many (not all) respects be considered precursors. Searle starts from simple examples. Small-scale cases such as painting a house together or taking a walk together are characterized by irreducible ‘we-intentions’—intentions to jointly perform a group action. Intending to take a walk together is not the same as each of us separately intending to take a walk at the same time and place. Intending to paint a house together with your spouse is not the same as intending to paint the house at the exact moment your spouse intends to do this as well.

There is widespread agreement over the fact that intending to do something as a collective cannot be reduced to a sum or pattern of individual intentions. But intentions require minds, and according to most, groups are not minds in more than a metaphorical sense. So collective intentions are had by individuals. But what does it mean for me to intend the group I am a member of to collectively do something? Accounting for the existence of ‘we-intentions’ as distinct from individual ones makes up a large part of the philosophical debate on collective intentionality. Explanations range from fully individualist and normatively austere—for instance in terms of common knowledge of each other’s we-intentions and meshing sub-plans (Bratman 2014)—to fully anti-individualist and normatively rich—for example in terms of the joint commitments of an irreducible plural subject (Gilbert 2014).

One of the features that makes collective intentionality an interesting topic of research is that it occurs at various scales. Intending to take a walk together is just as much an instance of collective intentionality as intending to uphold certain norms as a group, or to treat certain slips of paper as money.[Footnote 1] This allows us to approach questions about the nature of institutions or the normative practices underlying large-scale collaborations by looking at small-scale ‘models’ of collective intentionality such as collaborating dyads. Searle, Tuomela and Gilbert use their respective views on small-scale collective intentionality as the basis for understanding the normative and institutional infrastructure of entire societies. Bratman used to focus on small-scale collective intentionality only, but has recently agreed that, with further specifications, small-scale collective intentionality is the basis for large-scale institutions (e.g. Bratman 2022; Bratman 2014; Gilbert 1989; Searle 1990; Tuomela and Miller 1988).

This scalar nature makes collective intentionality a particularly interesting conceptual tool for theorists of human evolution. The gap between complex collaboration in human societies and the very limited degree of collaboration that can be found in other apes is massive. If human collaboration hinges on a feature that comes in degrees, this seems eminently suitable to fill this gap in a plausible way. This was Michael Tomasello’s insight. For this particular use of the concept, philosophical details matter less (though see e.g. Angelino 2022 and Pettit 2020). Rather than on a theory, his focus is on the phenomenon as highlighted in the philosophical literature—our capacity to intend and think as members of a group—and more importantly on the phylogeny (reflected in ontogeny) of the psychological capacities and motivations that are required for various forms of collective intentionality. Small-scale collective intentionality, for instance in collaborating dyads, requires social cognitive capacities such as mindreading, gaze tracking and joint attention. Large-scale collective intentionality, such as collectively upholding norms or treating slips of paper as money, also hinges on referential communication and cultural learning. From an evolutionary perspective, then, it makes sense to

distinguish cases such as Bratman’s house painters and Gilbert’s walkers – who are essentially collaborating as dyadic partners with joint goals and joint commitments – from cases such as Searle’s patron at a French café enmeshed in the institutional reality of money, café owners, and social norms of café behavior, all in the context of governmental laws, licenses, and restaurant inspectors. (Tomasello 2016, p. 61)

For this purpose it is useful to use separate labels for small and large scale collective intentionality, which are respectively baptized ‘joint intentionality’ and ‘collective intentionality’. The central evolutionary thesis, then, is that

the uniquely human adaptations for cooperation evolved in two key steps. The first step comprised adaptations enabling early human individuals to cooperate with one another dyadically in obligate collaborative foraging (with partner choice); these are the skills and motivations of joint intentionality. The second step comprised adaptations enabling modern human individuals to cooperate with one another in the larger collaborative enterprise known as culture; these are the skills and motivations of collective intentionality. (Tomasello 2019a, p. 9)

The differences between joint and collective intentionality are an integral part of Tomasello’s proposal. But the differences—in scale and in terms of required cognitive capacities—do not undo the continuity between them. Tomasello recognizes this by capturing both under the heading of ‘shared intentionality’ and by calling the two-step evolutionary thesis the ‘shared intentionality hypothesis.’ This allows him to follow the example of the philosophers and use joint intentionality as a model for understanding collective intentionality.

2 From Dyadic Collaboration to Hunter-gatherer Groups

Joint and collective intentionality involve thinking as a group-member. Tomasello characterizes this kind of thinking in dyads as “a kind of we > me self-regulation of the collaborative activity in which the “we” is the joint agency to which the [collaborator] has jointly committed and which she must, to maintain her cooperative identity in the partnership, respect” (Tomasello 2019b, p. 6). The dyad’s interest determines our commitments and felt obligations. This need not imply going against individual interests. For the background of this self-regulation—implicit but important for what is to follow—is the fact that collaborators are motivated by the jointly intended outcome of their enterprise. Individuals can thus in principle understand and rationalize their contributions to joint efforts in terms of ‘global’ (dyad-level) concerns. This is why Tomasello speaks of ‘we > me self-regulation’.

With respect to the second step in the evolution of human collaboration, collective intentionality, the same principles apply: “In [a] cultural-institutional context (…) individuals continue to self-regulate in a we > me manner, but now the “we” is our culture (…)” (Tomasello 2019b, p. 7). This allows us to understand the psychology of moral obligation in larger groups as an institutionalized version of the kind of obligation one feels in dyadic teamwork towards one’s collaborator. In dyadic collaborations, a feeling of obligation towards one’s partner—e.g. to share the gains of joint labour fairly—is based on an understanding of the way in which one’s own efforts hang together with and are dependent on the efforts of one’s partner, and vice versa. Hence, commitment to a shared task involves mutual feelings of obligation to invest sufficient effort and to share jointly acquired returns fairly. The idea is that feelings of moral obligation towards members of one’s own cultural group are essentially scaled-up versions of this type of felt obligation. Similarly, institutionalized sanctions in cultural groups can be viewed as scaled-up versions of protest against poorly performing partners in dyadic collaborations. Here too, there is we > me self-regulation, implying that individuals are able to understand and rationalize their contributions to the collective effort in terms of global, group-level concerns.

Two things are important to notice. First, the cultural groups that are referred to here are hunter-gatherer groups, not complex societies. Second, as with dyadic collaboration, the self-regulation at issue in group norms and normativity is motivated and informed by a goal of which all group members know that it can only be achieved collectively. Both elements feature in Tomasello’s introduction of the second step in human evolution, the onset of collective intentionality:

The most fundamental assumption guiding this second step of our account is that a cultural group – that is, evolutionarily, a hunter-gatherer group with clear demographic boundaries as characteristic of humans for most of their history – is nothing more or less than one big collaborative activity in which “we” as a people operate with a collective commitment to the group’s surviving and thriving. Each individual has her role(s) to play in this collective commitment – both as a member of the group in general and, possibly, as a person playing some more specific division-of-labor role – and this generates, in a scaled-up manner, more universal normative expectations. (…) Because we all value the group’s smooth functioning, we all must do things in the ways that we all expect us to do them. In addition, to be a good group member, we must also make sure that others follow these norms as well (especially by normatively protesting violations). The enforcement of social norms is thus a kind of scaled-up, third-party version of the second-personal protest characteristic of joint intentional collaboration (…). (Tomasello 2019b, p. 7)

So, just like dyadic collaboration, collaboration in hunter-gatherer groups (i) is motivated by a collective goal (the group’s surviving and thriving; the group’s smooth functioning) and by recognition of the fact that this goal requires strongly interdependent efforts, and (ii) is coordinated through commitments to shared norms, conventions and institutions.

3 From Hunter-gatherer Groups to Complex Societies?

To what extent can this ‘recipe’—consisting of a motivation-component and a coordination-component—also be used to explain the upscaling from hunter-gatherer societies to the more complex societies that followed the onset of agriculture and sedentism? If we start with the coordination-component, the prospects seem good. Collective intentionality explains the emergence of norms and institutions, and these are even more prominent and indispensable in complex societies than they are in hunter-gatherer groups. It also explains the emergence of linguistic conventions and practices of cultural transmission underlying cumulative culture, which are major factors in such upscaling. Hence, if we focus on coordination, collective intentionality appears to be a likely driving force behind the transition from hunter-gatherer-sized collaborations to complex societies, more or less in the same way that it has driven the transition from dyadic to hunter-gatherer collaboration.

What about the motivation-component? Here things look different. The parallel with dyadic and hunter-gatherer collaboration is a lot more problematic, and arguably even absent. In dyadic collaborations, both contributors have a clear idea of the overall enterprise and their roles in it. In hunter-gatherer groups this idea might be somewhat more general, but all members will have a pretty good idea of the various roles and their interconnections that jointly work towards the surviving and thriving of the group.[Footnote 2] In both cases, understanding one’s own place in the overall collaboration, aimed at the shared group goals, motivates collaboration. But in societies with complex role divisions, no one has anything even remotely resembling an overview of how all roles in the total collaboration hang together and of how (or even whether) this sustains the surviving and thriving of the group. Everyone may believe that their tasks somehow contribute to the enterprise of the group, but the concrete difference that any single task makes to the collective outcome is unclear and in all likelihood close to negligible. It is difficult to see how this can motivate individuals to contribute to the collective effort.

Tomasello avoids this problem by making the motivation to contribute to group collaboration in larger groups dependent not on an overview of how roles hang together to achieve group goals, but on something that can be viewed either as a way of summarizing, condensing and abstracting such an overview, or simply as replacing it: group identity. ““We” are those people who talk, think, dress, and eat in these particular ways. Being a member of the group means identifying with these ways (begun by our revered ancestors) (…)” (Tomasello 2019a, b, c, p. 7). The need for group identity is explained in terms of inter-group competition: “with the rise of modern humans, entire cultural groups—potentially encompassing whole clans or tribes with individuals who might not even know one another—became cooperating units as they competed with other human groups for valuable resources in cultural group selection” (Tomasello 2014, p. 133). Sensitivity to group identity, including the norms, conventions and institutions, is just as much a feature of contemporary humans in complex societies as it is of hunter-gatherers. This starts at an early age:

By five or six years of age children understand others as members of a cultural group based on similarity of appearance alone, and they treat everyone in their group as worthy of just treatment from themselves and the group. Preschool children already favor those in their cultural group, and they also expect them to favor those in their cultural group as well as to respect the group’s social norms. All this reflects children’s growing understanding of collective social products such as social norms and conventions and how they work, based on maturing capacities for collective intentionality and group-mindedness. (Tomasello 2019a, pp. 273–274)

‘Group-mindedness’ is a name for being motivated to contribute to the collective effort of a group with a certain identity and to accept the group’s norms and institutions. This motivation no longer depends on an idea of how one’s own contribution connects with those of others to contribute to reaching a collectively intended goal. It is a highly sublimated version of that, which is much more direct and visceral. But it is still a form of we > me self-regulation, because it involves internalizing ‘the community’, or the ‘generalized other’, as a ‘cultural super-ego’ (Elias 1939/1994; Mead 1934).

Does group-mindedness save the idea that collective intentionality can account for the motivation-component in an explanation of complex collaboration? There are two reasons to doubt this. First, group-mindedness deviates from the philosophical notion of collective intentionality, simply because there are no concrete collective intentions involved. Rather than collective intentionality it seems somewhat more like collective willful submission to an authority (whose intentions are opaque). But this may be too quick. There is a way in which we can save the use of the term ‘collective intentionality’, even when collaboration is motivated by something as abstract as group identity and group-mindedness. We can think of groups that collaborate on the basis of group-mindedness as collectives that display intentionality at the group-level, for example by adopting what Daniel Dennett calls the intentional stance towards a group as a whole (Dennett 1987, 1991). This is certainly an option (Huebner 2014; Tollefsen 2002). But used in this way, the term ‘collective intentionality’ presupposes a collective of collaborating and coordinating individuals; it is no longer an explanation of how and why these individuals are motivated to collaborate.[Footnote 3]

There is a related worry. The ‘cultural super-ego’ is an abstract notion in the sense that the internalized cultural group is always an ‘imagined community’, “because the members of even the smallest nation will never know most of their fellow-members, meet them, or even hear of them, yet in the minds of each lives the image of their communion” (Anderson 1983, p. 6).[Footnote 4] If this ‘image’ is involved in explaining our motivation to follow the same conventions and norms and contribute to the same large-scale collective enterprise (the thriving of our society), then this must be because we share a similar enough ‘image’ of our cultural group’s communion. But what does that mean? And how can we know that the ‘images’ of group members are all alike? The situation is like Wittgenstein’s beetle in the box: what it means to think all boxes contain beetles is that we all call them beetles. Applied to group-identity: what it means to say that group members share the same cultural super-ego is that all members are motivated to collaborate and follow the same norms and conventions in the name of the same community. This is effectively defining shared group-mindedness in terms of what it does: making us contribute to complex role divisions and adhere to a culture’s norms and conventions. But if we define the notion of group-mindedness thus, claiming that it explains our being motivated to collaborate in complex societies becomes viciously circular. Group-mindedness might turn out to be a descriptive rather than an explanatory notion.

I will not claim that these worries are knock-down arguments against the possibility of accounting for the motivation to contribute to complex, large-scale collaborations in terms of collective intentionality. But they are at the very least serious enough to warrant further investigation and the exploration of an alternative view of complex collaboration. In the next two sections, I will argue that in complex societies, collaboration is not driven by global, i.e. group-level concerns, consisting either of a more or less concrete overview of the roles in the collaboration, or of a group-minded commitment to an imagined community. Local concerns are sufficient, given that shared norms, conventions and institutions are in place. But these are not the products of collective intentions. Rather, they result from non-strategic psychological tendencies and capacities, summarized under the label ‘conventionality’, that are the product of group-level selection.

4 Blind, Smart Practices

Let me start by discussing instances of group-level practices in which the individual motivations to contribute do not reflect the actual beneficial outcome of such practices. I take my cue from some of Joseph Henrich’s work. Henrich is more concerned with cultural transmission than with coordination, but the implications of his views for coordination will become clear later on. A recurrent theme in Henrich’s work is that “[l]ike natural selection, our cultural learning abilities give rise to “dumb” processes that can, operating over generations, produce practices that are smarter than any individual or even group” (Henrich 2016, p. 12). The idea that cumulative cultural evolution and cultural transmission are crucial to our intellectual and cognitive abilities, and that this requires highly developed social learning skills, is also part of Tomasello’s views and generally accepted (e.g. Herrmann et al. 2007; Herrmann et al. 2010; Heyes 2018; Richerson and Boyd 2005; Sterelny 2012). But Henrich takes this idea a bit further than others, by arguing that in a sense thriving cultural groups form some kind of ‘collective brain’, as he calls it, and by applying it to practices that involve the participation of many individuals. Some evolved cultural practices, Henrich claims, embody a kind of group-level intelligence that may very well elude the people partaking in these practices:

a vast body of knowledge that we inherit culturally from previous generations comes buried in daily cooking routines, taboos, divination rituals, local tastes, mental models, and tool-manufacturing scripts. These practices and beliefs are often (implicitly) MUCH smarter than we are, as neither individuals nor groups could figure them out in one lifetime. (Henrich 2016, p. 112)

In order to grasp the claim that is being made here, it is good to look at some examples (Henrich 2016, pp. 96–116).

The use of divination rituals to determine where hunters should look for game is, Henrich argues, an adaptive practice. It turns out to have a positive effect on the returns of hunting—though for reasons that are not available to those participating in it. Humans have a tendency towards patterned behaviour (for good evolutionary reasons, such as being predictable for each other, which facilitates collaboration) (Gilovich et al. 2002; Kahneman 2012). For example, hunters may tend to go and hunt where they last killed a large game animal. Game animals can pick up on such patterned behaviour and make sure they are where the hunters will likely not be. The situation can be modelled in game-theoretical terms as a ‘matching pennies’ game (a game humans are not as good at as, for example, chimpanzees; Martin et al. 2014): you and I both have a penny, which we lay on the table with either heads or tails facing up; when both pennies have the same side up (when the animal and the hunter are in the same spot), I (the hunter) win; otherwise (when the animal is where the hunter is not) you (the game animal) win. For both parties in this game, the best strategy is to behave as randomly as possible. The human tendency for patterned behaviour is thus a serious disadvantage in this game. But this disadvantage can be countered by relying on the random outcomes of divination rituals to determine where the hunt is going to take place (Moore 1957). Groups in which hunting magic is practiced will therefore hunt more profitably than other groups.
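The logic of the matching-pennies argument can be made explicit with a small calculation. The following Python sketch is an illustration only, under assumptions that are not part of Henrich's or Moore's accounts: there are just two hunting sites (A and B), the hunter goes to site A with probability p, and the prey knows this tendency and responds optimally. All function names are hypothetical.

```python
# Illustrative two-site idealization of the hunter/prey 'matching pennies' game.
# Assumptions (not from the source): exactly two sites, A and B; the hunter
# visits A with probability p; the prey knows p and responds optimally.


def match_probability(p: float, q: float) -> float:
    """Probability that hunter and prey end up at the same site.

    p: probability that the hunter goes to site A.
    q: probability that the prey is at site A.
    """
    return p * q + (1 - p) * (1 - q)


def prey_best_response(p: float) -> float:
    """Prey's optimal q: stay away from the site the hunter favours."""
    return 0.0 if p > 0.5 else 1.0


def hunter_success(p: float) -> float:
    """Hunter's success rate against an optimally responding prey."""
    return match_probability(p, prey_best_response(p))


if __name__ == "__main__":
    # A strongly patterned hunter, a mildly patterned one, and a randomizer.
    for p in (0.9, 0.7, 0.5):
        print(f"hunter favours A with p={p}: success rate {hunter_success(p):.2f}")
```

Against such a prey, the hunter's success rate in this sketch equals the smaller of p and 1 - p, which is highest at p = 0.5. On this idealization, any procedure that fixes the hunting site at random moves a patterned hunter towards that optimum, whether or not anyone involved understands why.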

More straightforward examples of practices that harbour such a collective kind of intelligence are food preferences and taboos and the often very elaborate procedures for preparing certain foods. Manioc preparation is a typical case in point. Raw manioc not only tastes bitter; its consumption also leads to chronic cyanide poisoning, causing neurological problems, developmental disorders, thyroid problems and suppression of the immune system in the long run. There are several methods of de-toxifying manioc, all of which are elaborate multi-step processes that take several days and yield a non-bitter tasting staple food that contains almost no toxins. These elaborate procedures are handed down from generation to generation, but no one knows exactly what each step does. It might be tempting, even rational (on an individual level), to skip labour-intensive steps that do not seem to do much with respect to the bitter taste. In the long run this might lead to all kinds of health problems, but that causal connection will not be transparent and is hence no direct deterrent for skipping steps in the processing of manioc. However, nobody skips steps. Respect for the cultural ways of doing things (the ‘mindless’ normativity of tradition) might seem like a somewhat irrational attitude, but it turns out to be very clever—if we look at things from a distance.

Let me call such practices ‘blind, smart practices.’ These practices are selected in long processes of cultural evolution. They are selected for their group-level beneficial effects.[Footnote 5] Or, better put, they are selected through these effects. Henrich’s approach to cultural evolution is what Lewens calls ‘kinetic’ thinking and what Heyes calls ‘populationist’ thinking (Heyes 2018; Lewens 2015): beneficial ideas and practices spread because the groups that harbour and transmit them thrive and expand and ultimately outcompete groups with less beneficial ideas and practices.[Footnote 6] If this requires adopting odd beliefs, attitudes of seemingly irrational compliance, a basic trust in tradition, or simply not thinking too much about why things are done in a certain way, then so be it.

The claim I wish to defend in the next section is that large-scale collaboration is a social practice that is in relevant respects similar to blind, smart practices. The attitudes that sustain this practice need not be irrational, but neither need they reflect the reasons why large-scale cooperation is beneficial for the entire group. Collaboration occurs neither because we have the ability to oversee the entire collaboration and appreciate our parts in it, nor because we have internalized the same cultural super-ego, but because we have evolved proclivities to adhere to roles and routines, conventions, norms and institutions that regulate role-interaction (for a more elaborate discussion, see Slors 2021a, b, 2022). I will argue that whatever motivates and enables individuals to play these roles and ‘enact’ these institutions, organizations, norms and conventions need in no way reflect the reasons why and how the network they enable is self-sustaining—exactly like blind, smart practices.

The relevance of complex division of labour, both in an economic sense and in the sense of role-differentiation in cultural institutions, is, I believe, underestimated by Tomasello and many others (with a few exceptions such as Smaldino 2014).[Footnote 7]

5 Complex Role-divisions

In order to substantiate this claim and to extract an alternative to collective intentionality as an explanation of collaboration in complex societies, it is helpful to start with a definition:

Blind, smart practices are collective practices the individual contributions to which are neither motivated nor informed by the real beneficial effect—in terms of the well-being of individuals, protection, stability and collaborative functioning of the group and/or reproductive success—of the practice for the collective.

Both examples mentioned above are covered by this definition. Manioc processors will have some idea of the overall beneficial effect of the whole procedure, but they are unaware of the consequences of skipping steps and of the chemical processes that explain the relevance of each step. Hence, they cannot be motivated by these either. They are motivated mostly by a tendency to follow tradition. Shamans who perform hunting magic rituals and hunters who take their advice are motivated by what they think is the beneficial effect of their actions (ancestral spirits pointing them to where the game is, say), not by the real beneficial effect of their rituals (randomizing behaviour), of which they have no knowledge.

I want to argue that many (not all) forms of complex role-division in human societies are in many (not all) respects similar to blind, smart practices. I will use the phrase ‘complex role-division’ as a cover-all term to refer to economic division of labour, involving forms of collaborative interdependence, and to various cultural institutions such as legal systems, political systems, educational systems, and monetary systems, involving forms of social organization. In order to determine the extent to which complex role-divisions match the definition of blind, smart practices, we must first look at the beneficial effect many of them have and then look at what motivates individuals to contribute to them. For obvious reasons I cannot discuss these practices in any detail, but I believe that a cursory discussion suffices to highlight the relevant similarity.

Let me start with economic division of labour. Despite the negative side-effects of this practice highlighted by Marx, Durkheim and others, it is undisputed that it has group-level benefits. In The Wealth of Nations—the first systematic treatment of division of labour—Adam Smith mentions three: increased dexterity, no time spent on task switching, and the focus to improve or even mechanise specializations. Smith calculates that dividing the work of an eighteenth-century pin-making factory into some eighteen separate steps increases production from a maximum of 20 pins per person per day to about 4,800 (ten workers together producing upwards of 48,000 pins a day). Further advantages not mentioned by Smith are selecting the right person for the right job and a decreased need for investment in learning. And then there is the division of cognitive labour, which implies a huge improvement in collective efficiency by reducing redundant work: by relying on the fruits of each other’s cognitive activity, the total cognitive capacity of a group multiplies immensely.

The beneficial effects of cultural institutions on large communities are less easy to summarize but equally undeniable, although examples of non-beneficial institutions (dictatorships or Kafkaesque bureaucracy) are not hard to come by either. Legal systems stabilise societies, political systems do the same and enable collective decision-making, educational systems secure efficient transmission of cultural knowledge, organized religion promotes pro-social behaviour and social cohesion, etcetera. For the sake of brevity I cannot substantiate these claims but I shall treat them as uncontroversial.

In order to claim that economic division of labour and many cultural institutions are like Henrich’s blind, smart practices, individual contributions to them must neither be informed, nor be motivated by their overall beneficial effect. Let me start with information. For almost all of the roles we play in daily life, it is the case that the amount of knowledge needed in order to execute them is at best a tiny fraction of the amount of knowledge we would need to grasp the intricacies of these structures and institutions. In present day complex societies this is obvious. We are all players in an economy of divided labour, subjects of legal and political systems, and users of monetary and educational systems. It is safe to say that none of us understands all the ins and outs of any of these systems. What we need to play our parts in them is ‘local knowledge’, that is, knowledge that pertains to our own roles and the roles of those we have to interact with to carry out our own roles. Global knowledge—knowledge of these systems as such and the ways in which they ensure collective benefit—is of interest to scientists and scholars and policy-makers only. And even there: whatever knowledge we have of the global working of our economies or legal systems, and whatever control we aim to exercise, this is knowledge and control that is still distributed over many people.

Then again, it may be objected, we do have a general sense of the beneficial effects of, say, our legal system or of economic division of labour. Could this not be what motivates us to play our roles in these institutions? Yes and no. It can (but crucially need not; see below) motivate us to play a role, but it cannot motivate us to play this role rather than another equally important one. In this respect contributing to cultural institutions or systems of divided labour is exactly like the manioc processing example. There too, participants have a general sense of the beneficial effect of the overall procedure of ridding manioc of toxins. What they lack is knowledge detailed enough to understand that some steps absolutely cannot be skipped.

What motivates manioc processors to comply with the entire process nevertheless, despite the tediousness and seeming uselessness of some steps, is thus not knowledge, but the local concern of following tradition and fitting in with prevalent group practices. This is what we find in complex role-division too. If we look at economic division of labour first, Adam Smith’s metaphor of the invisible hand is a perfect parallel (not as a normative neoliberal thesis, but as a thesis connecting individual motivation to collective effects). In a common interpretation, the idea expresses the surprising notion that an overall beneficial effect for the collective can be attained while everybody is motivated by self-interest, that is, while nobody is motivated by that beneficial collective effect itself. This interpretation fits best with The Theory of Moral Sentiments (1759). In The Wealth of Nations (1776), Smith speaks more specifically of the motivation of merchants and manufacturers, consisting on the one hand of self-interest and on the other of a bias towards investing in domestic industry and against offshore outsourcing (Mark Pagel calls this tendency to maintain local ties ‘social viscosity’; Pagel 2012). But here too, the idea is that the overall beneficial effect for the collective of the decisions and actions of merchants and manufacturers plays no role whatsoever in their motivations:

(…) every individual (…) endeavours as much as he can both to employ his capital in the support of domestic industry, and so to direct that industry that its produce may be of the greatest value; every individual necessarily labours to render the annual revenue of the society as great as he can. He generally, indeed, neither intends to promote the public interest, nor knows how much he is promoting it. (Smith 1776/2007, p. 293)

The metaphor of the invisible hand depicts division of labour as a perfect exemplification of a blind smart practice as defined above. The extent to which the metaphor captures reality is debated. We are motivated by more than self-interest and a bias favouring the domestic market. There is no perfect parallel between division of labour and blind smart practices, then. Nevertheless, it is important to emphasize the extent to which the parallel does hold.

The point here is not so much that people are exclusively selfish (there is sufficient evidence against that claim, in part based on considerations involving group-level selection (Haidt 2012; Seabright 2010)). Rather, the point is directly connected with our lack of knowledge—and the lack of need for such knowledge—of the intricacies of structures of divided labour and the ways in which they do and do not contribute to the benefit of the community as a whole. What motivates us to carry out our tasks are first and foremost local concerns, concerns about our own business and the businesses of friends and family and those we need to interact with. Other considerations, such as considerations about the interest of society at large, are merely optional extras.

What about institutions? Here too, the parallel with blind smart practices is not absolute but nevertheless striking. What makes the case of institutions different from blind smart practices is the fact that some participants in some institutions clearly motivate their contribution in terms of the overall beneficial effect of these institutions, probably more obviously so than in the case of economic division of labour. Many lawyers and judges are at least in part motivated by their belief in the legal system as a beneficial force in society. Many politicians (despite public caricatures) are motivated by the belief that they can contribute something to society as a whole. But here we must first notice, in view of what was said above about the possibilities of having knowledge about how and why certain institutions have collective beneficial effects, that these beliefs are usually of a very general nature. Once again, they are like the manioc processors’ sense of the overall beneficial effect of their elaborate procedures. Such general knowledge only leads to a general motivation to contribute to these institutions. It does not motivate the specific decisions and actions that actually constitute the execution of functions within political or legal institutions. Most citizens simply find themselves born into a legal and a political system and play their parts in them in a way that resembles following tradition or that is at least more obviously the consequence of socialization than of reflective deliberation about the public good or the shared goals of one’s cultural group. When we look at participation in monetary systems, educational systems and organized religion, this is even more clearly the case.

These brief considerations are, I hope, sufficient to conclude that structures of divided labour and many cultural institutions are in important respects similar to Henrich’s blind, smart practices. They benefit the collective while most of the individual contributions to them are motivated by local concerns, self-interest, tradition, custom, socialization, routines and by beliefs and interests that do not specifically reflect the benefit for the collective of these practices. That such non-group-oriented motivations nevertheless result in group-level coordination is explained by group-level selection: the psychological capacities and tendencies that underlie following tradition, favouring routines, being susceptible to conventions, etcetera, are selected for and passed on because they allowed stable role-divisions, norms and institutions to emerge as group-level phenomena, without any need for collective intentionality. In the next section I will briefly discuss these capacities and tendencies.

6 Conventionality

Complex role division requires two sets of attitudes. One set of attitudes must account for our ability to divide roles in such a way that we can concentrate on our own roles without devoting unnecessary cognitive effort to the tasks and roles of others. The other set must account for our ability to wield rules, norms and conventions as a means of coordinating our various tasks and roles. Let me briefly discuss a few of the main attitudes in these respective sets in order to show that they are not simply already captured by the capacities for joint intentionality (such as joint attention, gaze tracking and mindreading) and collective intentionality (social learning and referential communication).

Attitudes required for large scale role-division include basic trust and strong reciprocity. There are well worked-out ideas about how these attitudes may have emerged in the course of (cultural) evolution, but it would be too much of a detour to discuss these here (see e.g. Seabright 2010 for a good overview). For now the point is that they have emerged and that they allowed humans to divide roles in complex ways. In most coordinated role-divisions we are not able to observe the contributions of others all the time. If a group of hunters sends a small sub-group ahead to close off the escape route for game animals, the chasing efforts of the main group only make sense when its members can trust the sub-group to do what it is supposed to do. Slacking sub-groups might later face reproach from tribe members or other negative consequences, but how is the tribe to know whether too little effort was put into closing off escape routes, when it may as well be the case that a hard-working sub-group was outsmarted by game animals? Without a degree of trust, this type of division of labour would not get off the ground.

Larger communities, in which it is impossible for everyone to know everyone, require a more demanding form of trust: the ability to trust in-group strangers as a default attitude. When exchanges become one-off events involving individuals we will probably never again interact with, trust in exchange partners cannot be based on calculations of reciprocity. I might eat in a restaurant in a strange city I never intend to visit again. I can pay the waiter, whom I will never see again, or I can run away at a moment I am unlikely to be caught. Reversely, when I do pay, the waiter might accuse me of not having paid in an attempt to extract more money from me. But in situations like this, the waiter trusts that I will pay and when I do, I trust the waiter to not accuse me of not having paid (Basu 1984, pp. 9–11). Or, think of a farmer who sells her produce in exchange for valuable shells. If she cannot trust that these same shells will allow her to buy whatever she needs from completely different people, she will not accept the deal. Without the ability to trust in-group strangers, division of labour, exchange economies and institutions such as a monetary system would collapse.

Trust of the kind that is required for large scale coordinated role-divisions is based on what Herbert Gintis calls ‘strong reciprocity’ (Gintis 2000). Strong reciprocity is not calculated reciprocity. Rather than returning favours or expecting favours to be returned on the basis of self-interested calculation, strong reciprocators return favours and expect them to be returned as a standard, non-reflective attitude. Strong reciprocity is what is presupposed by the farmer who accepts shells for her produce, trusting that they will enable her to buy goods elsewhere. It characterizes the hunters who send a sub-group ahead to cut off escape routes for game animals, without being able to check whether that group will do what it is supposed to. And it is displayed by the waiter who brings me my food, trusting me to pay later without giving it a thought.

Trust and strong reciprocity relieve us of the burden of monitoring each other endlessly (a situation that economist Paul Seabright refers to as the dystopian fantasy ‘Switzerland squared’; 2010, p. 73). And so we free up cognitive capital that allows us to specialize and to divide (cognitive) labour. On the one hand this causes an explosion of the cognitive potential of the collective. On the other hand it creates massive interdependency: there is no other species of animal whose sustenance depends largely on factors and processes that it devotes no attention to (most of us can simply pretend that food grows in the supermarket without any detrimental effect on our lives).

Specialization also hinges on evolved attitudes such as role-modelling and apprenticeship. Again, well worked-out proposals for the nature and emergence of such attitudes exist but fall outside the purview of this paper (e.g. Henrich 2016; Sterelny 2012). For now it suffices to note that role-modelling and apprenticeship complement trust and strong reciprocity as psychological preconditions for the emergence of complex role-divisions.

But these attitudes do not yet account for our ability to coordinate our various roles through rules, norms, and conventions. I take it as uncontroversial that the coordination of divided roles is at the very least facilitated, but more probably enabled, by such an ability (Lewis 1969; Slors 2021a). Compare playing a game of chess, in which various pieces play various roles. These roles are described by the rules of chess, without which the game would not exist. But it is nearly impossible to play a game of chess, and hence to employ these rules, without an additional set of conventions about what each piece with a specific role must look like. Similarly, dividing labour not only requires grasping the rules that determine one’s own roles and tasks, but also wielding the norms and conventions, such as social etiquette or role-specific dress codes, that allow one to interact with the roles played by others that are essential to playing one’s own role. This involves psychological attitudes such as a conformist bias, norm psychology and overimitation. Again, there is a rich literature on how such attitudes evolved (e.g. Boyd and Richerson 2005; Wilson 2003; Henrich 2016; Sterelny 2012). Again, I cannot discuss most of them. But I need to make an exception for overimitation because tying this phenomenon to the emergence of rules, conventions and norms is not standard.

Overimitation is the unreflective imitation by infants of non-instrumental behaviour by models, a tendency observed in humans only (Horner and Whiten 2005). Overimitation is usually explained either as an overextension of the otherwise rational strategy of infants to copy the behaviour of adults (Lyons et al. 2007; Nielsen and Tomaselli 2010) or as social bonding behaviour, where blind copying signals ‘I am one of you’ (Jagiello, Heyes and Whitehouse, forthcoming; Nielsen and Blank 2011; Over and Carpenter 2013). Henrich suggests that this tendency to adopt behaviours whose efficacy we do not understand also serves to perpetuate cultural practices with group-level benefits, such as the examples mentioned in 3.1. I want to argue that Henrich’s interpretation can also be applied to the human capacity to adopt cultural conventions and norms.

In fact, there are reasons to take this interpretation very seriously. In some overimitation experiments, children only copy the causally irrelevant bits of behaviour when the demonstrator is present (Nielsen and Blank 2011). In other experiments, children stop imitating non-instrumental behaviour when they see a few potential models not exhibiting the non-instrumental behaviour that an initial model exhibited (Evans 2016). The first observation does not square with the idea that overimitation is an overextension of the otherwise useful strategy to do as adults do: why would that behaviour cease to be useful when the demonstrator is absent? But it does fit well with the idea that overimitation serves to convey ‘I am one of you’, since there is no point in signalling that when there is no one to signal it to. The second observation fits well with the idea that overimitation serves to acquire abilities one may not yet understand. As Kevin Laland notes, “where children see multiple demonstrations with some individuals, but not others, performing the irrelevant actions, the children rapidly infer that the irrelevant actions are unnecessary; rates of overimitation then plummet.” (Laland 2017, p. 53). However, it doesn’t fit with the idea that overimitation signals ‘I am one of you’. For why would that signal depend on others copying as well?

The point is that both observations fit perfectly with the idea that overimitation serves to make us pick up on cultural conventions and norms, which makes a great deal of sense from a group-level selection perspective, as conventions and norms facilitate complex role-divisions. This was, indeed, one of the original suggestions made in the first study to demonstrate the phenomenon (Horner and Whiten 2005, p. 164; see also Schmidt and Tomasello 2012). So overimitation complements norm psychology and conformity bias as psychological attitudes that enable the conventions, rules and norms that facilitate the coordination of divided roles.

Trusting strangers, strong reciprocity, role-modelling, apprenticeship (to some extent), overimitation, norm psychology and conformist bias all have a crucial feature in common: they are non-strategic attitudes. They are characterized by a lack of reflection on the usefulness, appropriateness, or value of learned practices and knowledge, on the strategic wisdom of one’s contributions to collaborations, and/or on the reliability of collaborators. These attitudes promote following tradition, keeping to role-divisions, and in general exhibiting culturally standardized and therefore predictable behaviour. They make sense in terms of their group-level effect rather than in terms of individual rationality. If conventions are broadly conceived of as shared regularities in behaviour that are sustained by a shared set of expectations and preferences, these attitudes can be characterized as attitudes that underlie human conventionality. Conventionality, rather than collective intentionality, underlies collaboration in complex societies.

7 Conclusion

I conclude that though collective intentionality can explain collaboration in dyads and hunter-gatherer groups, it cannot explain collaboration in complex societies. Rather than by collective intentionality, collaboration in complex societies is the result of what we might label ‘conventionality’: non-strategic attitudes that stabilize and standardize human interactions so that role divisions, norms and institutions can emerge as group-level phenomena.

This conclusion invites questions for further research. The most pressing one is the following. It is highly unlikely that attitudes for conventionality are not present in hunter-gatherer groups. If conventionality can account for collaboration and coordination in complex societies, why not also in hunter-gatherer groups? Or even dyads? One response here is that in dyads and hunter-gatherer groups the community is not imagined, and group members will have at least a rough overview of the entire collaboration. It is precisely the absence of both elements in complex societies that necessitates an alternative to collective intentionality. However, it is probably more realistic to assign a role to conventionality too when it comes to accounting for small scale collaboration and coordination, at least in hunter-gatherer societies. This would explain, for example, how a hunter-gatherer psychology turned out to be sufficient for the emergence of complex societies some 12,000 years ago. But that does raise the question of what the role of conventionality is in small scale collaboration relative to the role of collective intentionality. It is likely that conventionality becomes more important when collective intentionality becomes more of an abstract notion, for instance due to increased group size. But when that happens depends on the question of when and whether collective intentionality is a psychological reality, rather than a philosophical reconstruction.