1 Introduction

Contemporary philosophy of mind is permeated by dichotomous thinking. One particularly tenacious dichotomy is between “automatic” and “controlled” processes. Controlled processes are usually thought of as intentional, intelligent, conscious, voluntary, propositional, effortful, on the personal level, and explicit. Opposed to that, automatic processes are usually thought of as unintentional, unconscious, uncontrolled, unintelligent, involuntary, non-propositional, effortless, implicit, on the sub-personal level, and mechanistic. Commonly, philosophers associate controlled processes with ‘action’ and automatic processes with ‘mere behavior’.

The categorization of a phenomenon as automatic typically implies that it is ascribed one or more of the features of unintelligent, effortless, and so on. In the past, the connections between these features were thought to be rather stringent. For example, when something was proven to be effortless, it was therefore also considered automatic. The features listed under the label automatic indeed appear to go together in many cases. The same goes for the features listed under the label controlled. There are, however, problems with this strict dichotomous understanding, which too easily links such a variety of distinct elements. The trouble is threefold. First, although the features of automatic processes often group together in actual phenomena, there is ample evidence that they can also appear independently (Bargh 1994; Fridland 2017; Moors 2014; Moors and De Houwer 2006, 2007). In other words, one and the same phenomenon can be (for example) both automatic and controlled because it is efficient yet conscious. Secondly, many of the features of automatic and non-automatic processes should not be understood as either being present or absent, but rather as appearing on a spectrum (Moors and De Houwer 2006, 2007). A process can, for example, be more or less effortful, more or less controlled, etc. A dichotomy between automatic and controlled processes does not allow for such nuances. The third problem is that, as long as we adhere to dichotomous thinking, we easily slip from characterizing a process as automatic and unintelligent to saying that it is “not an action”, or mere behavior (Fridland 2017). In other words, using the term “automatic” to characterize something is often problematic and in need of further explanation. In the remainder of this paper I will sometimes refer to behavior as automatic. In those cases, I mean that the behavior in question shows one or several of the many features that are associated with automatic behavior, but not necessarily all. You could say that I appeal to the every-day, or folk-psychological, notion of automaticity.

Despite its problems, this dichotomy pervades the debate on joint action. Theories of joint action generally mirror the assumed dichotomy of mere bodily movement and intentional action in the distinction they draw between emergent, highly automatic coordination (joint mere bodily movement) and intention-based, non-automatic coordination (joint action). The three above-noted problems with this dichotomy have yet to be properly taken on board in this literature.

Other fields have addressed these issues better. In the theorization of individual action, for example, ideas regarding the “in-between” are informed by theories on habitual action and skillful action. Habits and skills allow us agentic ways of guiding complex action routines that would otherwise overwhelm our reflective capacities. In this paper, I look at how theories of skill, habit, and know-how in individual action can inform a non-dichotomous account of joint action. If, as William James (1890) wrote, “habit covers a very large part of life” (p. 104), this would presumably extend to the collective domain. I propose (a) that many collective behaviors that are not considered joint actions should be, and (b) that many cases that are currently understood as full-blown intentional joint action are better described and understood as skillful or habitual joint actions.

More concretely, I evaluate recent discussions of attention and control in individual action (Christensen et al. 2016; Fridland 2014; Wu 2011) and connect them to the joint action debate. The debate on skillful (individual) action is strongly intertwined with a discussion of know-how. I argue that know-how is only part of the story at both the individual and the collective level. Specifically, I will show that, beyond focusing on know-how, discussions of joint action should focus on different kinds of control and the role of attention. Once this is done, it becomes clear that any account of group know-how needs to be supplemented by (a) an account of the role of attention and (b) an account of the multiple ways agents control their actions, and also how they control their coordination with other agents.

This paper is set up in three parts. Part one (Sect. 2) centers on habitual and skillful individual action. It explains why habits and skills belong to the domain of action and differentiates habits from skills. It does the latter by looking at the role of attention and control in the case of individual skills. Part two (Sect. 3) focuses on habitual and skillful joint action. It looks at the minimal architecture model (Sect. 3.1), as it explicitly sets out to fill the space between “mere bodily movement” and “full-blown intentional action”, and Birch’s theory of group know-how (Sect. 3.2) that builds on the minimal architecture model. Section 3.3 argues for an interaction-dominant view of joint agency. It extends the discussion of the different forms of control and attention from Sect. 2 to the joint level. Part three (Sect. 4) looks at the implications of a focus on attention and the various levels of control for our understanding of joint habitual actions and joint skillful actions.

2 Habits and Skills

Most philosophers acknowledge that actions do not always issue from deliberation. Agents do not always pause to consider their reasons for acting, nor do they need to have any conscious thoughts about how to proceed. But such negative characterizations merely mark a contrast between (purposive) agency and ‘full-blooded’ agency. It remains unclear what the characteristics are of acts that are described in this negative manner. What is clear, however, is that automaticity plays a crucial role in the differentiation between full-blown intentional actions on the one hand and purposive actions on the other. Skillful and habitual actions, which entail some level of automaticity, are generally considered to belong to the category of purposive agency, which sets them apart from automatic (and less agentic) phenomena such as compulsions and reflexes (Frankfurt 1978; Pollard 2008; Velleman 2000).

Habits and skills share their acquired automaticity as a characteristic feature (McGeer 2018; Ryle 2009).Footnote 1 Things one does in exercising a skill and in manifesting a habit issue largely from sub-personal processes established through habituation. The fact that these automaticities are acquired differentiates both habitual and skillful behavior from mere reflexes (Pollard 2006). And in this respect both also stand in contrast to cases of ‘full-blooded’ agency. Although habits and skills share this acquired automaticity, there are good reasons to differentiate them. Partly, this has to do with the type of automation we are talking about. Habitual actions are typically understood to be highly context-dependent and to be roughly identical in every iteration (McGeer 2018; Ryle 2009; Wood and Rünger 2016). Both the initiation and the continuation of the habitual act can take place effortlessly without the agent paying much attention to the act. Skillful action, on the other hand, is dependent on our attention and the presence of a goal or intention. In the remainder of this section, I will discuss briefly why both habits and skills belong in the realm of actions (Sect. 2.1) and look at their similarities and differences (Sect. 2.2).

2.1 Habits and Skills Are Characteristic Forms of Action

Automation, habit, and skill have often been used interchangeably.Footnote 2 All are related to habituation: the process of drilling or training that facilitates certain action-choices over others. Habituation can be a deliberate choice, or an evident consequence of choice, which allows habits and skills to stand in a relation to full-blooded agency (Douskos 2017; McGeer 2018; Ryle 1949).Footnote 3 This section discusses four reasons to include most habitual behavior in the domain of action. I focus on habits because skills are typically understood to be goal-directed and are therefore more easily tied to the domain of action. Section 2.2 points at reasons to distinguish between habits and skills.

To start with, we tend to understand habits and skills as highly automatic phenomena. Consequently, they are often placed in the realm of mere behaviors and mere reflexes (problems one and three of the dichotomous characterization as set out in the introduction). Several features that are typically associated with automaticity are indeed involved in the process of habituation. Like reflexes, habits are activated in an autonomous fashion without requiring executive control (Evans and Stanovich 2013). Habits and skills, however, are not synonymous with automaticity but are best understood as learned automatic responses with specific features (Wood et al. 2014). Although this typically takes time, agents have the power to diverge from habituated processes, which sets habitual actions apart from more automatic phenomena such as reflexes and compulsions (Pollard 2006). In Sect. 2.2, where I discuss the domain of habitual movements and habitual actions, I say more about conditions that need to be in place in order for such a power to be acted on. Moreover, agents are capable of changing their habits over time, and of changing their environment to prevent certain triggers that activate their habits from appearing. In other words, agents retain a degree of control in relation to habitual processes (Horstkötter 2015). This suggests that habits can belong in the realm of action.

Secondly, a problem with counting habitual behaviors as actions is that they do not seem to be under the direct control of the agent. There is, however, a type of control that is also present in the case of habitual agency: intervention control (Pollard 2006). The agent has the ability to intervene in her performance. She can stop or alter what she is doing habitually, a potential that is absent in the case of reflexes, phobias, and compulsive behaviors. We can override a habitual action that extends in time, and we tend to hold people responsible for the exercise and consequences of their habits, unlike in the case of reflexes and bodily processes.Footnote 4 Again, this suggests that habits belong in the realm of action.

Thirdly, habits and skills often develop as people go about pursuing goals (repeatedly). This implies that habit and skill formation is often closely intertwined with goal pursuit. Actions performed regularly become habitual and persist with little guidance from intentions (Gardner et al. 2011). This process, where habituation helps us to move away from guidance by intentions, spells trouble for a strongly dichotomous picture. Research suggests that intentions are strong predictors of actions that are performed occasionally, but not of actions that are repeated regularly (Ouellette and Wood 1998). This transition from intentional to habitual is yet another reason to put habits in the realm of action.

Fourthly and lastly, habits and skills are often understood as dispositions (Ryle 1949; McGeer 2019). Within the domain of dispositions, it is standard to distinguish between passive and active, or object-centered dispositions and agent-centered dispositions (Fara 2008; McGeer 2018; Vetter 2013; Vihvelin 2004). Ryle, however, understands habits as “mere dispositions”, distinguishing them from abilities (skills). Agent-centered dispositions differ from object-centered dispositions in that an agent has this disposition only because she has repeatedly manifested that behavior in the past: she has repeatedly A-ed in these circumstances or in this way. Ryle’s portrayal of habits as rote behaviors might appear as non-agentic. However, as McGeer points out, they are acquired and need to be repeated to remain in place. Both habits and skills (that an agent willingly develops by repeating the behavior upon a trigger until the behavior follows automatically) have agential quality through this path of learning.

Having briefly discussed four reasons to consider habits as actions, I will now turn to ways to distinguish habits from skills.

2.2 Differentiating Habits and Skills

One problem with the category “habitual and skillful action” is the vast difference between the kinds of actions that it is meant to cover. These run “from our everyday commuting to the gold medalist’s world-class performance” (Bermúdez 2017, p. 896). Although habits and skills have always been central to Aristotelian strands of thought, contemporary discussions in cognitive-science-inspired philosophy usually refer to the work of Gilbert Ryle (1949: ch. 2). Ryle attacked what he called the “intellectualist legend” by observing that we consider the various things one does in the course of skilled activities to be manifestations of intelligence, without supposing that such activities are always guided by conscious thought or deliberation.

Ryle takes both habits and skills to be acquired dispositions and to this extent they are similar. He distinguishes habits and skills based on several characteristics. Habits, according to Ryle’s definition, are single-track dispositions, while skills are multi-track dispositions. In exercising a skill, one manifests a sensitivity to the circumstances, which may require one to do quite different things in order to achieve one’s objective on each occasion. Skill exercises are “indefinitely heterogeneous” (1949, p. 44) and require the agent to “exercise care, vigilance, or criticism” (1949, p. 42). In learning a skill, the agent adopts a critical attitude (there is a crucial role for attention). Last, but certainly not least, Ryle argues that skills, unlike habits, require a specific kind of knowledge: know-how.Footnote 5

Ryle clearly distinguishes habits from skills, suggesting that only skillful actions are reason-responsive. “Drill dispenses with intelligence, training develops it” (1949, p. 43). Agents work at honing and developing skills even as they exercise them, thereby instantiating a dynamic form of ‘reasons-responsiveness’. As a consequence, people are responsible for what they do skillfully in a distinctive kind of way. For example, they show themselves “ready to detect and correct lapses, to repeat and improve upon successes, to profit from examples of others and so forth” (1949, p. 42).

Although I broadly concur with Ryle’s reasons for distinguishing habits and skills, I have reasons to take these distinctions as a matter of degree. A skilled agent might do something that she trained for, for example, with vigilance or without. It is unclear where this would fit in a model that strictly distinguishes habits from skills both in their practice and their exercise. There are reasons to consider habits and skills to be reasons-responsive, albeit in different degrees or by different means.Footnote 6 Many of the different characterizations that Ryle has put forward have been taken up for further analysis in the last few decades. Below I will briefly discuss the involvement of know-how, the role of attention (and vigilance), and the type of control that is involved.

Ryle’s division between habits and skills is very strict. This strictness overshadows an important distinction within the category of habits, which brings certain habits closer to skills than Ryle suggests. Habitual actions can stand in a relation to intentions: a habit can be intentionally acquired or changed over time. In that particular sense, a habitual movement can be an action because its “why” is preserved in this link with the past. The acquisition or the change of the habit was intentional (see McGeer 2019 and Douskos 2017 for a similar point). Based on this argument, I propose that the category of habit exists on a spectrum. On the one end, we find habits that are picked up by an agent without the agent being aware. Let us call these habitual movements.Footnote 7 On the other side of the spectrum there are habits that have an intentional history, that were put in place by the agent. These will be called habitual actions. These two ways of characterizing habits are not to be understood as a dichotomy, but as appearing on a spectrum.

Three main characteristics of skills and skilled agency will be discussed in the following subsections: the role of a specific kind of knowledge (know-how, 2.2.1); the role of attention (or vigilance, 2.2.2); and the role of different types of control (2.2.3). Following Fridland (2014), Christensen and colleagues (2015) and Mylopoulos and Pacherie (2019), I argue for the incompleteness of an understanding of skill through know-how. I explain that we must include attention and control to get a more complete account of skillful action. Particularly the combination of the role of attention, providing both bottom-up and top-down structuring of activity, and different levels of control, offers us more ground for understanding the differences between habitual and skillful actions. The results of Sect. 2 serve as the groundwork for Sect. 3, where I develop them in relation to joint action.

2.2.1 The Involvement of Know-How

The know-how debate is concerned with the kinds of knowledge you have when it is truly said of you that you know how to do something (ride a bicycle), and, in contrast, the kind of knowledge you have when you know that some fact is true (Paris is the capital of France). Two questions are central to the recent literature on know-how. The first concerns the degree to which knowledge-how can be reduced to, or is at its core, knowledge-that. If one accepts that know-how cannot be reduced to know-that a second question arises: What does knowledge-how consist in, if not in propositional knowledge? Answers to this question typically include that knowledge-how consists in having some sort of ability or disposition, partially understood through motor cognition and trained attention.

Fridland (2014) has carefully argued that an account of know-how as know-that is usually not sufficient to explain ability (to say that the agent has the ability to ϕ). To illustrate this, she discusses the case of an Olympian athlete and a trainer and how their knowledge can be different to show that knowing-how is not just a form of propositional knowledge.Footnote 8 She describes a gymnast who is able to do a standing layout on beam without being able to tell someone else how she does that. She has the know-how (and in an important way lacks some know-that). She might have learned a standing layout from her coach. The coach knows about how to perform a standing layout on beam. What he does not know is how to perform such a layout. The coach has the know-that but lacks the know-how. The case is further contrasted with that of a gymnast with a broken leg. This gymnast knows how to do a standing layout on beam, although she cannot perform one now. The difference between the coach and the gymnast with a broken leg – who both cannot do a standing layout – is that the coach does not know how to do the standing layout simpliciter, where the gymnast lacks the opportunity to instantiate her knowledge-how to do the standing layout on beam, but does have the ability (Fridland 2014). The gymnast will possess a lot of propositional knowledge (or knowledge that can be propositionally expressed), but her ability to perform the standing layout is not captured fully by such knowledge.Footnote 9 To conclude, even if the knowledge needed for skillful action contains important propositional components, it is not sufficient to explain skillful actions.

This brings us back to the second question I started this section with: how should we understand these abilities/dispositions? The role of attention and different types of control can help us understand abilities in more detail, which is why I will now turn to discuss them in some detail.

2.2.2 Involvement of Attention

It was long held that attention to the performance of a skillful action was harmful to the performance. This was supported by studies that show a deterioration of the performance when athletes were asked to focus on (an aspect of) what they were doing (Dreyfus 2007; Beilock 2010; Di Nucci 2013; Papineau 2013). Recently, however, these conclusions have been questioned (Christensen et al. 2016; Fridland 2014; Wu 2011). Whether attention is harmful to the performance depends on what that attention was focused on and who decided about this focus (the experimenter or the agent).

Wu (2011) argues that attention is an overlooked aspect of our agency. He suggests that attention plays a major role in action selection. Any situation offers an agent many possible actions. We can map the many inputs that we get, and the many outputs that would suit those inputs, onto what Wu terms the behavioral space. A behavior is the path actually chosen in the behavioral space. Which path to take on this map is a problem that agents have to solve. The choices made need not be intentional; they can also be automatic, involuntary, or unintentional. The role of habits, skills, attention, and intentions is to aid appropriate path selection in an agent’s behavioral space. On the level of neural structures, they could be understood as biasing the mental processes, altering and strengthening specific routes, just like practice can.Footnote 10 What practiced agents are capable of, based on their input–output coupling profile, is (a) a better selection of controlled outputs and (b) more types of controlled behavioral outputs than are available to less capable agents.

The role that attention plays in skillful action is threefold: (1) it structures and coordinates multiple lower-order processes toward the completion of a represented goal; (2) it continuously sustains the goal’s representation (Bermúdez 2017; Christensen et al. 2015; Fridland 2014); and (3) it helps detect which potential situational circumstances are relevant for the current goal (Christensen et al. 2015). Loss of attention can cause the agent to wander from the current goal or intention. While the action evolves several other possible actions have to be inhibited or blocked, even if they are very strongly potentiated (Bermúdez 2017; Fridland 2014; Wu 2011, 2014). Attention works bottom-up and top-down. Top-down attention allows the agent to keep her goal in mind and not get distracted by other possible actions. Bottom-up attention presents the agent with behavioral options that were absent or not presented based on top-down attention. Bottom-up attention is shaped by the agent’s previous choices and behavior.

Two recent accounts of the role of control in skillful action reserve a major role for attention. Fridland (2014) integrates selective, top-down, automatic attention (Pylyshyn 2003; Wu 2014) into her account as a form of control. Selective, top-down, automatic attention is responsible for selecting the relevant features in an environmental array that a skilled agent should gather information about and respond to, given her goals, plans, and strategies. It improves with training (learning to control one’s attention) and is deployed automatically once the trained agent initiates intentional action. Although it is automatic, it is sensitive to the content of the goal states and strategies that the agent possesses at the level of strategic control.Footnote 11 Christensen and colleagues (2016) introduce situational awareness as an important element in their skillful action theory. They propose that attention goes to information that is of importance for a skillful performance in this specific situation (which is only possible if the agent has experience, i.e., is skilled). Relying on studies on naturalistic decision making (Simon 1955, 1983; Klein 1993, 1998), they argue that the awareness of one’s surroundings changes with the skills one has. This trained, or biased, attention gives agents an improved grip on their surroundings. Hence, although the focus within the debate on skillful action lies with top-down attention, bottom-up attention is also understood to be trained and gives agents a better grip on their situation. Top-down control is connected to goals and gives us a means to distinguish habits from skills. Habits rely on picking up signals from one’s surroundings and in that sense entail an alteration in our bottom-up processes. But they are not connected to the same selective top-down attention that skillful actions rely on.

2.2.3 Type of Control

When we think about a highly skilled performance it is very unlikely that we conclude that the skilled agent lacks control (Logan 1985, Fridland 2017). In fact, the very opposite seems to be true. The more expert one is at a skill, the more automatic that skill becomes and the more controlled it is as well. This might seem paradoxical, but only when we understand automatic and controlled to be opposites. Studies suggest that skill acquisition is associated with the formation of ever more elaborate, functional, and detailed, hierarchical structures in motor cognition that allow for control (Schack and Mechsner 2006). Rather than a loss of attention and control, attention shifts to different aspects of the activities the agent is performing (e.g., Christensen et al. 2016; Ericsson 2006; Fridland 2014; Papineau 2013). This allows the agent to respond to both expected and unpredictable environmental circumstances and to revise her strategy accordingly. The fact that this does not necessarily require the agent’s top-down attention gives us reasons to understand such skillful responses as automatic (in the sense that they are effortless and the agent might be unable to mention what visual cues led to her decisions).

Several proposals for a description of a hierarchical division of control functions suggest that it is conceptually helpful to distinguish between three levels of control (Pacherie 2008; Fridland 2014; Christensen et al. 2016; Mylopoulos and Pacherie 2019).Footnote 12 The threefold distinction is based on an analysis of the different and complementary functional roles of the types of control, of the different types of contents they involve, and of their respective temporal scales. The trichotomy below follows Christensen and colleagues’ proposal and wording.

  1. Strategic control: Governance of an extended course of action so that it achieves one or more goals. Goals, plans, and strategies are used by the agent to guide various instantiations of motor skill.

  2. Situation control: Determining what actions need to be performed in the immediate situation to achieve the overarching goal. This level of control helps select the relevant features in an environmental array.

Although Pacherie (2008), Fridland (2014), and Christensen et al. (2016) all distinguish this middle level, they do not all characterize it in exactly the same way. Fridland explicitly takes it to be mainly or completely automatic, but responsive to goals, plans, and strategies. Pacherie states that only strategic control (distal intentions in her terminology) ties to rational control; situation control consists mainly of tying this conceptual information to perceptual information about the current situation and memory information about one’s motor repertoire. Christensen and colleagues take situation control to be cognitive control and argue that it tends to automate only partly (not all of the characteristics associated with automaticity are involved).

  3. Implementation control: Governance of the execution of the actions specified by situation control. This level of control runs automatically, and accounts for the exact, nuanced ways a skilled agent performs, modifies, adjusts, and guides her actions.

The main ideas are that agents control their actions through several control processes and that these processes function in unison (Christensen et al. 2016; Fridland 2014; Mylopoulos and Pacherie 2019; Pacherie 2008). The broadly hierarchical division of control-labor allows us to step away from the idea of full automation. It allows us to understand agents’ self-governance while acting skillfully because it leaves room for decisions made by the agent while acting skillfully.

It is not necessarily the case that all actions require the presence of the entire “control cascade”. In cases of skillful action, such as the athletes discussed in the papers by Christensen et al. and by Fridland, the three levels of control will work in close unison and are understood to stand in a top-down relationship. However, some decisions to act are made on the fly and do not involve a top-down relation between the different levels of control (Pacherie 2008; Christensen et al. 2016). The existence of spontaneous or routine actions suggests that it is not always necessary that the agent forms an intention or sets a goal to start acting. Accordingly, there are two things that I deem important and want to take from this discussion. Firstly, it makes good sense to distinguish three levels of control, which forces us away from a dichotomous picture of automatic and controlled. Secondly, skilled action, as it is understood in this paper, depends on a top-down relationship between these different levels of control. We need not, however, always assume this top-down relationship between these levels of control. An agent can become aware of what she is doing while a routine unfolds and decide whether or not she should continue with the action (Pacherie 2008; Baier 1997; Preston 2012). In those cases, the division of labor between the different levels of control can be different and the top-down structure need not be in place. I will get back to these kinds of continuings in the conclusion of the paper. If the agent decides to carry on, the action that had been initially triggered by a lower level of control is now also controlled by a higher level of control. The intelligence of the action, then, cannot solely be found in the propositionally structured mental states, but in the (balanced) combination of several control mechanisms. Often, the three levels of control and attention together allow the agent to govern their actions. This does not mean, however, that every type of control should be linked to intentions or awareness. But since the three levels typically work in unison, the agent will typically also have some level of awareness.

In this section I have sketched my understanding of habits and skills. I take an action to be habitual when the reason why an agent acts is to be understood mainly as a situation triggering a certain activity or goal (bottom-up attention) time and again. An intentional action can become a habitual action over time, and the reason why an agent acted as she did can be understood through this history (see also McGeer 2019, Douskos forthcoming). Skill tells us something about flexibility, intelligence, and automaticity while aiming at a certain goal. This involves top-down attention and multiple levels of control that are highly integrated. Because skillful activity relies on the presence of a goal, it is less likely to go against other plans and intentions the agent might have. A token of a specific action type can rely on different levels of control depending on the context. The agent could be driving to work skillfully as well as habitually.Footnote 13 The weather, the other people on the road, and whether or not she is in a hurry can all influence the amount of top-down attention that is involved. An agent might get distracted and (temporarily) lose top-down control while continuing to drive. With the distinction between habits and skills in place and an account of three levels of control spelled out, it is time to put these distinctions to use in the debate on joint action.

3 Bridging the Gap between Mere Bodily Movement and Action in Joint Action

Debates in the domain of joint action and collective intentionality are focused on several key questions. One key question is “how do agents coordinate their actions?” A second question is “which of these coordinated efforts can be seen as joint actions?” Most work regarding these questions falls into one of two traditions. First, there are attempts to capture, at the personal level, the nature of the propositional attitudes possessed by the agents in joint action. These attempts focus on the content, the mode, or the subject of such attitudes. Second, there are attempts to capture, largely at the sub-personal level, the cognitive and neural mechanisms implicated in a joint action. The distinction between sub-personal (emergent) coordination and personal (planned) coordination is highly related to, or dependent on, the features that define the automatic/controlled distinction. Some scholars deem the sub-personal mechanisms of emergent coordination irrelevant because they do not give us an account of joint action, but rather of joint mere moving. Developing the categories of habitual and skillful joint action allows us to characterize joint actions that are not full-blown intentional as actions. Consequently, this helps us overcome the dichotomy in a similar way as in the individual case.

Two important overview articles spell out dichotomous accounts of joint action based on findings in cognitive science and developmental psychology. They divide the ways agents coordinate between planned coordination and emergent coordination (Knoblich et al. 2011), or we-intentions and alignment systems, i.e., lower-level coordinative structures (Marsh et al. 2009; Richardson et al. 2007a, b; Schmidt and Richardson 2008; Tollefsen and Dale 2012). Planned coordination and we-intentions are understood propositionally. Emergent coordination and alignment systems are understood non-propositionally.

A first reason to give up this binary distinction is that emergent coordination is too heterogeneous a category to function as one side of a dichotomy. Some of the functions that allow for emergent coordination have many features that are typically associated with being highly automatic (no ability to stop the activity, no awareness of the activity or coordinating effects, going against one’s current goals and intentions). Others can mainly be characterized as relatively non-automatic (there is goal-directedness, awareness, an ability to stop the current activity, etc.). Some of the functions, such as alignment and entrainment, are hardly ever (if at all) noticed by the agents. Not only do these functions and their structuring effects go unnoticed, but the agent will also often be incapable of stopping them or of noticing their effect, even when she focuses on them. Other aspects that usually are understood as part and parcel of emergent coordination, such as affordances, might be picked up by the agents, or even generated willfully by one agent so that another can respond to them (see Martens 2020 for a discussion). By setting emergent coordination apart from planned coordination we easily end up clustering several characteristics, such as automatic, non-agentic, and absence of control. That is, we end up with the same – problematic – clustering outlined at the beginning of this paper. Some of these emergent coordination functions contribute to joint action because the agent intends them to, and others sometimes despite the agent not intending them to.

Rather than discussing these dichotomous approaches in further detail, the rest of this section briefly discusses the minimal architecture model that tries to fill the gap between “highly-automatic” and “highly-controlled” coordination (Sect. 3.1). This model has recently been used as part of a model on joint know-how which I will discuss in Sect. 3.2. I argue that these models rely on top-down organizing and in Sect. 3.3 I introduce the role of attention and control to develop an integrated understanding of habitual and skillful joint action.

3.1 Minimal Architecture Model of Joint Action

The minimal architecture model (Vesper et al. 2010) was introduced to fill the gap between emergent coordination (Footnote 14) and planned coordination that draws on propositional attitude ascription. The model consists of four building blocks, or modules, that together fill the gap. About these modules the authors write:

Unlike the dynamical systems framework that considers interpersonal coordination as a special case of more general coordination principles, the proposed framework assumes the existence of dedicated mechanisms for joint action. Unlike approaches focusing on language and shared intentionality that are mainly concerned with thinking and communicating about acting together, the framework is geared towards explaining how people actually perform actions together. (Vesper et al. 2010, p. 998)

In terms of intentions, they are interested in intention in action (or proximal intentions) instead of prior intentions, and in the lower levels of control rather than strategic control. Vesper et al. propose four building blocks (modules) that, together, allow agents to act jointly. (1) Goal Representation: both agents represent the goal, their own task, and the task of the other agent. (2) Monitoring: both agents monitor whether the goal and task(s) unfold as expected. (3) Prediction of the unfolding of the movements: needed in order to monitor. (4) To further facilitate joint action a coordination smoother is added to the model. Such facilitation happens, for example, via modulation of one’s own behavior, or by using an object that affords a particular task distribution.

Goal representation, monitoring, and prediction also play an important role in individual agency and are, in that sense, not joint-action-specific. The principles on which they rely are more general motor cognition principles. The first three building blocks are joint-action-specific insofar as they revolve around a joint or shared goal, the tasks of the other agent, and the other agent. The representations, monitoring, and prediction while coordinating need to be more elaborate, as they also include a focus on joint goals and the tasks of the other agent. If we look at the levels of control in skillful action that I discussed in Sect. 2.2.3, these three building blocks certainly play a role in the middle category of situation control, and potentially also in the lower-level category of implementation control, which would also fit with describing them as proximal intentions.

The fourth and most joint-action-specific building block in this model is the coordination smoother. Coordination smoothing can consist in the modulation of one’s own behavior to simplify coordination and in the use of objects that afford a particular task-distribution that has a smoothing effect. Coordination smoothing helps to make behaviors more predictable. Examples include making movements less variable, delimiting and structuring one’s own task such that the need for coordination is reduced, imposing structure on a task (e.g., turn-taking), coordination signals, making certain movements salient, and object usage. I think that such modulation of behavior is indeed of great importance for successful coordination. Within a model that spells out building blocks that stand in relation to cognitive functions, however, it seems to me that the sheer multitude of cognitive functions assigned to this fourth building block is problematic. What kind of coordination smoothing is called for will depend on the type of action, the situation in which this action is performed, whom one is interacting with, etc. What cognitive functions are needed for this smoothing to occur and how this can be understood as one building block remains underdefined.

The four building blocks do not stand on their own. The minimal architecture model assumes that the agents already have a shared goal, which provides them with the tasks which then need monitoring and prediction for fluid execution. How the agents come to share this goal is not explained by the model. By assuming such goals to be present, the model seems to rely on a top-down structuring of control. I therefore grant that the model is geared towards explaining how people actually perform actions together, but the explanation is partial: either the model needs to be understood as part of a larger whole that gives us the goals, or it needs to be extended to also explain how coordination can happen without a presupposed goal. To that extent the model might be understood as a model of agents skillfully coordinating. As the model assumes there is a shared goal, this shared goal also explains (a) why we can speak of action, and (b) why we can speak of joint action. What the model does not give us is an understanding of coordination that starts bottom-up, or habitual joint coordination, and an answer to whether, and on what grounds, we can understand such coordination as joint action.

3.2 Active Mutual Enablement: Joint Know-How

Recently, some articles on know-how and groups have been published (Birch 2018; Miller 2020; Palermos and Tollefsen 2019). I will focus on Birch’s account as it is fairly explicit about the cognitive processes it introduces and makes connections to the minimal architecture model just discussed. Birch argues in favor of group know-how as a form of know-that (Birch 2018) (Footnote 15).

Birch’s (2018) “Active Mutual Enablement” account of joint know-how argues that two agents that do something skillfully together have two types of know-how: they know how to do their part and they know how to coordinate. He defines know-how as a personal-level state that arises from sub-personal mechanisms of skillful action control. The puzzle, for Birch, is to better understand what “coordination know-how” is. Birch makes use of the minimal architecture model and combines this with a focus on diachronicity. Together they form the core of his theory of Active Mutual Enablement, or joint know-how. He hypothesizes that the second type of know-how, knowing how to coordinate, consists of each person having four different pieces of know-how simultaneously:

  1. Knowing how to monitor the behavior of the other person, looking for signals and cues (signs that things are going well or starting to go wrong).

  2. Knowing how to predict the behavior of the other person from the signals and cues they observe.

  3. Knowing how to adjust one’s own behavior in light of what is predicted, in such a way as to mitigate the risk of failure of the shared enterprise.

  4. Knowing how to do one’s own individual part in a way that actively enables the other person to do the above three things. In other words, each person knows how to help the other person predict the emerging problems and risks that might call for adjustment.

Birch’s model, with the four modules, mirrors the distinctions made in the minimal architecture model discussed in the previous section, although he emphasizes the diachronicity of his approach. Mutual enabling occurs over time.

Although Birch defines know-how as a personal-level state he simultaneously talks about the sub-personal mechanisms as know-how. I take this to mean that personal-level know-how states arise from sub-personal mechanisms of monitoring, predicting, and adjusting. These mechanisms themselves would therefore better not be called know-how mechanisms (a) to avoid confusion and (b) also because some of their functionality never translates to know-how and is inaccessible to the agent. This follows because according to Birch the know-how arises from sub-personal mechanisms of skillful action control and therefore should not be equated to these sub-personal mechanisms. Part of the monitoring, predicting, and adjusting will remain unknown to the agent. What this means concerning the question of joint action, as contrasted with mere coordination, is unclear. Again, as with the minimal architecture model, the examples that Birch works with are all about agents that have a clear shared goal (in his case rowing together), leading once more to a top-down structured effort to coordinate.

Is it useful to try to understand “coordination” know-how from “action” know-how, especially while grounding coordination know-how in motor cognition processes? The coordination know-how listed in points 1, 2, and 3 relies on the same motor cognition processes as the monitoring, predicting, and adjusting of the individual agent’s actions, so why does it constitute a different kind of know-how? Potentially the know-how listed in point 4 gives us reason to talk about another type of know-how. However, based on arguments similar to those I put forward while discussing the coordination smoother, it seems problematic to understand active enablement as a single cognitive module. Again, depending on the task, the kind of knowledge that will need to be fed into the mechanism will vary wildly. So why assume it is one type of know-how-mechanism, and how can it be differentiated from the process of, for example, knowing how to adjust one’s actions (building block 3)? What we have to monitor and what counts as a signal or cue will depend on the action we are involved in, the role we have, and the specifics of the situation. Looking at the different levels of control and the role of attention seems to be a more promising route once more.

3.3 Skillful Coordination, Habitual Coordination

The three levels of control that were introduced in Sect. 2.2.3 together allow an agent to act skillfully. In skillful action the structuring is top-down: the agent has a goal (Footnote 16), intention, or plan. Although there is a need to distinguish between the skills of a professional gymnast and the skills I use to get through many of my daily activities, I still take it that many intentional actions are skillful intentional actions. By this I mean that such actions depend on the integrated workings of the three levels of control. The analysis of skillful action points to the importance of the lower levels of control for agents to act, an importance that carries over to the domain of joint action.

The joint action accounts that have been discussed in Sects. 3.1 and 3.2 each in their own way implicitly assume the relevance of this integration. The models work with situations in which the individuals know the goal they are working towards together. The minimal architecture model assumes a goal and tasks to be present, which are then represented, monitored, and controlled. Likewise, the active mutual enablement theory describes the coordination of agents within a wider framework that already specifies the goal of the action.

As I pointed out earlier, some decisions to act are made on the fly and do not involve a top-down relation between the different levels of control. Although there is a top-down relationship between the levels of control in the case of skillful action, we need not always assume this top-down relationship between these levels of control. This is where a return to the earlier discussion on bottom-up attention is relevant. Attention works bottom-up and top-down. Top-down attention allows the agent to keep her goal in mind and not get distracted by other possible actions. Attention structured bottom-up will present the agent with behavioral options that were not present based on top-down attention. Bottom-up attention is shaped, or biased, by previous choices and behavior. Firstly, an agent can have an agentic history to this shaping, setting things up to set it off. Secondly, an agent can become aware of what she is doing while a routine unfolds and decide whether or not she should continue with the action (Baier 1997; Pacherie 2008; Preston 2012). In such cases there is no top-down structuring, or at least not from the beginning. The intelligence of the action, then, cannot solely be found in the propositionally structured mental states, but in the (balanced) combination of several control mechanisms and the earlier training and/or drilling of the agent.

Some joint actions will depend more on one level of control, others on the strong integration of all three. Such dependencies might also differ over time within a single collective enterprise, unfolding in multiple ways, while the agents interact. In such an interaction-dominant system, behavior is an emergent outcome that is “the result of interactions between system components, agents, and situational factors with these intercomponent or interagent interactions altering the dynamics of the component elements, situational factors, and agents themselves” (Richardson et al. 2014, p. 256). This is different from the approaches we have seen so far in the collective intentionality debate, in that it does not try to understand any type of action from only one type of control or coordination. It leaves space for the integration of the functions that were distinguished and kept separated within the minimal architecture model of joint action and the dynamic system model of know-how.

Situations and the framing they offer can structure coordination in a bottom-up way. Most of our everyday interactions are within a setting that biases us into doing things in particular ways. This goes from morning routines, to getting a cup of coffee in a coffee shop, to the ways we interact on the road and at work. Many such structures exist only through social constructions that are put in place by others. Such structures and their relation to our agency are, amongst others, discussed in work on habitus and fields (Bourdieu 2000; Mauss 1979), joint affordances (Costall 1995; Richardson et al. 2007a, b; Rietveld and Kiverstein 2014), and work on a sense of we-agency, a sense of us (Martens 2018; Pacherie 2014; Schmid 2014; Zahavi 2015) and a sense of joint commitment (Michael et al. 2016). We share environments and practices, and time spent together further secures such practices, habits, and skills, giving us the backdrop to act on these practices and structures.

4 Integrating Habits and Skills in the Domain of Joint Action

This paper started by pointing to a problematic and outdated dichotomy that holds a firm grip on the philosophy of joint action. I discussed means required to develop a more differentiated picture of the middle ground between sheer automaticity and full-blown intentional action. Then I continued by evaluating two theories of joint action that have tried to fill this gap and related them to research on (individual) skillful action. What remains to be done is to connect this to the discussion on skillful and habitual joint action and show that ‘filling the gap’ repeated the problems I indicated in the above-mentioned theories of joint action. I suggested that an interaction-dominant view is needed to overcome these problems. Two points play a key role in this interaction-dominant view. (1) Both top-down and bottom-up attention play a key role in the understanding of skillful coordinated action. Habitual action and habitual movement are more dependent on bottom-up attention. They typically do not depend on the presence of a goal. (2) The involvement of the different levels of control will vary based on the type of activity. Actions that stretch over time (e.g. playing sports, walking together) might depend more heavily on different types of control throughout their exercise. Something the agent typically does habitually can also be done skillfully.

To describe group action as skillful is to say something about the way (how) in which the group acts. The individuals that make up the group have to know about the goal of the group. Coordination, however, can also arise without the presence of a goal and through our habituation (understood as biased attention, following Wu’s ideas as discussed in Sect. 2.2.1). A bottom-up structuring, employing implementation control and situation control, is another route to coordination. To describe a group action as habitual implies something about the explanatory reason (why) the group acts.

Particular situational structures will produce specific habits and skills because the agent will have to deal with the structures of the environment. In a social setting habits and skills arise because we need to coordinate, while also shaping our coordination. This “looping” or “interdependence” between social conditions and individual dispositions is crucial to an understanding of habits and skills (Dewey 1922; Bourdieu 2010; Haslanger 2018; Hufendiek forthcoming).

An agent can find herself in the middle of a bodily movement and continue, or adjust, what she is doing, rather than starting the action intentionally. That is, some intentional doings might be better understood as intentional continuings (Baier 1997) (Footnote 17). This insight can be carried over to the case of joint actions, where agents can pick up on the movements, actions, and plans of others and participate in them (continuing rather than starting the joint action). Take the example provided by Michael et al. (2016) in which an agent is cleaning up and picks up a ball. Her dog responds enthusiastically and indicates that it is ready to play fetch. This unintended generation of an expectation is now picked up by the agent and continued intentionally. In this case there is also an activity that gets interrupted, but this need not always be the case. Voluntary shared activities may turn out to be parasitic on non-voluntary ones and planned shared activities parasitic on naturally coordinated ones. Two agents can function as triggers for, or be the social situation that calls for, further behaviors, leading to coordination.

In Sect. 2.2 I presented the idea of a spectrum ranging from habitual movements to habitual actions. This distinction was made based on the way in which the why-question regarding action-explanation is answered. When it comes to understanding joint action, this distinction implies that the way in which groups, and individuals in a group, acquire habits is of importance for our understanding of whether the group’s behaviors constitute actions or not.

On the group level we can now make sense of more types of joint actions that are not merely cases of coordination:

  1. Groups that settle on a goal in advance, then go on to act on it (skillfully).

  2. Groups that once settled on a goal and now do things habitually. The provided analysis gives us a framework to understand these habitual actions as actions.

  3. Groups that once settled on a goal and now do things habitually that they would not do in the same way if they were to reflect on them. Within the group the habitual action has moved in the direction of a habitual movement.

  4. A new generation developing habits based on decisions by the earlier generation. They develop them as a habitual movement, rather than a habitual action.

  5. Collectives of individuals that have been socialized in the same way. The aggregated agents perceive the same action options, allowing for coordination without settling on a goal together. They might knowingly or unknowingly rely on this socially shared pool of habits.

In the latter case there can also be habitual movement, where the agents do not stand in the right relation to the development of the habit. Coordination can start from one of these types of joint action and then move to another form, in a similar vein to Baier’s continuings.

With the considerations expressed in this article, I hope to have shown convincingly that recent developments in skillful action theory necessitate a rethinking of ossified basic conceptualizations in the domain of joint action theory. The classification in this last section allows for a richer understanding of the types of joint actions that agents engage in. It allows us to include more cases of coordination as cases of joint action, while also allowing us to distinguish between different joint action phenomena. These differences depend on inter-related types of coordination and top-down and bottom-up attention. Together, attention and control allow us to understand the dynamic development from one type of coordination to another. Over time a coordinated effort could change from, e.g., type five coordination to a situation where the agents are more aware of the coordination and decide to continue with said coordinated actions as a goal. More work is needed to rework our understanding of joint action taking into account new literature on attention and control, but I hope I have provided a promising starting point.