1 Introduction

Concepts are considered by psychologists, philosophers, and cognitive scientists to be central building blocks for thought and cognition more generally. I presuppose here what is arguably the mainstream view about concepts, namely that they are mental representations of categories and associated bodies of knowledge or information (see e.g., Machery 2009). Conceptual representations have semantic content, as they refer to some category in the world, and cognitive content, which consists of cognitively or psychologically significant information used for mental processing.

Many open questions surround the notion of concepts, such as to what extent they are inborn, how it is possible that they can refer at all, and whether they have stable cores that are instantiated with each tokening. A central and active debate in concept research concerns their format. The question is whether conceptual representations are amodal or modality-specific (short: modal). Modalism has traditionally been the common-sense view and is rooted in empiricism. Amodalism then became the dominant view more recently, in connection with the rise of the computational view of the mind, and especially with Fodor’s (e.g., 1975) “Language of Thought” (LOT) thesis. Recently, modalist positions have also resurged strongly (some call it “neo-empiricism”; see Footnote 1) in the context of the embodied cognition paradigm, which stresses that our conceptual apparatus is shaped by the constraints of our body and sensory apparatus (e.g., Clark 1998; Lakoff and Johnson 1999).

Recent modal views characterize conceptual representations as states corresponding to “re-enactments” or “simulations” of sensory or motor states involving the sensorimotor areas of the brain. To token the concept DOORKNOB is to token a mental representation similar to those mental representations tokened when a doorknob is perceived. In the case of motor states, representations are in an “off-line simulation” mode, i.e., they do not lead to the final execution of motor commands. For instance, to token an action verb involves the activation of parts of the motor brain area, suppressing efferent signals to muscles. Barsalou’s “grounded cognition” has as a central tenet the modality of conceptual representations:

[...] a diverse collection of simulation mechanisms, sharing a common representational system, supports the spectrum of cognitive activities. The presence of simulation mechanisms across different cognitive processes suggests that simulation provides a core form of computation in the brain (2008:619).

It is essential to point out that although the modalist view might have its roots in empiricism, it differs from traditional empiricism in some crucial aspects. Firstly, the modalist need not necessarily reject nativism (e.g., Barsalou 2008:620, 2016:1123); the questions of concept format and nativism are orthogonal. Secondly, modal representations should not be confused with literal conscious mental images. Also, modalists have moved away from extreme simulation views and now allow for schematic, unconscious representations, as well as representations where various modalities are “convolved” (see Section 4.2) into multimodal representations. Thirdly, modalists (e.g., Barsalou 2008:620, 2016) do not necessarily need to deny that in addition to modality-specific representations there are also representations that are not “grounded” in external experience, i.e., some positions are a hybrid, though biased towards modalism:

From the perspective of grounded cognition, it is unlikely that the brain contains amodal symbols; if it does, they work together with modal representations to create cognition. (Barsalou 2008:618).

Amodal representational systems (e.g., Fodor 1975; Pylyshyn 1984), in turn, are not associated with a specific modality. They are formal, language-like and “abstract” and their symbols are processed syntactically, i.e., in virtue of some formal aspects (not meaning or content). Such representational systems work roughly like a formal calculus of symbols, much like a natural or formal language consisting of syntax/grammar and word forms. The motivation for amodalism is based on two key observations. Firstly, it is generally recognized that conceptual representations must be able to account for systematicity and productivity of thought. This requires amodal symbols that can be freely recombined to form novel concepts or propositions. Secondly, the existence of abstract concepts, like DEMOCRACY or TRUTH, purportedly requires amodal representations. For amodalists, it seems to be a contradiction in terms to have abstract concepts grounded in perceptual or motor representations.

While it seems intuitively clear how the two positions differ along the lines sketched above, it turns out to be difficult to further characterize the difference between the representational formats. Authors define modality versus amodality in different ways, and none of the proposals available seems to survive more in-depth scrutiny (see Haimovici 2018). A second fundamental problem concerns the available empirical support. From the debate it becomes apparent that the same evidence can be interpreted in ways that are compatible with each view, and it is still unsettled as to which view provides a better explanation of the phenomena.

In this paper, I argue that in the face of those problems, we should be suspicious about the usefulness of the modal/amodal dichotomy. I suggest that we should overcome the dichotomy and reconceptualize it as a graded distinction. For that purpose, first, I expand on the two fundamental problems that the dichotomy faces: the difficulty of fleshing out the distinction in precise and agreed-upon terms (Section 2) and the problem of what evidence would count as support for the different views (Section 3). In Section 4, I deny that abstract concepts are more of a problem for modalists than for amodalists. I then review and reject recent hybrid approaches as an alternative (Section 5). In Section 6, I illustrate the reconceptualized notion of a graded distinction between modal and amodal formats using the example of a cognitive-computational model of concepts within the Predictive Processing framework, a relatively recent, but already well-established cognitive paradigm. In Section 7, I conclude.

2 The problem of telling apart modal and amodal representations

The first fundamental problem concerns the very distinction between modal and amodal formats. Though one might have some intuitive grasp of such a distinction, there is no generally accepted criterion to tell apart modal from amodal representations. The point is not that there is no agreed upon conceptual analysis available for the notions of “modal” and “amodal”, which would be too demanding, but that there is not even a rough criterion or a working definition that most authors share. In the absence of such common ground, one might worry that maybe the whole format debate is ill-conceived. Now, there are no generally agreed criteria for a distinction, but there are, of course, different working definitions that are used by various authors. The problem is that all of the characterizations suggested have issues (see Haimovici 2018, for a more detailed discussion) and there is no suitable candidate to converge on.

Fodor, for instance, suggested a mereological criterion based on a distinction between icons and symbols. Every part of an iconic representation represents a part of the content, whereas this is not the case for symbolic representations. Similarly, some authors (e.g., Mahon and Hickok 2016) appeal to the fact that amodal symbols are arbitrarily related to their content, while modality-specific representations have some isomorphic aspects between content and their vehicle. However, both approaches have a similar problem. The criterion might work well for visual-spatial representations, but it is far from clear how to generalize it to the many other sensory modes, e.g., to olfactory, auditory, proprioceptive, or interoceptive representations. For instance, the “parts” of (the projection of) a scene could stand in a one-to-one relation mirroring the retinal pixel arrangement, or some other neural activation patterns in some higher-level brain areas. This works well as the images and the cell arrangement are both extended in space. However, how would this work for the other modes in which spatial extension is not essential? (See Footnote 2.)

Another related distinction between modal and amodal representations is based on analogical and digital formats. In an analogical representation, some property of the representational vehicle co-varies continuously with what is represented. It is indeed plausible that many modal concepts can be placed in some “quality space”, e.g., a colour space. For instance, the concept RED could be represented by some (convex) region in a three-dimensional “colour space” (e.g., Gärdenfors 2014). However, such a space can be perfectly digitally encoded. To place RED into an “analogical space” is a higher-level interpretation of some other underlying lower level representational format. Machery has also argued against the usefulness of characterizing modal symbols as analogical and amodal symbols as digital in nature (e.g., 2007:23) based on evidence that some amodal representations are analogue (e.g., representations of numerosity), and that there are visual representations that are not analogical.
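The point that a "quality space" can be digitally encoded may be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the colour space is taken to be 8-bit RGB, the region for RED is an axis-aligned box (one simple kind of convex region), and the numerical bounds are hypothetical rather than taken from Gärdenfors or any other source. The names `Box`, `RED` and `midpoint` are likewise invented for the example.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[int, int, int]


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in a 3-D colour space; any such box is convex."""
    lo: Point
    hi: Point

    def contains(self, point: Point) -> bool:
        # A point is inside the box if every coordinate lies within bounds.
        return all(l <= p <= h for l, p, h in zip(self.lo, point, self.hi))


# Hypothetical bounds for the concept RED in 8-bit RGB (an assumption).
RED = Box(lo=(150, 0, 0), hi=(255, 80, 80))


def midpoint(a: Point, b: Point) -> Point:
    """Integer midpoint of two colours (coordinate-wise floor average)."""
    return tuple((x + y) // 2 for x, y in zip(a, b))


crimson = (220, 20, 60)
scarlet = (255, 36, 0)
blue = (0, 0, 255)

print(RED.contains(crimson), RED.contains(scarlet))
# Convexity: a point between two members of the region is also a member.
print(RED.contains(midpoint(crimson, scarlet)))
print(RED.contains(blue))
```

Everything here is discrete (integers in a finite range), yet at the level of interpretation the region still behaves like a continuous "analogical" space, which is the sense in which the analogical reading is a higher-level description of a lower-level format.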

Machery then suggests applying another criterion:

This does not mean, however, that we cannot distinguish between perceptual and amodal representations. Following Prinz (2002, p. 113), one can propose that perceptual representations are whatever representations psychologists of perception say perception involves (2007:23).

This expert criterion might be perfectly valid, even if it does not allow for a full and detailed further conceptual analysis (e.g., of necessary and sufficient conditions). However, even if no precise conceptual analysis can be provided, it certainly would be surprising if nothing further could be said by an expert to justify the distinction. Moreover, it might be the job of a philosopher to help to make the (possibly implicit) criteria more explicit. A merely deferential criterion of the modal/amodal distinction is therefore not very satisfying and should be only a last resort solution.

We could turn to a neuroscientist instead of a psychologist of perception and apply a neural location criterion that takes into account neurophysiology. Concept representations are, after all, implemented in the brain, so the strategy is to analyse the neural activation patterns and identify their location. A representation is then classified as being modality-specific if during a semantic task the activation of neuron assemblies occurs in areas that are considered by neuroscientists to be sensorimotor processing areas. The fact that the presentation of, say, a cat activates neuron assemblies in, e.g. the primary visual cortex is taken as evidence for modality-specific representations. A lot of empirical support appealed to by modalists and amodalists presupposes this criterion. However, this proposal also turns out to be problematic, as we will see later in more detail. Most of the discussion of this paper will assume a neurophysiological criterion, as I am concerned with an account of concepts empirically informed by the neurosciences.

Another, related, approach to characterizing amodal versus modality-specific representations is based on the sort of input that a representation receives. Authors like Prinz (2002), Dove (2009) and Dehaene (2011) suggest that amodal representations respond to different types of sensory modalities, not just one. Those authors appeal, for example, to number concepts. For instance, we can classify three things independently of whether they are three objects, sounds, or actions. However, this account could be accused of conflating amodal and multimodal representations (see also Haimovici 2018:3, for the same point) and would, therefore, not clearly distinguish modalism and amodalism. I will say more about representational abstraction and abstract concepts in a moment. Finally, Barsalou has proposed an “independent systems criterion”, which could be seen as a specific proposal for a neurophysiological criterion: “...cognition is computation on amodal symbols in a modular system, separate of the brain’s modal systems for perception, action, and introspection” (Barsalou 2008:617). “Independence” could be functional or anatomical. However, as the later discussion will show, such a strict dichotomic separation is implausible.

With this quick and condensed review, which does not pretend to be an exhaustive evaluation, I want to make the point that we are not short of proposals to tell modal and amodal formats apart. But all of the proposals have issues and there is no consensus as to what the appropriate one is.

3 The problem of evidence for modal and amodal representations

The second fundamental problem for the amodal/modal distinction concerns the empirical support for either position. In this section, I argue that the empirical evidence used in the modal/amodal debate is not conclusive (see also the review of Dove 2016:1110–1111). What we can conclude safely from the evidence, however, is that extreme modal or amodal positions are not tenable, and, indeed, both modalists and amodalists increasingly move to hybrid accounts. The question then remains as to whether some available hybrid account provides a suitable model for conceptual format. I can’t review here, exhaustively, the vast body of empirical results, so instead I will focus on the big picture and some representative examples to make my point. For more detailed reviews I refer to the literature (e.g., Barsalou 2016; Dove 2016; Kemmerer 2019; Machery 2016; Meteyard et al. 2012).

To start with, let me differentiate further between the various positions in the modal/amodal debate. Meteyard et al. (2012) usefully introduced a taxonomy of the views located on a continuum from “strongly embodied” to “completely unembodied”. Completely unembodied (fully symbolic) views (e.g., Mahon 2015; Mahon and Caramazza 2008) hold that concepts are amodal representations and modal information does not play any relevant role in conceptual representation, i.e., semantic content is independent of sensorimotor areas. Strongly embodied (full simulation) views reduce conceptual processing to the level of sensorimotor (modal) representations (e.g., Gallese and Lakoff 2005; Glenberg and Gallese 2012). A consensus seems to emerge that extreme views have little empirical support and a compromise is needed (e.g., Borghi et al. 2017; Chatterjee 2010; Dove 2016; Meteyard et al. 2012; Reilly et al. 2016). To see this, let us briefly review three examples of empirical strategies that have been deployed to reveal the nature of conceptual format. I suggest that the evidence does not adjudicate the debate. However, we can conclude that: a) sensorimotor representations play a pervasive role in conceptual processing (though the question of whether they are a constitutive part of the conceptual representation remains open), and b) some form of abstracted representations is needed (though the question remains as to whether those abstracted representations are amodal, or count as modal).

Among the most popular empirical strategies employed is the identification of activation patterns in sensorimotor areas during conceptual processing using neuroimaging techniques like fMRI. Many studies (e.g., Hauk et al. 2004; Chao and Martin 2000; Simmons et al. 2005) have demonstrated the relevance of sensorimotor activity when concepts are processed. However, while this happens in many instances, there are exceptions. As an example, it turns out that on some occasions processing of an action verb does not activate action areas in the brain (e.g., Barsalou 2016; Dove 2016; Kemmerer 2015). Also, Pecher (2018) recently showed that motor representations are not activated automatically; hence their activation is not always necessary for conceptual processing. This suggests that sensorimotor areas are often, but not always involved when concepts are tokened. While this most likely excludes the extreme grounded (modal) view, we still cannot distinguish whether the co-activated representations are part of the concept, or a consequence of “spreading activation” (e.g., Mahon 2015:420). Leshinskaya and Caramazza (2016) suggest that tight coupling or coactivation of conceptual and sensorimotor representations is evidence for the interaction of conceptual and sensorimotor representations, but not for concepts being modal. A fundamental difficulty in deciding the debate by this route resides in the complexity of establishing in a principled way how fast or far spreading can be so that the firing neurons still count as a constitutive part of the same representation. A related strategy, also based on neuroimaging, is to establish whether different modality-specific cues related to a concept activate a common representational core in regions that can be considered not to be modality-specific (see Barsalou 2016; Fairhall and Caramazza 2013; Van Doren et al. 2010). However, evidence for shared cores seems consistent with both weak modalism and amodalism. Weak modalists can account for this phenomenon by claiming that the core is multimodal and abstracted (i.e., it still contains modal information, albeit in compressed form).

Scientists have also turned to a strategy based on detecting a causal role of the two types of representation via neurophysiological lesion studies (e.g., of patients with semantic dementia) or Transcranial Magnetic Stimulation (TMS) experiments. The idea is to explore whether modal or amodal representations are necessary for semantic comprehension. If, for instance, the motor area is permanently or temporarily impaired but the understanding of action words remains intact, then it seems that sensorimotor areas are not necessary for concept representation (and strong modalism must be false). For instance, Repetto et al. (2013) showed that the stimulation of the hand portion of the primary motor cortex leads to slower reaction times for hand-action verbs, indicating that sensorimotor areas play a causal role in verb comprehension. Similarly, Gerfo et al. (2008) showed that repetitive TMS (rTMS) stimulation of the left motor cortex delays the processing of action verbs and names. However, Vannuscorps et al. (2016) document the case of a patient with increasing atrophy of sensorimotor regions (leading to an increasing action production disorder), but persistent intact performance with action-concepts. This shows that motor representations are not necessary for all conceptual tasks. Pobric et al. (2010) showed, with a reverse strategy, that rTMS on the temporal poles leads to reduced efficiency in semantic tasks but does not have an impact on perceptual tasks. The authors conclude that this is evidence that the poles play a role as amodal processing sites. However, none of this evidence is a problem for weak modalists. They only need to admit that low-level sensorimotor representations do not need to be activated in all cases, as full simulation modalists would claim. The weak modalist only needs an account that includes abstracted modal (or multimodal) representations.

As a final example of an empirical strategy, take the appeal to behavioural evidence. Recently, Fischer and Shaki (2018) have studied the performance signature for number concept processing. The results support the claim that the processing of paradigm examples of abstract (and hence purportedly amodal) concepts shows clear characteristics of perceptual processes. The authors have identified a range of effects that are typical for perceptual discrimination and that are preserved when numbers are processed in symbolic form: for instance, distance effects (e.g., 3 and 9 are easier to distinguish than 3 and 4), size effects (e.g., 3 and 4 easier to distinguish than 8 and 9) and spatial-numerical associations (numbers seem to be located on a spatial number line) revealed by motor-behaviour. This seems to be evidence for modalism. But amodalists can recognize the importance of modal representations in higher cognition and argue that conceptual processing sometimes uses perceptual heuristics, while number concepts remain amodal representations (see, e.g., the “Offloading” account in Section 5).
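The distance and size effects mentioned above can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that the discriminability of two numbers tracks the difference of their logarithms (a Weber-Fechner style assumption that is common in modelling perceptual magnitudes). This measure is not attributed to Fischer and Shaki, and the function name `discriminability` is hypothetical.

```python
import math


def discriminability(a: float, b: float) -> float:
    """Toy measure (an assumption): absolute difference of log magnitudes.

    Larger values stand for pairs that are easier to tell apart.
    """
    return abs(math.log(a) - math.log(b))


pairs = [(3, 9), (3, 4), (8, 9)]

# Rank the pairs from easiest to hardest under the toy measure.
for a, b in sorted(pairs, key=lambda p: discriminability(*p), reverse=True):
    print(f"{a} vs {b}: {discriminability(a, b):.3f}")
```

Under this assumption the three example pairs from the text come out in the expected order: 3 versus 9 is easier than 3 versus 4 (distance effect), and 3 versus 4 is easier than 8 versus 9 (size effect). The sketch only shows that a single perceptual-style magnitude code can produce both effects; it does not settle whether number concepts themselves are modal.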

The last example involved abstract concepts (numbers). So far, we have not explicitly distinguished between concrete and abstract concepts. That distinction, however, plays a central role in the debate. While it seems quite plausible that concrete concepts could somehow be represented modally, amodalists have been concerned that modalism is incompatible with abstract concepts on both empirical and theoretical grounds. Maybe abstract concepts are then the Achilles heel of modalism that tips the balance towards amodalism. However, I will argue now that abstract concepts are not more of a challenge for modalism than they are for amodalism. Therefore, the impasse remains intact.

4 Do abstract concepts support amodalism?

The existence (and pervasiveness) of abstract concepts has been one of the principal arguments against modalism (see, e.g., Dove 2016; or Löhr 2018, for discussions). Prominent examples in the literature are, for instance, number representations (e.g., Dehaene 2011; Fischer and Shaki 2018; Machery 2007:34), and concepts like DEMOCRACY and TRUTH (e.g., Dove 2009, 2016; Löhr 2018). Dove (2016) has summarized some of the main challenges purportedly posed by abstract concepts to modalism (see Footnote 3): a) generalization, b) flexibility and c) disembodiment. Let us unpack those briefly (4.1) and then see how a modalist can respond (4.2).

4.1 Dove’s challenges from abstract concepts for the modalist

Dove thinks that the “generalization” involved in abstract concepts is a challenge for modalism. Generalization has a horizontal dimension, which consists of the extension of a concept with new exemplars, and a vertical one, which corresponds to an organization in terms of super- and sub-ordinated concepts. According to Dove, the claim that concepts are structured in hierarchies of abstraction is supported by evidence such as cross-modal deficits or hierarchical degradation of conceptual knowledge as well as evidence of the existence of areas that are not modality-specific (2016:1112), i.e., show an “abstracted” behaviour. With regard to the “flexibility” involved in abstract concepts, for Dove it seems to be a challenge for modalism that “some individual concepts can be used in either a more or a less grounded fashion, depending on the circumstances.” (2016:1113). For instance, an fMRI experiment by Saygin et al. (2010) showed that when the brain processes “The wild horse crossed the barren field”, motion-sensitive visual areas were more active compared to other sentences containing the verb “to cross”, like “The hiking trail crossed the barren field”. The third challenge rests on the claim, according to Dove, that concepts like ODD or TRUTH seem “divorced from experiential factors” (2016:1114) and, therefore, it is difficult to see how abstract concepts can “even in principle” be grounded in sensorimotor representations. Finally, he cites a vast amount of evidence for an abstract/concrete asymmetry (i.e., some areas are preferentially activated for abstract concepts in representing and processing concepts) (2016:1114–1115) as support for amodal representations.

Modalists have embraced different strategies to face the challenges posed by abstract concepts. One suggestion that is gaining momentum is that abstract concepts are grounded not only in the modalities of the five traditional senses, but also in interoceptive states (see, e.g., Barsalou and Wiemer-Hastings 2005; Connell et al. 2018; Fingerhut and Prinz 2018; Vigliocco et al. 2014). This might plausibly work for concepts like FREEDOM or ANGER, but it is unclear how affective grounding could help, for instance, with ODD or TRUTH. Also, not all authors agree that interoceptive states have a central role, and Lenci et al. (2018), for instance, suggest that linguistic representations are needed and play the primary role in abstract concept representation. They deny that the affective load of abstract concepts refutes the position that abstract concepts are exclusively linguistically represented. They claim that affective information could be linguistically derived or a by-product of co-occurrence statistics (but see Vigliocco et al. 2014, who argue against a primary role of linguistic information for conceptual representations). Indeed, some modalists find the idea of combining modal grounding and linguistic representations into a hybrid appealing (e.g., Louwerse 2018; Pecher and Zeelenberg 2018) (see also Section 5, where I discuss representational pluralism).

However, as I will argue in a moment, the modalist does not necessarily need linguistic in addition to sensorimotor-plus-interoceptive representations for a defence. I have already alluded to elements of a (weak) modalist strategy, namely the appeal to abstracted multimodal representations. Let me expand more on the sort of representations involved and then respond to Dove’s challenges on behalf of a modalist.

4.2 A possible modalist response

Modalists, recognizing the need for abstraction, could appeal to a representational structure of concepts based on a modal abstraction and convolution hierarchy (let’s abbreviate it by “MACH”). What a modal hierarchy of abstraction amounts to can easily be derived from contemporary neuroscience and AI (specifically deep learning). Modal processing comes with a built-in abstraction process. Take, for instance, the ventral processing stream of visual information consisting of a flow from the retina through to the cortical areas V1 → V2 → V4 → IT. As one advances in the stream, the receptive field size of the representations increases, and the representations get more and more abstract. But they still remain, quite indisputably, visual (see Footnote 4). Abstraction per se does not eliminate modality. Single neurons or neuron assemblies represent, say pixels, in early-stage retinal processing. In a later stage, a single neuron or a neuron assembly represents the shape of a certain edge. In each step, the brain abstracts from details available in previous stages. Similarly, the mixing of two or more modalities (convolution) does not lead to a representation that is devoid of modality. Different modalities can be “convolved” or folded into each other (see, e.g., Thagard and Findlay 2012; Radu et al. 2018; Ramachandram and Taylor 2017, for deep multimodal learning). “Convergence zones” (e.g., Meyer and Damasio 2009), “supramodal areas” (e.g., Fairhall and Caramazza 2013) or “hubs” (e.g., Patterson and Lambon Ralph 2016; see also Section 5) posited by neuroscientists, could be locations where convolution happens. There, different modalities flow together to create more abstract, multimodal representations (not necessarily amodal ones, as is often claimed). Those representations can be “unpacked” top-down by co-activating appropriate lower-level representations and providing more granularity or detail to the representation (and cognitive phenomenology) in different modalities. For instance, the highest level (most abstracted) representation of the concept THREE might be in the form of three “vague and schematic things” (where “thing” corresponds to some highly abstract concept THING, which includes any possible entity, not only tangible things). Or, by involving co-activated lower level representations, it might be in the form of three schematic apples, three specific green apples, three schematic sound events or three specific sounds. Similarly, DEMOCRACY can be seen as a very complex high-level multimodal representation that we might unpack context-dependently in many fashions and mixtures; for instance, in the form of a voting scene, but also as a definition, as an exemplar in the form of a paradigmatically democratic country, or as some subjective feeling of justice and freedom. Whatever has been folded into (by concept formation, or by evolution) the highest-level node of the hierarchical network structure of the concept DEMOCRACY can now be retrieved selectively, and with the level of detail or schematicity needed, depending on the context and task.
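A deliberately simple sketch may help to fix ideas about abstraction and convolution in such a hierarchy. It rests on several assumptions that are not part of the proposal itself: abstraction is modelled as averaging over non-overlapping windows (so each higher-level value "covers" a larger stretch of the input, loosely like a growing receptive field), and "convolution" is modelled as merging modal representations while tagging each component with its modality. This is not convolution in the mathematical sense, and the functions `abstract` and `convolve_modalities` as well as the input values are hypothetical.

```python
def abstract(signal, window=2):
    """Toy abstraction step: average non-overlapping windows.

    Each output value summarizes `window` input values, so detail is
    lost while the representation stays in the same modality.
    """
    chunks = [signal[i:i + window] for i in range(0, len(signal), window)]
    return [sum(chunk) / len(chunk) for chunk in chunks]


def convolve_modalities(**modal_reprs):
    """Toy 'convolution': fold several modal representations into one.

    Every component keeps a modality tag, so the merged representation
    is multimodal rather than devoid of modality.
    """
    return [(name, value)
            for name, rep in modal_reprs.items()
            for value in rep]


# Made-up low-level activations in two modalities.
visual = [0.0, 0.2, 0.9, 1.0, 0.8, 0.1, 0.0, 0.0]
auditory = [0.5, 0.7, 0.6, 0.4]

v1 = abstract(visual)      # 8 values -> 4 values
v2 = abstract(v1)          # 4 values -> 2 values, still visual
a1 = abstract(auditory)    # 4 values -> 2 values, still auditory

hub = convolve_modalities(visual=v2, auditory=a1)
print(len(visual), len(v1), len(v2))
print(sorted({modality for modality, _ in hub}))
```

The higher levels are more compressed, but nothing in the process strips the modality away: the merged node still carries visual and auditory components, and the lower-level lists remain available for "unpacking" when more detail is needed.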

MACHs allow a response to Dove’s challenge in the following way. The hierarchical structure can, by definition, account for vertical and horizontal generalization. Representations are organized into abstraction trees. Nodes form a vertical abstraction gradient and all child-nodes of a parent are related horizontally. Regarding the challenge of flexibility, it is a challenge as much for amodalism as it is for modalism. One needs to come up with a mechanism to account for the high degree of context-sensitivity of concepts, so modalism is not worse off here at all. A more specific computational proposal is needed to advance here. In Section 6, I suggest a mechanism that a modalist could appeal to. The third challenge, disembodiment, rests on the claim that abstract concepts seem quite remote from direct experiential representations. However, it is not clear why it should be, in principle, impossible to represent abstract concepts that involve categories of events, situations and mental states in terms of abstracted and convolved modal information. Of course, such representations must undergo a very complex abstraction and convolution process using a wide range of modalities (including interoceptive states) and it might be difficult to decompose them into simple experiential components. Finally, the difference in behaviour due to the modal/amodal asymmetry can also be explained naturally given that there is a gradient of abstraction. The ends of the hierarchy might, of course, “behave differently”. At the more abstracted end, representations behave “amodally”, while closer to the periphery (the bottom of the MACH) they behave “modally” (perceptually).
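The claim that an abstraction tree accounts for vertical and horizontal generalization by construction can be sketched as a small data structure. The class `Node`, its methods and the example labels are hypothetical and serve only as an illustration: vertical generalization is read off the chain of ancestors, horizontal relations are read off the siblings under a common parent, and extending a concept amounts to attaching a new child.

```python
class Node:
    """A node in a toy abstraction tree."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def ancestors(self):
        """Vertical dimension: increasingly abstract superordinates."""
        chain, node = [], self.parent
        while node is not None:
            chain.append(node.label)
            node = node.parent
        return chain

    def siblings(self):
        """Horizontal dimension: other children of the same parent."""
        if self.parent is None:
            return []
        return [c.label for c in self.parent.children if c is not self]


animal = Node("ANIMAL")
dog = Node("DOG", animal)
cat = Node("CAT", animal)
poodle = Node("POODLE", dog)
beagle = Node("BEAGLE", dog)

print(poodle.ancestors())   # vertical abstraction gradient
print(poodle.siblings())    # horizontal relations

# Horizontal generalization: extend DOG with a new child node.
terrier = Node("TERRIER", dog)
print(poodle.siblings())
```

The sketch says nothing about format: the same tree could be populated with modal or amodal nodes, which is in line with the observation that generalization alone does not favour either side.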

Let me highlight that MACHs work for abstract and concrete concepts. The challenge of generalization is as much a challenge for concrete concepts, as it is for abstract concepts. In a certain sense, abstract concepts are not qualitatively different from concrete concepts. A concept denotes a category and any category is abstract by definition. DOG is abstract, though you can touch, see, smell, etc., exemplars of DOG, i.e., dogs. DOG is in this sense as abstract as DEMOCRACY. The difference resides in certain characteristics of exemplars. Exemplars of DEMOCRACY must be very complex states or situations indeed. So abstract concepts are not different in type, but merely require significantly more complex modal abstractions and convolutions, so the modalist can argue. That amodal representations should prima facie be better suited for abstract concepts rests, I suspect, on a confusion. Merely appealing to the “abstract” nature of amodal representations does not explain how they can be representations of abstract concepts. This would be conflating two readings of “abstract”, one referring to a property of the vehicle (the representation and its degree of information compression) and one related to the content of what it represents (a certain category whose exemplars share certain characteristics). Amodalists do not have an advantage here then; in fact, quite the contrary. A weak modalist can explain how concepts can mean anything in the following way: if basic level representations get abstracted (compressed) to higher level (still modal, but less detailed) representations, and the meaning is in this sense grounded in basic level representation, then the more abstract representation inherits content from below. The amodalist needs to appeal to arbitrary symbolic relations and explain how those symbols can refer to and can mean anything. There is a range of proposals available, of course (see, e.g., Tillas and Trafford 2015). My point is merely that amodalism is not a no-brainer default position for abstract concepts and one needs to be careful about being drawn into an intuition based on the above conflation of the notion “abstract”.

Let us take stock. Empirical results have not been able so far to adjudicate the modal/amodal debate. The “challenge of abstract concepts” turns out not to be an insurmountable stumbling stone for modalists and is a challenge for amodalism as well. However, despite this situation, the field has advanced substantially by accumulating quite compelling evidence for the significant involvement of sensorimotor representations in conceptual processing, and also for the involvement of either amodal or abstracted (multi-)modal representations. Extreme positions on the continuum of Meteyard et al. are therefore unlikely winning proposals. In general, it seems possible to concoct intermediate positions on both sides, to account for most of the evidence. However, the question is not so much whether we can somehow account for the evidence, but rather what account provides the best explanationFootnote 5 in terms of other virtues like theoretical simplicity, consistency, coherence and fruitfulness.Footnote 6 So, it is worthwhile having a look at some more specific hybrid proposals to see if one of them provides a way out of the impasse.

5 Are hybrid approaches the way out?

An increasing number of hybrids try to accommodate the evidence for the importance of sensorimotor representations and the existence of abstracted representations. In what follows, I briefly review four examples: two proposed by amodalists, and two by modalists.Footnote 7 As we will see, hybrids built on the modal/amodal distinction have drawbacks and seem unable to resolve the debate.

Mahon and Caramazza (2008) acknowledge that modality-specific information plays a crucial role in the use of concepts. However, they insist that only the amodal representation is constitutive of the concept:

On the grounding by interaction view, the specific sensory and motor information that goes along with the instantiation of a concept is not constitutive of that concept (2008:68).

However, the “grounding by interaction” account of concepts implies a very anaemic notion of concepts. If I am correct, their implied notion of concept is concerned exclusively with the referent, and hence with questions covered by the intentionality desiderata for a theory of concepts (see Prinz 2002). Cognitive content and psychological significance are relegated to a secondary, non-conceptual role. Their account is also formulated quite generically, and they provide no specific cognitive mechanism of how this interaction is supposed to work. It seems we cannot empirically distinguish it at that general level of formulation from an account in which such modal information is constitutive of the concept and concepts retrieve modal information context-dependently. If an amodal concept representation is often accompanied by a co-activated modal representation and does significant cognitive work, according to what principles is that modal representation not a constitutive part of the concept?

Machery’s “off-loading hypothesis” shall serve as a second example of a hybrid account. Machery (2016) acknowledges that we often use perceptual and motor representations to solve cognitive tasks. However, he rejects the conclusion that this implies that (at least some) concepts are modal. He suggests that we offload many cognitive tasks from the amodal conceptual system to sensorimotor representation. Motor and sensory representations are hence not constitutive of conceptual representations but are used heuristically:

In contrast, according to the offloading hypothesis, we often offload the solution of tasks on perceptual and motor systems: While concepts themselves are amodal, we often manipulate perceptual and motor representations to solve tasks. [...] Offloading may happen when the conceptual system does not encode the information needed for solving a given task (e.g., information about perceptual details), while perceptual representations stored in memory do. Offloading also may happen for tasks that can be efficiently solved this way (2016:1094).

This is an interesting proposal and it seems to imply the existence of some algorithm or mechanism that implements the offloading heuristics. If the amodal system is not able to solve a cognitive task alone, it uses the resources of modality-specific representations. This is a hybrid proposal in the sense that it implies the distinction of two separate representational systems that interact. Again, my concern is whether the offloading hypothesis is specific enough to be empirically testable. What makes a particular activation pattern in the modality-specific regions an “offloading” as opposed to a context-sensitive co-activation of that information? Also, how could it account for some concepts, like specific colour concepts, that seem to come by default with some (maybe vaguely) imagined colour impression?

Let us turn to modal hybrids to see whether they fare better. The Hub and Spokes model (HSM) (e.g., Binney et al. 2012; Guo et al. 2013; Patterson and Lambon Ralph 2016; Rogers et al. 2004) suggests that both modality-specific (spokes) and amodal information (in the “transmodal hub”) are necessary components of a concept representation. The modality-specific aspects of a concept are represented in the corresponding sensorimotor (and linguistic) areas. The hub-component sends and receives information from the modality-specific regions. The hub abstracts away from specific modal features and codes the “semantic similarity structure”. The hub-component, therefore, unifies the different modal information sources and provides a coherent and generalizable concept. Both hubs and spokes are necessary and the HSM does not imply that concepts have an abstract form and reside in the hub region (which is proposed to be located in the anterior temporal lobe, the ATL). According to the authors, the necessity of hubs is supported by evidence from studies of patients suffering from semantic dementia (SD): ATL atrophy leads to SD. Cross-category loss of classification and generalization without deterioration of modality-specific areas indicates that the problem must be in the integration of modal information. Evidence for the HSM, however, is compatible with the modal view based on MACHs. The hubs could simply be areas that contain modal abstracted and convolved representations. The evidence for the HSM is not clearly evidence for a dichotomic modal/amodal model. Indeed, some authors have suggested that the role of ATL as “the” hub is overemphasized (see the overview of ATL functions by Wong and Gallate 2012), that the ATL has many other functions, and that representational abstraction happens in many other regions.

The second example of a hybrid leaning towards a modal view of concepts is the “Symbol Interdependency Hypothesis” (SIH) account (e.g., Louwerse 2018), which is an account of representational pluralism. It combines modal and linguistic representations as mutually reinforcing. The motivation stems from the following sort of reflection. We might learn concepts without the intervention of sensorimotor input, for instance in school via definitions and verbal explanations. Also, we often bootstrap meanings via the context in which a word appears. Therefore, language plays an important role in concept acquisition. Given the role of linguistic representations, we might say that amodal representations play a role in concept representation and sometimes concepts are represented linguistically, i.e., amodally. This provides a basis for meaning via indirect grounding: the word is grounded indirectly via the surrounding grounded words. This view is, arguably, modally biased, as grounding is necessary, though the requirement is weakened by allowing indirect grounding. The SIH account then claims that amodal representations encode semantic information via distributional statistics. Words get their meaning from direct grounding and from indirect grounding via the linguistic context. Representations grounded indirectly then allow for at least “quick and dirty representations”, while a deeper understanding would require direct grounding. I am very sympathetic towards this approach, but I see various problems as it stands. Firstly, it is not entirely clear what takes the role of amodal representations. Are they linguistic natural language representations? This would mean giving up Fodor’s LOT, which does not rely on natural language but on mentalese. Giving up mentalese might be an option, of course.
However, this assumes that natural language representations are amodal, which can be debated, because they involve sound, gestures and/or visual patterns (see Langacker, e.g., 1987, 2008, who endorses that linguistic representations are modalFootnote 8). The SIH account claims that the meaning of unknown words is grounded by their “distributional statistics”. It is difficult to see how the statistics themselves ground the meaning of the words. It seems to me that we understand an unknown concept appearing in a certain linguistic context not in virtue of the wordforms by which it is surrounded, but in virtue of the content those surrounding wordforms represent. Keeping in our memory information about the statistics of surrounding words might be merely a temporary heuristic, with the ultimate aim being to extract direct grounding indirectly from the surrounding words. The statistics would then play the role of a mere placeholder. It seems more plausible that words and their statistics provide access to meanings but do not constitute them.

In sum, hybrid accounts try to combine the need for abstracted representations with the fact that sensorimotor representations are pervasively present in cognition. However, the amodally biased accounts have an ad-hoc air and are quite unspecific, while the modally biased accounts seem slightly better motivated, but face other problems. So, it is not yet clear that hybrid accounts can resolve the debate.

Some authors (e.g., Dove 2016) have suggested, in the face of the empirical stalemate, that weak modalism is not a position that is distinguishable from amodalism. However, Dove maintains the dichotomy and claims that it is modalism (embodiment) that collapses into amodalism. I wonder why, if both positions are indistinguishable, he then does not consider the possibility that it is amodalism that collapses into modalism. Dove assumes in his argument that an abstract representation is an amodal representation. But this is an unjustified conflation, as “abstract” need not mean “void of modal information”. A second possible response to the empirical deadlock could be given along the lines of Machery (2007). Machery refers to “Anderson’s problem” (see Anderson 1978). Anderson already observed the difficulty, in principle, of distinguishing modal and amodal representations: “The correct conclusion from Anderson’s argument is that amodal theories and empiricist theories are on par” (Machery 2007:31). Machery then suggests that we need more detailed and specific modal and amodal theories for a given cognitive task that allow us to derive and test “contrastive predictions”. However, so far, we have no example of such a cognitive task for which more specific weak modal and amodal theories have been developed and contrastive predictions derived. I agree with Machery that more specificity in the proposals might be required for the debate. However, note that all accounts discussed here are based on some quite unclear modal/amodal dichotomy. When searching for a suitable theory of conceptual representations, ceteris paribus, a more integrated account out of which a distinction between the two representational types arises in a principled way would theoretically be more pleasing. Therefore, I suggest considering for a moment whether it might not be the very dichotomy, presupposed widely in the debate, which is the source of the troubles.
In the next section, I will provide a computationally (and neuronally) more specific account of conceptual representations to show how we could understand the modal/amodal distinction as one of degree. To make the proposal specific enough, I will use a cognitive computational framework, grounded in neuroscience, namely the so-called Predictive Processing (PP) framework.

6 Overcoming the modality/amodality dichotomy: An example

6.1 Predictive processing and concepts

There is no space here for a detailed exposition of the PP framework. Given that PP has already been widely covered in the literature and many useful introductions are available, I will only very swiftly summarize the bare-bones essentials of PP, which are necessary to follow my example, and refer to the literature for a richer background. I will then describe a recently proposed model for concepts within PP (Michel 2020).

Predictive Processing (PP) (see Clark 2013, 2016; Hohwy 2013; Friston 2010) pictures the brain as a dynamical prediction device that constantly predicts its sensory input and updates its model to minimize prediction error. The brain uses a multi-layer probabilistic prediction model in which approximate Bayesian inference is carried out (e.g., Clark 2013:188–189; Hohwy 2013:15–39). The PP model has a hierarchical structure and represents prior knowledge on many levels of abstraction (e.g., Clark 2013:25; Lupyan and Clark 2015). Information flows bottom-up and top-down in this system. In the downward prediction cascade, the predictions of higher-level layers serve as priors for the lower level predictions and, in this way, constrain the hypothesis space on the lower level. Computations in the PP model are driven by the goal of minimizing the average prediction error in the long run. The PP system also contains a mechanism of precision-weighting of the prediction errors (Clark 2016:53–83). The brain must predict the reliability of its sensory input in order to be able to distinguish between noise and useful signals. In this way, it can avoid modification of the model due to noisy signals. For that purpose, the mechanism assigns weights to the error signals and thus determines the influence of the top-down predictions versus bottom-up driven updates of the model.
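The role of precision weighting can be made concrete with a minimal sketch. It assumes a deliberately simplified setting that is not taken from the cited literature: a single level, a Gaussian prior and a Gaussian sensory signal, where "precision" is the inverse variance. The function name and the numbers are hypothetical and serve only as an illustration.

```python
def precision_weighted_update(prior_mean, prior_precision, observation, sensory_precision):
    """One approximate-Bayesian update of a Gaussian estimate (hypothetical helper)."""
    prediction_error = observation - prior_mean
    # Weight on the error: share of total precision contributed by the sensory signal.
    weight = sensory_precision / (prior_precision + sensory_precision)
    posterior_mean = prior_mean + weight * prediction_error
    posterior_precision = prior_precision + sensory_precision
    return posterior_mean, posterior_precision


# Same prior and same input, but different estimated reliability of the input.
trusted_mean, _ = precision_weighted_update(0.0, 1.0, 10.0, sensory_precision=9.0)
noisy_mean, _ = precision_weighted_update(0.0, 1.0, 10.0, sensory_precision=0.25)
print(trusted_mean, noisy_mean)
```

When the input is estimated to be reliable, the estimate moves most of the way towards it. When the input is estimated to be noisy, the top-down prediction dominates and the model is barely modified.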

To show how modality might be seen as a graded notion, I will use as an example a cognitive-computational model for concepts within the PP framework (see Michel 2020). According to this model, concepts are “prediction units” (or “concept units” as I will call them here). Concept units are the vehicles of predictions in the PP framework. They play a crucial role in efficient predictions because they are the entities in terms of which predictions are made with the appropriate level of detail. For instance, when crossing a street, it is not efficient for the brain to predict the presence of a car at a pixel level of detail. Instead, it should be predicted in a more compressed and schematic way. Concept units are interconnected in a hierarchical network structure covering the whole range, from early sensory representations to representations in the cortical brain areas. The information associated with a concept (its features) consists in its connections to other concept units. The information retrieved (i.e., other co-activated concept units) when a concept is tokened can be context-sensitively modulated. Very roughly, the PP precision weighting apparatus allows for switching on and off concept features (i.e., connections to other concept units) that are relevant to the context.
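The idea of context-sensitive feature selection can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not part of the model in Michel (2020): the class `ConceptUnit`, the function `token`, the numeric precision values, and the fixed threshold that stands in for the precision weighting apparatus.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptUnit:
    name: str
    level: int  # 0 = sensorimotor periphery, larger = more abstract (assumed scale)
    features: dict = field(default_factory=dict)  # feature name -> ConceptUnit


def token(root, precision, threshold=0.5):
    """Names of the units co-activated when `root` is tokened in a context."""
    active = [root.name]
    for name in root.features:
        # A feature is "switched on" only if the context gives it enough precision.
        if precision.get(name, 0.0) >= threshold:
            active.append(name)
    return active


car = ConceptUnit("CAR", level=3)
for name, level in [("MOVING_OBJECT", 2), ("ENGINE_SOUND", 0), ("PAINT_COLOUR", 0)]:
    car.features[name] = ConceptUnit(name, level)

# Two hypothetical contexts that assign different precision to the same features.
crossing_street = {"MOVING_OBJECT": 0.9, "ENGINE_SOUND": 0.7, "PAINT_COLOUR": 0.1}
choosing_a_car = {"PAINT_COLOUR": 0.8, "ENGINE_SOUND": 0.6}

print(token(car, crossing_street))
print(token(car, choosing_a_car))
```

The same concept unit is tokened with different co-activated features in the two contexts, while the root unit stays constant across tokenings.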

6.2 Overcoming the dichotomy

With a specific cognitive-computational model for concepts in place, I will now show how to overcome the modal/amodal dichotomy and suggest how to reconceptualize modality as a graded notion within this model. Let me start by linking the picture of concepts just sketched with the idea of increasingly abstract representations in a hierarchical representational structure, as posited by PP. The higher a concept unit is located in the network hierarchy of the PP model, the more abstract or compressed the information corresponding to that single node is. On the lowest level of the hierarchy, we have representations in the sensory-motor periphery. One might not want to call those low-level representations “conceptual”, but nothing hangs on it. The critical point is that we have a multi-level hierarchical structure of interconnected representational units that are increasingly abstract from the bottom to the top. Furthermore, and crucially, the context-dependent instantiation of a concept might span a network of nodes across an area of varying extension in the hierarchical model. Now, with such a view of concepts, the dichotomy modal/amodal does not cut much ice anymore. To see this, I will argue from two perspectives, the amodalist’s and the modalist’s one, to be charitable to both (remember, we have concluded that empirical evidence does exclude extreme views but does not decide between weaker versions of modalism and amodalism).

6.2.1 From the amodalist perspective

Take, for instance, the neural location criterion, implicitly assumed by many amodalists, which holds that a concept is amodal/modal if it is located in a (generally recognized) amodal/modal processing area of the brain. Assume that we could localize the highest-level concept unit in an area that is agreed to be amodal. However, the concept token also includes other feature nodes, and some of them might or might not be in brain areas that are agreed to be modal. That depends on the concept and the context. So, rather than saying that a concept is modal or amodal, we should say that a concept can have amodal or modal instantiations: if all of the co-activated features are in amodal areas the concept is amodally instantiated; if at least one feature falls in an area that can be characterized as modal, it is a modal instantiation.

One could object and suggest that one should characterize the concept depending only on the location of the highest-level root-node and ignore the co-activated feature nodes to portray the concept as modal or amodal. If the root node is located in an amodal brain area, we are dealing with an amodal concept; otherwise, we are dealing with a modal one. However, that seems quite arbitrary, because why should the co-activated features be ignored? In many cases they are most likely to be co-activated because they are cognitively relevant and useful in the cognitive task. Also, given the hierarchical structure with the built-in graded notion of abstraction (with increasing abstraction from bottom to top), the introduction of a sharp dichotomy does not seem justified. Instead, it seems more adequate to carry the graded notion of abstraction over to a graded notion of modality.

In this model, a concept is not modal or amodal simpliciter. But this view does not imply that we have to give up either notion. For instance, there is a sense in which we could still give amodality a vital role. There is, namely, a sense in which concepts can be tokened in an amodal mode, without being an amodal concept simpliciter. For that purpose, let me introduce the notion of “shallow” and “deep” processing of a concept inspired by Barsalou (e.g., Simmons et al. 2008; Barsalou et al. 2008) and other authors (e.g., Erickson and Mattson 1981, or Barton and Sanford 1993), which might be useful here. Their idea with regard to the depth of processing can be applied to the PP model of concepts as feature networks. The basic idea is that, for example, in a reading task,Footnote 9 a concept might be processed – at the one extreme - only superficially. During such “shallow” processing only a small part at the top of the network of a concept is activated (in the limiting case only the root node of the concept unit itself). For instance, when analysing a syllogism, one need not activate the full concept network of the involved words, and one can ignore most connections to other concepts and treat the words as mere placeholders (though, of course, it is difficult to completely suppress the meaning when reading a word). A shallow representation is enough for the purpose at hand. Or, to give another example, in the case that one has a very superficial understanding of some concept (maybe a technical term one is not familiar with), the processing is quite shallow, simply because the concept network is small or even limited only to the linguistic label.Footnote 10 At the other extreme, when reflecting very consciously on the meaning of a word, the resulting activated representation might be extremely rich, including, e.g., sensory-motor information regarding exemplars associated with that concept.

The PP story of concept contextualism provides resources to account for the processing modes of conceptual webs that vary in terms of depth. We could imagine that, on some occasions, concept tokens are instantiated only by the root node, possibly together with a few other nodes in adjacent hierarchical levels, without reaching deep into low level peripheral sensory or motor areas (though they could, depending on the context, of course). So, we could have settings in which concepts are processed shallowly. But that would be merely a limiting case on a continuum from very deep to very shallow processing. A concept could appear amodal in a shallow processing mode. But in appropriate contexts, the same concept could also be processed in a modal mode, in which concept units in lower-level sensorimotor areas are co-activated.
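The continuum from shallow to deep processing can be sketched in a few lines. The sketch rests on simplifying assumptions that go beyond the text: the hierarchy is a small tree with three levels, processing depth is the number of downward steps from the root node, and the "degree of modality" of a tokening is measured by how far down the hierarchy the activation reaches. The names `activate` and `modality_degree` and the example network are hypothetical.

```python
def activate(children, root, depth):
    """Units reached from `root` within `depth` downward steps in the hierarchy."""
    active = {root}
    frontier = [root]
    for _ in range(depth):
        frontier = [c for unit in frontier for c in children.get(unit, [])]
        active.update(frontier)
    return active


def modality_degree(active, level, top_level):
    """0.0 if only the top level is active, 1.0 if the periphery (level 0) is reached."""
    lowest = min(level[unit] for unit in active)
    return (top_level - lowest) / top_level


# Toy hierarchy: level 2 = root node, level 0 = sensorimotor periphery.
children = {
    "DOG": ["ANIMAL_SHAPE", "BARK_PATTERN"],
    "ANIMAL_SHAPE": ["EDGE_CONTOURS"],
    "BARK_PATTERN": ["RAW_SOUND"],
}
level = {"DOG": 2, "ANIMAL_SHAPE": 1, "BARK_PATTERN": 1, "EDGE_CONTOURS": 0, "RAW_SOUND": 0}

degrees = [modality_degree(activate(children, "DOG", d), level, top_level=2) for d in range(3)]
print(degrees)
```

On this toy measure, a tokening limited to the root node sits at the amodal-looking end, and each additional step towards the periphery moves the same concept further towards the modal end.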

6.2.2 From the modalist perspective

So far, we have assumed a neural localization criterion, which presupposes the existence of (genuinely) amodal areas in the brain. But, as we have seen, a weak modalist might deny the existence of amodal representations in the first place and point to MACHs. Concepts are more or less abstracted and convolved modal representations (that are never fully free of modal information, i.e., amodal). But the modalist could be aligned with the PP view, where concepts are instantiated flexibly as networks with nodes across a continuum from low-level sensorimotor nodes to highly abstracted and convolved ones. On the other hand, amodality could now be seen as an (unreachable) limiting case, or asymptote, of maximally shallow processing of nodes (they may, though, vary in their degree of abstraction, depending on the level on which they are located). The more abstract and the shallower the instantiation, the more the concept “looks” amodal. Some modalists have suggested taking on board linguistic representations (Section 5). We cannot cover here the relationship between concepts and language, but let me hint at the following suggestion (which might allow for fleshing out more consistently the hybrid proposals that combine modal and linguistic representations and which I have criticized in Section 5). A modalist could allow for (arbitrary) linguistic labels (i.e., other representations not involved in the hierarchical abstraction gradient) attached to the root-nodes of concepts. This move introduces the possibility of an “amodal” instantiation of a concept, and, in this way, the modalist can “close” the modal-amodal continuum at the amodal end. An arbitrary label in itself would no longer carry abstracted modal information and, if instantiated alone, would be merely a meaningless (shallow) placeholder. Maybe some concepts (namely entrenched lexicalized ones) have such labels as their root nodes.

In sum, from both perspectives, that of the amodalist and the modalist, it turns out that the modal/amodal dichotomy does not look very useful anymore and it should be overcome by reconceptualizing it as a distinction of degree. If the tokens of a concept can (context-dependently) cover a whole range of levels in the PP model hierarchy, there is no reason to call the concept modal or amodal simpliciter, and it would be better to characterize the modal/amodal distinction as one of degree. Concepts do not fall into modal and amodal concepts. The modal/amodal continuum is parallel with the continuum of shallow/deep processing and the continuum of increasing abstraction from the bottom to the top.

6.3 Some benefits of the model

The picture of concepts as located in a modal/amodal continuum that I put forward here has various advantages. Firstly, it is based on a cognitive-computational model that is specific enough to carry the hope that we can test it empirically. Furthermore, it can accommodate both the concerns of modalists and amodalists, because it accounts both for semantic and cognitive content. Indeed, if we consider only the root node, we can account, for instance, for the intuitions behind Fodor’s amodalism (conceptual atomism). Fodor is mainly concerned with reference and semantics, not with psychological and cognitive significance. The root node plays an “atomic” role. Under certain circumstances, we can idealize matters and consider only the root node and let it stand in for the whole concept. Such an idealization, of course, ignores the context-sensitivity of concepts and the cognitively relevant content or phenomena that led to the proposition that concepts have some internal structure and are not merely atomic symbols.

Secondly, the proposed model of concepts is compatible with (or close to) a range of recent accounts of concepts and can be seen as an underpinning computational model for them. Let me very briefly point to some examples. For instance, the model is compatible with the “improved” LOT account by Schneider (2011). Schneider claims that Fodor’s LOT is underdeveloped with regard to the notion of a mental symbol. She proposes that a mental symbol’s identity is determined by its total computational role. In the view of concepts presented here, the total computational role of a concept is encoded in the way in which its root node is embedded in the structure of the entire hierarchical network, specifically how it is connected with other nodes and how context-sensitive co-activation patterns with feature nodes arise. Furthermore, my account is close to the perceptual symbol accounts of Prinz and Barsalou but spells out more details and provides an additional twist. For example, in Prinz’s account, “concepts are proxytypes, where proxytypes are perceptually derived representations that can be recruited by working memory to represent a category” (2002:149). But it seems unclear what conveys stability (or identity) to a concept if each tokening of a concept can be different. The root node of the concept (concept unit) in the model proposed here plays such a stable referential role. The flexibility and context-sensitivity of concepts demanded by Prinz is preserved by the feature selection mechanism based on precision weighting in the PP framework. Tillas & Trafford (2015:7) propose an account of concept individuation close to Prinz’s account, but which differs in that the individuation takes place “by virtue of a representational core,” which is an “abstracted representation that shares enough similarities with all members of a given category”. The root node of the concept in the model I have suggested could be considered to be such a representational core.
Tillas & Trafford are mainly concerned with the question of how we can “share” concepts given the vastly different individual concept acquisition histories and the significant context-dependence of concepts. They think that the common core plays a key role here because it “secures reference, which in turn provides the ground for communication”. However, the questions of how reference works, and how we can “share” concepts (or a language) require much more discussion and cannot be covered here.

While the suggested PP model tries to address the concerns of both modalists and amodalists, it is evidently an account with much sympathy for modalism. The PP model could accommodate amodal representations built into the hierarchical structure. However, it seems more natural and parsimonious to say that the amodal appearance of conceptual representations arises as an asymptotic case (namely for shallow processing) out of a predominantly modal view. The overall PP prediction model of an individual is the result of a constant adjustment with top-down and bottom-up influences for global, long-term and average prediction error minimization. The purpose of cognition is to contribute to successful interaction with the world. This implies that all representations tend to be influenced by the sensory bottom-up flow. If a concept does not help in the error minimization mission, it will be over-written sooner or later (or the individual will lose survival fitness), so ultimately it owes its existence to sensory influences. Even if genuinely amodal concepts existed and were inborn (as Fodor famously held), evolutionary pressure would have ensured that only those amodal representations remain in the evolutionary endowment that contribute to dealing with the sensory inflow and world interaction optimally. All concepts tend to be “grounded” in this broad sense in sensory input.

7 Conclusion

In this paper, I have suggested that we should overcome the dichotomic distinction between modal and amodal representational formats, because of two significant problems it faces: firstly, there is no shared understanding of what modal and amodal formats are; and secondly, both views can accommodate the available empirical evidence. Hybrid accounts, as they currently stand, do not seem to provide a fully satisfying solution either. I have tried to show how we could reconceptualize the modal/amodal distinction as a graded one, using a specific cognitive-computational model of concepts (within the Predictive Processing framework) as an example. In this model, a concept is a distributed multi-level network of concept units. A specific tokening of a concept can include, context-dependently, nodes from all across the hierarchy, from peripheral sensorimotor areas to the highest cortical levels. Typical amodality is an idealization instantiated by a shallow mode of concept processing (lowest level of detail of prediction in PP terms). In this case, concept instantiation is limited to the root nodes, and no other lower level feature-nodes are co-activated. Typical modality, in turn, arises when we process the concept in a deep mode, also involving lower levels of sensorimotor representations (highest level of detail of prediction). In sum, in this view, there are no separate modal and amodal systems or representational structures in the brain; modality and amodality correspond to limiting cases of the (context-sensitive) processing depth in a distributed, hierarchical concept network.