1 Introduction

An emerging, though not universal, consensus in comparative psychology has it that at least some non-human animals ‘read minds’. That is, they ascribe mental states to others, and predict behaviour on the basis of these mental state ascriptions. There is evidence that many animals reason in this way about others’ goals and intentions, as well as their epistemic states (Bugnyar et al. 2016; Call and Tomasello 2008; Clayton et al. 2007; Hare et al. 2000; Hare et al. 2001; Marticorena et al. 2011). Until recently, it was thought that animals[fn. 1] were characteristically limited in their ability to represent epistemic states: they could represent only factive epistemic states, like perception and knowledge, and were consequently unable to pass the ‘false belief test’ (Call and Tomasello 2008). On this basis, it’s been suggested that they understand mental states as relations holding between individuals and objects or situations, but not as representations with semantic properties like being apt for accuracy and inaccuracy. Recently, though, it appears great apes have cleared even that hurdle: their anticipatory looking behaviour suggests that they predict how others will act when they have a false belief about an object’s location (Krupenye et al. 2016).[fn. 2]

Suppose we take this result at face value, as showing that great apes represent representational states as such – as the kind of thing that can be accurate or inaccurate. Does this show that great ape mindreading abilities differ from our own in degree only, and not in kind?[fn. 3] In this paper, I argue that it does not: there may be substantial differences between great ape and human mindreading, even if great apes represent representational states as such. One underexplored dimension along which forms of mindreading may vary concerns the representational format a mindreader takes mental states to have. When humans ascribe propositional attitudes, I suggest, we at least sometimes take them to represent linguistically. If a creature took mental states to be exclusively non-linguistic, the result would be a distinctive and limited form of mindreading.

As a case in point, I discuss the possibility of ‘mindmapping’ – that is, taking mental representations to be map-like. I show that this would provide an effective way of representing others’ beliefs about the spatial arrangement of their environment, but that it would otherwise impose substantial constraints on the range of belief attributions one could make. In general, the representational format one takes mental states to have will significantly affect one’s mindreading abilities – since representational formats differ in their formal features and expressive power. I close by articulating the significance of this for the study of great ape mindreading. I argue that our evidence currently underdetermines what format apes take representational states to have: consistently with our evidence, they might be mindmappers. Further empirical work might reveal whether or not this is so, providing a more complete picture of great apes’ understanding of the mind.

2 Varieties of Mindreading

To be a mindreader is to be able to ascribe mental states to others. In adult humans, this often takes the form of ‘propositional attitude mindreading’.[fn. 4] That is to say, we ascribe propositional attitudes like ‘he thinks the ice is melting’, ‘she wants to win the tournament’ and so on. We take propositional attitudes to have contents which are truth- or accuracy-apt, which can be expressed using sentences, and which stand in inferential relations mirrored by the inferential relations between the sentences used to express them. It is in virtue of these inferential relations that propositional attitudes contribute to reasoning and give rise to behaviour in the way that they do. We appreciate this, and are able to predict and explain behaviour accordingly.[fn. 5]

But there might be other ways to represent the minds of others, besides representing propositional attitudes as such. In fact, standard characterisations of mindreading leave open the possibility that there might be different varieties of mindreading. One characterisation of mindreading treats it as the representation of causally efficacious unobservable states (Penn and Povinelli 2007; Whiten 1996). This view counts representing non-propositional states like hunger or pain as mindreading, since both are causally efficacious ‘inner’ states. Another treats mindreading as the representation of intentional states, that is, states which are directed on things in the world (Hutto et al. 2011; Povinelli et al. 1996). On this view, mindreading includes representing states like seeing the ball or wanting the doll – which are directed on objects, but don’t represent those objects as being some way. Representing propositional attitudes qualifies as mindreading on both of these accounts, since propositional attitudes are intentional states and causally efficacious unobservables. But neither view requires mindreaders to represent propositional attitudes as such.

In this connection, it’s been suggested that there may be various forms of ‘minimal’ mindreading. These alternative forms of mindreading are minimal in the sense that they do not involve ‘metarepresentation’. That is, they do not involve representing representational states as such. A representational state is a state with semantic properties – notably, having a content apt for accuracy and inaccuracy. The content of a state is what it represents. For instance, my belief that my house is white represents that my house is white. My belief is accurate just in case my house is white. To represent a representational state as such is to represent it as having these semantic properties. We do this when we ascribe propositional attitudes; minimal mindreaders, by definition, do not.

This distinction between minimal and metarepresentational mindreading has been explored in a number of places. One prominent account of minimal mindreading suggests that it involves exploiting a ‘minimal theory of mind’ which defines relational surrogates for our representational mental state concepts. For instance, rather than representing perception, minimal mindreaders represent encounters. Encountering is a relation roughly equivalent to having a direct line of gaze: an agent encounters an object iff there is a clear line between her eyes and the object, and certain physical conditions are met, such as good lighting. The simplicity gained by representing mental states as relations rather than as representations comes at a cost. There are ‘signature limits’ on the abilities of minimal mindreaders. That is to say, there are certain mental states they cannot ascribe, but which ‘full-blown’ mindreaders can (Apperly and Butterfill 2009; Butterfill and Apperly 2013).[fn. 6] In a similar vein, José-Luis Bermúdez (2011) distinguishes ‘perceptual mindreading’ from propositional attitude mindreading, where the former, again, involves representing relations between agents and objects or states of affairs. And Call and Tomasello (2008) have proposed that chimpanzees understand others using a ‘perception-goal psychology’, distinct from and simpler than the belief-desire psychology with which adult humans operate.

In each case, the suggestion is that minimal mindreaders differ from ‘full-blown’ mindreaders in construing mental states as relations between agents and things in the environment. This is contrasted in each case with ascribing representational states, that is, states with contents apt for being accurate or inaccurate. But more specifically, in each case, the contrast is drawn between minimal mindreading and the variety of mindreading often engaged in by adult humans – propositional attitude mindreading. For instance, Butterfill and Apperly (2013) write that ‘full-blown’ mindreading is more challenging than minimal mindreading because propositional attitudes have a number of complex properties. For instance, they interact with one another in uncodifiably complex ways, and have arbitrarily nestable contents – such that one can think things like ‘Bill believes that Priya hopes the Bobcats will win the game.’ The suggestion is that full-blown mindreading involves appreciating this, and representing mental states as having these features. Similarly, Bermúdez (2003b, 2018) highlights that, unlike perceptual states, propositional attitudes do not impact behavior directly, but only in concert with other beliefs and desires, in virtue of the logical, structural relations holding between these attitudes. As a result, he claims, a propositional attitude mindreader must be able to represent the logical relations between attitudes, and ‘work out how the relevant beliefs or other propositional attitudes will feed into action’ by ‘recreating’ the target’s reasoning.

These are no doubt features of the propositional attitudes, and correspondingly features of the propositional attitude mindreading often engaged in by adult humans. But this should not lead us to think that propositional attitude mindreading is the only alternative to minimal mindreading, or that representing representational states as such necessarily involves representing them as having these other properties. Just as a minimal mindreader fails to represent the semantic properties of mental states, a metarepresentational mindreader’s grasp of the features of mental states might be partial. There is surely a region of logical space here that has so far been overlooked: a region occupied by creatures who know that others represent and misrepresent the world, but who do not take them to have propositional attitudes as such.

As a first step in exploring this region, consider that to ascribe representational states to others, we represent that they have representations which express an accuracy-evaluable content. Any representation which expresses something accuracy-evaluable expresses it in some representational format. By a representational format I mean something like a sentence, a diagram, a map or a picture. The same content can be expressed in different formats. For instance, ‘my house is white’ expresses that my house is white, but a picture of my house might also express this. What is expressed by the sentence ‘average monthly rainfall in New York doubled in the year to April 2018’ could also be expressed by a diagram. I take the differences here to be differences in representational format, rather than in content.

Representational states themselves can exploit a range of representational formats. This is significant, since representational formats differ in certain important ways, with the result that some are better suited than others to the expression of certain contents, and some have greater expressive power than others. As David Marr (1982) notes, the representational format one employs ‘determines what information is made explicit and hence what is pushed further into the background, and it has a far reaching effect on the ease and difficulty with which operations may subsequently be carried out on that information’.[fn. 7] The representational format exploited in a cognitive task may consequently have substantial effects on one’s cognitive capacities and behavior.

So, we can understand the idea that a creature’s thought exploits a particular representational format in functional terms. To say that a creature’s thought is imagistic (for instance) is not literally to say that there are pictures in its brain, but to say that its thoughts realize certain functional patterns. They can have certain contents, but not others, and they can transform those contents in particular ways (Camp 2009, p. 111). A creature thinking exclusively in images could only entertain contents that an image could express, and since most logical operators have no imagistic counterpart, its capacity for inference would be extremely limited.

To take another example, Jacob Beck (2015) has argued that many animals use ‘analogue magnitude representations’ to represent quantities. These are primitive systems for representing quantity that do not rely on grasping units of measurement or number systems. An effect of using analogue magnitude representations is that one’s ability to discriminate two quantities diminishes as the ratio between those quantities approaches 1:1. The representational system illustrated in Fig. 1 is an example, using dashed lines of increasing length to represent quantities of increasing magnitude. In this system, the quantities represented in Fig. 2a are easier to discriminate than those in Fig. 2b – whereas in Arabic numerals, they would be equally easy to distinguish. To say that some animals exploit analogue magnitude representations, then, is in part to say that their representations of quantity have this feature: their ability to discriminate quantities diminishes as the ratio between them approaches 1:1.
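The ratio-dependence described here can be made concrete with a toy model. The sketch below is illustrative only and is not drawn from Beck's account: it assumes that each magnitude is encoded with Gaussian noise whose spread grows in proportion to the magnitude (so-called scalar variability), and the function name `discriminability` and the constant `WEBER_FRACTION` are hypothetical. On these assumptions, how easily two quantities can be told apart depends on their ratio rather than on their absolute difference.

```python
import math

# Hypothetical noise level, chosen purely for illustration.
WEBER_FRACTION = 0.2


def discriminability(a: float, b: float, w: float = WEBER_FRACTION) -> float:
    """Signal-to-noise index for telling magnitudes a and b apart.

    Assumption: a magnitude n is encoded with Gaussian noise of standard
    deviation w * n. The index therefore depends only on the ratio a:b
    and falls to zero as that ratio approaches 1:1.
    """
    return abs(a - b) / (w * math.sqrt(a * a + b * b))


# Illustrative pairs (not the quantities shown in the paper's figures).
far_apart = discriminability(2, 4)      # ratio 1:2
close_together = discriminability(8, 9)  # ratio 8:9
scaled_up = discriminability(20, 40)    # ratio 1:2 again, larger magnitudes

print(far_apart > close_together)            # the 1:2 pair is easier
print(math.isclose(far_apart, scaled_up))    # same ratio, same index
```

By contrast, a system of arbitrary symbols such as Arabic numerals would distinguish any two unequal quantities equally well, which is the difference in functional profile the text attributes to a difference in format.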

Fig. 1. A system of analogue magnitude representations

Fig. 2. Quantities represented using analogue magnitude representations. The quantities in (a) are easier to discriminate than those in (b)

In the light of this, we can ask: when a mindreader ascribes a representational state, what representational format do they take it to have? There are a number of reasons to think that our propositional attitude mindreading treats the propositional attitudes as linguistic in format.[fn. 8] As I use the term, a linguistic format is one making use of the following: logical devices, including quantifiers and connectives; a syntactic distinction between subjects and predicates; and representational elements which are arbitrarily related to their referents (Beck 2018). Our ascriptions of propositional attitudes assume that they have complex features they could plausibly have only if they represented linguistically.

First, we take it that individuals can entertain nested contents, as noted above, as well as highly abstract thoughts and thoughts involving quantification. In fact, we assume that the range of possible thoughts is vast: one can think almost anything about anything.[fn. 9] So, our mindreading assumes that the format of propositional attitudes has expressive power to match the range of possible thoughts. Language has this expressive range, in virtue of the features mentioned above – the arbitrary relationship between signs and denotations, the use of logical devices and so on. Other familiar formats, as will become clear, are far less expressively powerful. Second, we take the propositional attitudes to stand in certain logical relations and to collectively give rise to behavior in virtue of these relations. So, our mindreading assumes that the propositional attitudes have a format with the right kind of formal structure to manifest these logical connections. Language, again, has the appropriate formal structure; other familiar formats do not. In this connection, Bermúdez (2003b, 2011, 2018) argues that propositional attitude mindreaders must treat the attitudes as linguistic, since only language has the formal structure required to make the logical relations between thoughts perspicuous.[fn. 10]

Although we take propositional attitudes to be linguistic in format, there is no reason to think that this is the only way a creature might represent the mental states of others. One might treat some or all representational states as having a non-linguistic representational format. Of course, if the states one is representing in this way are propositional attitudes, this would mean failing to capture certain important features of those states – that they can have nested contents, for instance. Indeed, as should already be clear, treating representational states as exclusively non-linguistic in format would have a significant impact on the mental state ascriptions one could make, and the ease or difficulty with which one could make them. As a result, returning to the question with which I began, from the fact that great apes seem to represent representational states as such, it would be far too quick to conclude that their mindreading is substantially the same as our own.

3 Mindmapping

To illustrate this, I consider in this section what it would be to ascribe states with a map-like format, rather than a linguistic one. To begin with, consider the comparison between a language and what I shall call a ‘basic map’.[fn. 11] I use this term to refer to a representational format exploiting a lexicon of syntactically simple icons denoting objects and properties, and the combinatorial principle of spatial isomorphism. According to this principle, placing two icons in a spatial relation on a map has the effect of representing that the referents of those icons stand in an isomorphic spatial relation in reality (Camp 2007, p. 158).

Both languages and basic maps are combinatorial representational formats, in which discrete representational units are combined according to combinatorial rules. The semantic content of the resulting complex representations is a systematic function of the semantic content of the units and the way in which they have been combined (Camp 2007, p. 154). But maps and language make use of very different combinatorial rules. In language, the combinatorial principles are abstract, relying on no physical similarity between the representation and what is represented. Take the sentence ‘Boris is wise’, whose combinatorial principle is predication. On a Fregean view, predication amounts to functional application: ‘terms fill the argument-places of a predicate that carries their denotations into a truth-value’ (Rescorla 2009a, p. 177). In this case, inserting ‘Boris’ into the argument-place of ‘is wise’ represents that Boris instantiates wisdom, and delivers ‘true’ just in case he does. Functional application is thus used to represent property instantiation, but there is no semantically significant physical similarity between property instantiation and the syntactic mechanism used to represent it (Camp 2007, p. 157). By contrast, the principle of spatial isomorphism exploited by maps imbues the spatial properties of a representation themselves with semantic significance.

The result of employing such different combinatorial principles is that these formats differ in characteristic ways. In some ways, basic maps are more efficient and user-friendly than languages. First, they are informationally dense. Reproducing the representational content of a basic map in sentential form is a difficult task, requiring a large and unwieldy set of sentences. Second, they wear their implications on their sleeves. For instance, if a map of the United States shows that Seattle is north of Portland and west of Spokane, it will also explicitly represent that Portland is south-west of Spokane, that Spokane is east of Portland, and so on. This information will be cognitively transparent, because it is explicitly represented in the map. By contrast, the sentence ‘Seattle is north of Portland and west of Spokane’ implies all that information, but does not explicitly represent it; someone using this sentence must do some work to figure out the rest. Finally, an effect of this is that making any alterations to a basic map is simple: moving any one icon on the map automatically updates all the represented relations between the represented objects, with the map’s coherence being automatically maintained (Camp 2007, pp. 160–162).
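The point that a basic map ‘wears its implications on its sleeve’ can be illustrated with a small sketch. It assumes, purely for illustration, that a basic map can be modelled as a lookup from icon names to grid positions; the coordinates are hypothetical and capture only the rough relative layout of the three cities, and the helper `direction` is not part of any cited account. Once the icons are placed, every pairwise spatial relation can be read off without further premises.

```python
from typing import Dict, Tuple

Position = Tuple[float, float]  # (east, north) on an arbitrary grid

# Hypothetical icon positions: relative layout only, not real coordinates.
us_map: Dict[str, Position] = {
    "Seattle": (0.0, 2.0),
    "Portland": (0.0, 0.0),
    "Spokane": (4.0, 2.0),
}


def direction(m: Dict[str, Position], a: str, b: str) -> str:
    """Compass direction of icon a relative to icon b, read off the map."""
    dx = m[a][0] - m[b][0]
    dy = m[a][1] - m[b][1]
    north_south = "north" if dy > 0 else "south" if dy < 0 else ""
    east_west = "east" if dx > 0 else "west" if dx < 0 else ""
    parts = [part for part in (north_south, east_west) if part]
    return "-".join(parts) or "same place"


# The two relations stated in the sentence from the text:
print(direction(us_map, "Seattle", "Portland"))   # north
print(direction(us_map, "Seattle", "Spokane"))    # west
# A relation the sentence only implies, but the map already contains:
print(direction(us_map, "Portland", "Spokane"))   # south-west
```

A user of the sentence ‘Seattle is north of Portland and west of Spokane’ would have to infer the third relation; here it falls out of the same lookup as the first two.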

These advantages come at the cost of flexibility. What a representational format can represent is determined by its combinatorial principle: it can represent things as standing in the relation represented by that principle. The combinatorial principles in language stand for very general relations like instantiation, which can relate almost anything; as a result, language has almost unlimited expressive power. By contrast, the combinatorial principle of a basic map is ‘semantically robust’: it heavily constrains which contents a map can express (Camp 2009, pp. 120–121). Whenever two icons are combined according to this principle, what is expressed is always something about the spatial relation between their referents.

Using only the resources of a basic map – that is, a lexicon of syntactically simple icons and the principle of spatial isomorphism – one cannot represent non-spatial relations between objects and properties such as x is two years old, y is heavier than z, or o instantiates p. Of course, if a suitably expansive lexicon is used, these resources can nevertheless go a long way. For instance, if a lexicon included distinct, syntactically simple icons for ‘green apple’ and ‘red apple’, a map could discriminate between there being a green apple at a location and there being a red apple at that location. But this would not amount to representing that an object has a non-spatial property – since the icons do not have formally separable elements corresponding to the object and the property. To put it another way, a map tokening syntactically simple red-apple and green-apple icons would not thereby represent that two things of the same kind were in the mapped region.

There are also limits on the quantificational information that can be represented in basic maps. For example, they cannot represent bare existential quantification. That is, they cannot say simply that something exists, without saying anything further about it. A map can only represent that something exists by placing an icon somewhere on the map – and so, by representing that it exists in a particular location, and stands in certain spatial relations to other objects (Camp 2007, p. 165). Representing identity on a map is another challenge. Whilst language permits us to say, ‘Bruce is Batman’, a map of Gotham City which had a Bruce icon and a Batman icon could not do the same. We might co-locate the Bruce and Batman icons, but the map itself provides no resources for representing that these two icons stand for one and the same person, rather than two co-located individuals.
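These two limits can be seen in the same toy model of a basic map as a lookup from icon names to positions. This is a sketch under that modelling assumption only; the function names `place` and `objects_represented` are hypothetical. Placing an icon requires a position, so there is no way to record that something exists without also locating it, and two co-located icons remain two entries, with nothing in the structure to say that they stand for one individual.

```python
from typing import Dict, Tuple

Position = Tuple[float, float]
BasicMap = Dict[str, Position]


def place(m: BasicMap, icon: str, position: Position) -> BasicMap:
    """Return a new map with the icon at the given position.

    A position is mandatory: the format has no slot for 'exists somewhere'.
    """
    updated = dict(m)
    updated[icon] = position
    return updated


def objects_represented(m: BasicMap) -> int:
    """Each icon counts as one represented object."""
    return len(m)


# Co-locating two icons does not identify their referents.
gotham: BasicMap = {}
gotham = place(gotham, "Bruce", (1.0, 1.0))
gotham = place(gotham, "Batman", (1.0, 1.0))

print(gotham["Bruce"] == gotham["Batman"])  # True: same location
print(objects_represented(gotham))           # 2: still two individuals
```

Adding a field that links the two icons would make identity expressible, but that would be a step towards the hybrid formats discussed next rather than a use of the basic map's own resources.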

The constraints I have been discussing affect what I’ve been calling ‘basic maps’. As Camp (2007) argues, maps can be augmented with notation and conventions which increase their expressive power, so these constraints do not apply absolutely to anything which we might be inclined to call a map. These more powerful, augmented formats effectively borrow representational resources from other formats, and are, in that sense, hybrid. One version of a hybrid map might employ syntactically complex symbols to represent non-spatial properties and relations. For instance, the colour of an icon might be used to represent the colour of its referent. The resulting icon would be syntactically complex, because it would have formally separable elements – colour and shape – each having semantic significance (Camp 2007, p. 166, n. 30). Additional notation could also support the representation of object identities. For instance, we might say that where two co-located symbols are linked by a ‘=’ sign, they represent a single object. However, these additional resources make hybrid maps less cognitively transparent, and less easy to update. Just as efficiency in basic maps is bought in the coin of expressive power, expressive power in hybrid maps is bought in the coin of efficiency.

Now, having outlined these comparisons between a linguistic format and the basic map format, let us introduce the term ‘mindmapper’ to denote a creature who represents representational states exclusively as having a basic map format. Rather than taking a belief, say, to have a linguistic content, a mindmapper takes a belief to be a ‘map of the neighbouring space by which we steer’ (Ramsey 1931).[fn. 12] In other respects, let’s say, their picture of belief is similar to ours – that is, they take a belief to be a state which aims accurately to represent the world, and they expect others to act as though their beliefs are true. The difference is simply that they take beliefs to represent in a map-like rather than sentence-like way.

Since maps make a certain kind of information explicit at the cost of expressive power, mindmapping would give rise to a distinctive and limited set of mindreading abilities. In some ways, mindmapping would have its advantages. As Bermúdez (2003b, 2018) has argued, beliefs do not often give rise to behavior individually, and if one is usefully to ascribe more than one, one must be capable of recognizing the connections between them. By representing someone not as having a number of discrete sententially structured beliefs, but as having a single map-like representation of their environment, one might capture in a single ascription what a propositional attitude mindreader would treat as a number of distinct beliefs. Moreover, since maps make explicit the spatial relations holding between any objects they represent, this would take some of the cognitive work out of determining the relations between these beliefs.

As well as this, updating belief ascriptions would be somewhat simpler for a mindmapper. Suppose a mindmapper, Nila, represents a target, Robin, as having the map-like belief in Fig. 3. The map represents Nila and Robin in their shared environment, separated by a river. A food item, represented by the star icon, is represented as being on Robin’s side of the river. Now suppose that Nila moves the food over to her side of the river, and she knows that Robin was watching as she did so. To update her representation of Robin’s belief, Nila moves the food icon to the relevant location on the map (Fig. 4). As well as updating the represented location of the food, this automatically updates the represented spatial relations between the food and everything else on the map and maintains the coherence of the map. By contrast, if Nila were a propositional attitude mindreader, ascribing beliefs with a linguistic format, things would be more complicated. The sentence which gave the food’s location would need to be updated, and a further process would need to check this revised sentence for consistency with any other beliefs ascribed to Robin. This additional processing introduces a risk of error where none exists for a mindmapper.
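The contrast between the two ways of updating can be sketched as follows. Everything here is a simplifying assumption made for illustration: the river is modelled as the line x = 0, the coordinates are hypothetical rather than taken from the figures, and the sentence-like ascription is reduced to a handful of stored same-side claims. The helper `stale_sentences` stands in for the further consistency check that the text says a linguistic mindreader would need.

```python
from typing import Dict, List, Tuple

Position = Tuple[float, float]

RIVER_X = 0.0  # hypothetical: the river runs along x = 0


def same_side(belief_map: Dict[str, Position], a: str, b: str) -> bool:
    """Read a same-side-of-the-river relation directly off the coordinates."""
    return (belief_map[a][0] < RIVER_X) == (belief_map[b][0] < RIVER_X)


# Map-like ascription: one structure, relations implicit in positions.
robin_map: Dict[str, Position] = {
    "Nila": (-2.0, 0.0),
    "Robin": (2.0, 0.0),
    "food": (3.0, 1.0),
}

# Sentence-like ascription: each relation is stored as a separate claim.
robin_sentences: Dict[Tuple[str, str], bool] = {
    ("food", "Robin"): True,   # "the food is on Robin's side"
    ("food", "Nila"): False,   # "the food is not on Nila's side"
}

# Nila moves the food across the river while Robin watches.
robin_map["food"] = (-3.0, 1.0)            # one change to the map
robin_sentences[("food", "Nila")] = True   # one change to the sentence set


def stale_sentences(
    sentences: Dict[Tuple[str, str], bool],
    belief_map: Dict[str, Position],
) -> List[Tuple[str, str]]:
    """Stored claims that no longer agree with the updated layout."""
    return [
        pair
        for pair, value in sentences.items()
        if same_side(belief_map, *pair) != value
    ]


print(same_side(robin_map, "food", "Robin"))        # False, with no extra step
print(stale_sentences(robin_sentences, robin_map))  # [('food', 'Robin')]
```

After the single change to the map, every relation involving the food is already up to date, whereas the sentence set still contains a claim that has to be found and revised separately.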

Fig. 3. Nila's representation of Robin's map-like belief about the relative locations of Nila, Robin, a river and a food item

Fig. 4. Nila's updated representation of Robin's belief

However, the range of mental state ascriptions a mindmapper can make is constrained by the expressive limitations on maps, outlined above. As a mindmapper, Nila should be able to ascribe mental states whose content concerns the spatial arrangement of objects and properties, but not those whose content concerns other properties and relations. So, she can ascribe a belief about the location of some food, say – but since she treats beliefs as having a basic map format, she cannot ascribe beliefs about other properties of the food, such as its colour or how long it’s been there.

She will also be unable to ascribe beliefs involving bare existential quantification. So, for instance, suppose Nila moves the food to a new location, without Robin noticing. Nila takes Robin to have a belief whose content is expressed by the map in Fig. 4 – which is now inaccurate. But now suppose that Nila observes Robin go to the place where she (Robin) believes the food to be located, and discover that there is no food there. Nila will now have to revise her belief ascription, since Robin no longer believes that there is food at that location. The most obvious result would be to ascribe a belief whose content is expressed by the map in Fig. 5. There is no food icon on this map, because at this point, let’s say, Robin may have no idea where the food is. Even if she did, Nila has no idea where Robin thinks it is, and so no reason to ascribe a belief to Robin about the food’s location. Unless a food icon is placed on the map, the map no longer represents that any food exists.[fn. 13] Yet Robin may well believe that the food still exists. It’s just that she has no idea where it is. Because maps cannot capture this sort of bare existential quantification, Nila’s belief ascription in this situation will be inadequate.

Fig. 5. Nila's updated representation of Robin's belief, showing no food icon

Finally, Nila will be unable to represent beliefs with content of the form x = y. So, suppose Nila has an alter-ego, Cleo, and Robin does not initially know that they are one and the same. Robin might come to believe, falsely, that Nila is in one place and Cleo in another. Nila might ascribe to Robin a belief with the content expressed by Fig. 6. But now, suppose that Robin learns the truth: that Nila and Cleo are identical. Nila will be unable to ascribe this belief to Robin, since a basic map cannot express that two icons co-refer. Nila’s only options would be to ascribe a belief with the content expressed in Fig. 7, from which one of the icons is removed, or Fig. 8, on which the icons are co-located. The first has the advantage of correctly capturing Robin’s belief about the overall number of objects – but does not at all capture what it is she has learned. The second comes closer to capturing what was learned, but incorrectly ascribes to Robin a belief that there is one more object than there is.[fn. 14]

Fig. 6. Nila's representation of Robin's false belief, showing Cleo and Nila in different locations

Fig. 7. A possible representation of Robin's belief, once she has learned that Cleo and Nila are identical. This shows no icon for Cleo, and so fails to capture Robin's belief that Cleo and Nila are identical

Fig. 8. An alternative representation of Robin's belief. This captures Robin's belief that Cleo and Nila are co-located, but not that they are identical

In this section, I’ve been describing what it would be to represent others’ beliefs as having a map-like format, rather than a linguistic one. I’ve argued that this would provide an effective means of capturing others’ beliefs about the spatial distribution of objects, but otherwise would be severely limited. My purpose in making this argument has been to illustrate that the format one takes representational states to have substantially affects the range of mental state attributions one can make, and the ease or difficulty with which one can make them. So, for any representational format – including the augmented map formats I mentioned briefly above – we might describe a distinctive variety of mindreading which results from taking representational states to have that format.

There is consequently, within the category of metarepresentational mindreading, significant scope for variation. In fact, this variation may be reflected in our own mindreading. Although in §2 I argued that humans must take the propositional attitudes to represent linguistically, given the vast range of logically structured possible contents we take them to have, there may be situations in which it suits us to model the contents of others’ mental states in another way – for instance, circumstances in which we are only concerned with a specific subset of somebody’s thoughts, or where behaviour can be effectively predicted without reference to a thought’s logical place in a web of beliefs and desires. This might be so in situations requiring on-the-fly anticipation or manipulation of another’s movements – in the context of hunting or sport, perhaps. Here, we might only be interested in beliefs about the spatial layout of an environment, and be able to predict behaviour reliably on the basis of these beliefs alone. In such a situation, it might suit us to treat another’s beliefs as though they were map-like. A map might model their mental state only imperfectly and partially – but in this context, any limitations might be less significant than the benefits of using a format so well-suited to capturing beliefs of this kind. So, although we must take propositional attitudes to be linguistic, insofar as we take them to have the features discussed above, we may not treat them as such in all mindreading contexts.[fn. 15]

One consequence of the argument here, then, is that human mindreading may be more pluralistic than has been supposed: as well as using strategies besides propositional attitude ascription (Andrews 2012), we may adopt various strategies when ascribing propositional attitudes themselves. Another, though, is that there may be creatures whose mindreading treats representational states exclusively as non-linguistic. So, from the fact that a creature represents representational states as such, we cannot conclude that its mindreading abilities do not differ substantially from our own.

4 Are Great Apes Mindmappers?

So far, I have been defending the general claim that there are varieties of metarepresentational mindreading falling short of the ‘full-blown’ propositional attitude mindreading of adult humans. I’ve argued that more limited forms of mindreading result from taking representational states exclusively to have a non-linguistic representational format, since the expressive limitations of such formats impose constraints on the mindreading tasks one can perform. In this final section, I consider the implications of this argument for the particular case of great ape mindreading. I argue that, consistently with the evidence, great apes may be mindmappers, but that this is a hypothesis which further empirical work could rule out. I argue that this is a project worth undertaking, and offer some suggestions about how this might be done.

The best evidence that great apes represent representational states as such comes from a pair of false belief tests in which they are faced with an agent who has a false belief about the location of an object (Buttelmann et al. 2017; Krupenye et al. 2016). Since in both cases the belief ascribed concerns the location of an object, it is a belief which could be ascribed by either a mindmapper or a ‘linguistic mindreader’ – one who, at least sometimes, treats mental states as linguistic. So, the results of these studies are consistent with the hypothesis that great apes are mindmappers. Similarly, much of the evidence that great apes and other animals ascribe knowledge is drawn from tasks involving the ascription of knowledge about the location of a food item (Bugnyar 2011; Clayton et al. 2007; Hare et al. 2001; Marticorena et al. 2011). This kind of test cannot discriminate mindmappers from linguistic mindreaders, since both should be able to ascribe states with this kind of content. So, these results are consistent with an explanation in terms of either linguistic mindreading or mindmapping.

Naturally, my point is not that great apes are mindmappers. Rather, it is that this is an area in which our evidence underdetermines the theoretical possibilities: we do not know what format great apes take mental states to have. That great apes are mindmappers is therefore an epistemically open possibility, one which further empirical work might rule out. Of course, one might reasonably ask at this point whether this is a possibility which is worth ruling out. Not every possibility that is consistent with a body of evidence ought to be taken seriously. I propose, though, that there are a number of reasons to take this possibility seriously.

First, there is some limited empirical support for the idea that great apes are mindmappers. Current evidence indicates that great apes are incapable of level-two visual perspective taking (Karg et al. 2016). Level-two visual perspective taking is the ability to represent not merely which objects another individual can see,Footnote 16 but how those objects look to her – and to appreciate that the very same objects may look different from her perspective than they do from one’s own (Flavell et al. 1981; Salatas and Flavell 1976). Level-two visual perspective taking is often taken to be closely connected to the false belief test – both rely on appreciating that another person can have ‘a mistaken perspective’ (Karg et al. 2016). So, it might be surprising that great apes have passed false belief tests, whilst failing level-two visual perspective taking tasks.

The mindmapping hypothesis can provide an explanation of this pattern. As I mentioned briefly in §2, the idea of mindmapping can be applied to content-bearing states other than belief – so, a mindmapper might represent the content of visual perception as map-like. A creature who did this would be capable of level-one, but not level-two visual perspective taking – because a basic map can represent where things are, but not what they look like.

To see this, consider a situation in which a mindmapper M and her target T sit on opposite sides of a cube, as in Fig. 9. Face A, visible only to M, is green; face B, visible only to T, is red. M can represent that T sees the cube, by ascribing to her a perceptual state whose content is expressed by the map in Fig. 10, in which a cube-shaped icon, representing cubes, is placed at the relevant location. But she can’t represent that T sees this cube as red. This would require something syntactically more complex – an icon denoting the cube, which could be placed at the relevant location to represent where the cube was, and which could be modified in some other way to represent its being red. A basic map lacks these resources. Note that, even if the map’s lexicon contained a syntactically simple icon for red cubes, placing it on the map would not amount to level-two visual perspective taking. Suppose instead of taking the cube-shaped icon in Fig. 10 to represent cubes, we instead treat it as a syntactically simple red-cube icon. If this icon is equivalent to a denoting term having red cubes as its extension, then by using this map M simply represents, again, that T can see the cube at this location – but this does not speak to how she sees it. If, on the other hand, we treat the icon as a predicate – equivalent to ‘red-cubeness’ – then by using this map M will represent that T sees red-cubeness instantiated here. But this leaves it an open question whether T sees this as a property of the very same object M is looking at. Level-two visual perspective taking, though, involves recognising that I and another see the very same object in different ways.

Fig. 9. A mindmapper M and a target T sit on opposite sides of a cube. Face A is visible only to M. Face B is visible only to T

Fig. 10. M's representation of T's perceptual state, representing the relative locations of M, T and the cube

Despite this inability to engage in level-two visual perspective taking, though, a mindmapper can nevertheless ascribe some false beliefs – those whose contents can be ascribed in a map-like format, which includes those beliefs ascribed in standard false belief tests. So, the hypothesis that animals are mindmappers explains this aspect of the evidence concerning great ape mindreading.
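The contrast between what a basic map can and cannot express may be easier to see in a toy model. The following is only an illustrative sketch under stated assumptions: the names `Icon`, `BasicMap`, `icon_at` and the grid coordinates are hypothetical and are not drawn from the paper or from any empirical model. An icon is treated as a single unanalysed label, so the format has a slot for where something is, but no slot for how it looks.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Location = Tuple[int, int]  # hypothetical grid coordinates


@dataclass(frozen=True)
class Icon:
    """A syntactically simple icon: one unanalysed label, no modifiers."""
    kind: str


@dataclass
class BasicMap:
    """A toy basic map: icons placed at locations, and nothing else."""
    placements: Dict[Location, Icon] = field(default_factory=dict)

    def place(self, icon: Icon, location: Location) -> None:
        self.placements[location] = icon

    def icon_at(self, location: Location) -> Optional[Icon]:
        return self.placements.get(location)


CUBE_LOCATION: Location = (1, 1)

# M's model of T's perceptual state (compare Fig. 10): a cube icon at a location.
t_perception = BasicMap()
t_perception.place(Icon("cube"), CUBE_LOCATION)

# Level-one perspective taking: M can represent that T sees a cube there.
level_one = t_perception.icon_at(CUBE_LOCATION) == Icon("cube")

# Enriching the lexicon with simple "green-cube" and "red-cube" icons lets the
# two maps differ, but the labels are unstructured: the format cannot say that
# one and the same cube looks green to M and red to T.
m_perception = BasicMap()
m_perception.place(Icon("green-cube"), CUBE_LOCATION)
t_perception_alt = BasicMap()
t_perception_alt.place(Icon("red-cube"), CUBE_LOCATION)
icons_differ = (
    m_perception.icon_at(CUBE_LOCATION) != t_perception_alt.icon_at(CUBE_LOCATION)
)

print(level_one, icons_differ)
```

On this sketch, the two maps can register different icons at one location, but nothing in the data structure identifies a single object and predicates different appearances of it, which is what level-two visual perspective taking is taken to require.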

Second, there is already good reason to think that non-human cognition exploits a variety of representational formats. For instance, it’s been suggested that animals might represent dominance hierarchies using a diagrammatic format (Camp 2009); that they navigate using map-like representations (see Rescorla 2018); that imagistic representations may underpin their physical reasoning (Gauker 2018); and that they represent quantities using analogue magnitude representations (Beck 2015). Michael Tomasello (2014) has proposed that great ape cognition exclusively employs a system of iconic representation which ‘prefigures’ the linguistic representation exploited by human thought. Against this theoretical landscape, it does not seem far-fetched to suppose that great apes might take others’ mental states to have a map-like format, rather than a linguistic one. And on the flipside, there is no prima facie reason to suppose that they should take representational states to have a linguistic format.

Finally, the alternatives to taking this possibility seriously are unattractive. We might simply suspend belief about what format great apes take representational states to have. But this would impair our ability to predict which mental states they could ascribe, and would leave us with a picture of great ape mindreading which was in an important respect incomplete. Alternatively, we might conclude that great apes take representational states to have a linguistic format, as we do. But given the connection between taking mental states to have a linguistic format and representing complex features of the propositional attitudes, this would involve ascribing to great apes an understanding of the mind considerably more sophisticated than the evidence mandates. This would be in violation of Morgan’s Canon – the methodological principle admonishing us not to explain animal behaviour in terms of a more complex psychological capacity when a simpler one will do (Morgan 1894). Of course, that principle is rightly controversial (Fitzpatrick 2008; Sober 2005). But this move would also violate Evidentialism – the principle that we should not endorse an explanation of an animal’s behaviour in terms of a cognitive process X if our evidence gives us no reason to prefer it to an explanation in terms of another process Y (Fitzpatrick 2008).

Whether great apes are mindmappers is a question we might begin to answer by implementing mindreading tasks which depart from the simple object-location paradigm mentioned above. States whose contents concern objects’ non-spatial features are the most obvious to consider. But showing that a subject ascribes a belief about non-spatial properties is more difficult than it might initially appear. Consider the following study conducted with children. Subjects observe as an experimenter demonstrates for the benefit of a target that an object makes a rattling noise. The experimenter asks the target ‘can you do that?’ and offers her a choice between two further objects. Children expect that the target will choose the object that looks similar to the first, rattling object, even though the children know, having been familiarised with the objects, that it is the other (dissimilar) object which rattles (Scott et al. 2010). This is taken to show that the children can ascribe beliefs about non-spatial properties. But mindmappers could pass this test by placing a ‘rattle’ icon on the map modelling the target’s belief, thereby ascribing the belief ‘there is a rattle at that location’, rather than ‘that object is a rattle’. Using a suitably expansive lexicon of syntactically simple icons, a mindmapper could make many of the same behavioural predictions as a linguistic mindreader who ascribes beliefs about the non-spatial properties of objects.

A task with a temporal dimension, in which an object’s properties change over time, might be more effective. Here is an illustrative example. In a study on episodic memory in great apes, subjects observed two food items being hidden. One of the items was a grape; the other a piece of frozen juice, which they preferred to the grape, but which would melt after a certain amount of time. Then, after either five minutes or an hour, they were given the opportunity to choose one of the hiding spots. They chose the frozen juice significantly more after five minutes, and the grape significantly more after an hour (Martin-Ordas et al. 2010). This indicates, among other things, that they know that over time, the frozen juice melts and becomes unobtainable. Now we might approach the question whether apes are mindmappers by considering whether they attribute knowledge of that kind to others – knowledge that frozen juice melts and becomes unobtainable. One way to explore this would be to investigate how well apes predict the behaviour of other individuals who are either ignorant or knowledgeable about the behaviour of frozen juice. A linguistic mindreader should be capable of representing the difference between these individuals: one knows, and the other does not, that frozen juice melts. A mindmapper should not, since a basic map cannot represent that something has non-spatial properties which change over time.

We might also investigate whether apes can ascribe beliefs about objects whose location the target individual does not represent – things like ‘the ball is red’. A modified false belief test using anticipatory looking could be used for this. Suppose, in familiarisation, a subject watches as an agent A plays with a ball, repeatedly pressing a button on the surface which changes its colour from red to yellow and back again. A occasionally puts the ball away in a box, at which point she is presented with a red and a yellow button. She presses the yellow button if she thinks that the ball is yellow, and the red button if she thinks that the ball is red.

In the test trial, things proceed in the same way – except that after putting the ball in the box, A leaves the room. Whilst she is gone, an experimenter, E, enters the room. E takes out the ball and changes its colour, before replacing it in the box. As a result, A has a false belief about the ball’s colour. Subjects are now divided into two conditions. In the box-present condition (Fig. 11), E leaves with the box, immediately bringing it back. He then leaves, empty-handed. At this point, A returns to find the room apparently just as she left it, and is presented with the buttons. In the box-absent condition (Fig. 12), E leaves with the box, immediately returning without it. He then leaves again. A returns, finds the box missing, and is presented with the buttons. Anticipatory looking should reveal whether subjects in each condition expect A to press the red or the yellow button.

Fig. 11. Box-present condition. (1) A places ball in box, before (2) leaving the room. (3) E enters, changes the ball’s colour and replaces it. (4) E leaves with the box, but immediately brings it back. (5) E leaves. (6) A returns, finding the box where she left it, and is presented with the two buttons

Fig. 12. Box-absent condition. (1) A places ball in box, before (2) leaving the room. (3) E enters, changes the ball’s colour and replaces it. (4) E leaves with the box, immediately returning empty-handed. (5) E leaves. (6) A returns, finding the box missing, and is presented with the two buttons

The critical question is whether the presence or absence of the box makes a difference. Assuming the task demands are not overwhelming, a linguistic mindreader ought to anticipate that A will press the button not corresponding to the present colour of the ball. But a mindmapper ought to anticipate this only in the box-present condition. In the box-absent condition, she should be indifferent – since A does not know or represent the location of the ball, and mindmappers cannot ascribe beliefs about objects without ascribing beliefs about their location.
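The predictions just described can be laid out as a small decision table. This is a minimal sketch rather than an analysis plan: the function name `anticipated_button`, the reader labels, and the use of `None` to stand for indifference are all assumptions introduced for illustration, and the sketch sets aside task demands entirely.

```python
from typing import Optional


def anticipated_button(
    reader: str, believed_colour: str, box_present: bool
) -> Optional[str]:
    """Return the button a subject should anticipate A pressing.

    `believed_colour` is the colour A (falsely) believes the ball to be.
    `None` stands for indifference between the two buttons (an assumption
    of this sketch, not a claim about how indifference would be measured).
    """
    if reader == "linguistic":
        # Can ascribe 'the ball is red' without any belief about location.
        return believed_colour
    if reader == "mindmapper":
        # Can ascribe the colour belief only via a located icon, so only
        # when A still represents where the ball is (box-present).
        return believed_colour if box_present else None
    raise ValueError(f"unknown reader type: {reader}")


# Hypothetical scenario: the ball was red when A left; E changed it to yellow.
believed = "red"
for reader in ("linguistic", "mindmapper"):
    for box_present in (True, False):
        condition = "box-present" if box_present else "box-absent"
        prediction = anticipated_button(reader, believed, box_present)
        print(f"{reader:<10} {condition:<11} -> {prediction or 'indifferent'}")
```

The two hypotheses come apart only in the box-absent row, which is why the comparison between conditions, rather than performance in either condition alone, carries the evidential weight.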

Of course, there are limits to the type of investigation I’m proposing. Tests of this kind might conclusively establish the negative result that a subject was not a mindmapper. But no such test could conclusively establish the positive result that a subject was a mindmapper. Anything within the capacities of a mindmapper should also be possible for a mindreader using a more powerful representational format. As such, there is unlikely to be a behavioural test on which a positive performance is conclusive evidence of mindmapping. Conversely, there will be no single test in which failure is conclusive evidence of mindmapping, since subjects might fail a test for many reasons, including excessive task demands. Consequently, it may be that no single test could definitively diagnose a subject as a mindmapper. Nevertheless, the question whether great apes are mindmappers is an empirically tractable one. It is a question which cannot be answered using a single test – but what matters is the possibility of a battery of tests revealing a pattern of behaviour best explained by the mindmapping hypothesis.

One might object that even such a rich pattern of evidence might be multiply interpretable, because just as maps can be extended in various ways, languages can be arbitrarily restricted. By restricting the available predicates in a language’s lexicon, we might arrive at a representational format with precisely the same expressive power as a basic map, but which would nevertheless be a language, in virtue of employing linguistic combinatorial principles. Thus, even if we discovered a creature whose mindreading was limited to the ascription of mental states with contents carrying concrete information about spatial arrangements, this evidence would support two hypotheses equally well. Either the creature uses a basic map format to model mental states, or they use an arbitrarily restricted language. Consequently, any pattern of behaviour indicative of mindmapping is also consistent with some version of linguistic mindreading.Footnote 17

In reply, we might dispute whether such an arbitrarily restricted form of linguistic mindreading would be empirically equivalent to mindmapping – since, as already noted, maps and language differ in respects other than expressive power. But even if this claim is granted, it should not be concerning, since fitting the data is not the only theoretical virtue. There’s a natural understanding of the nature of cartographic representation from which the distinctive pattern of efficiency and limitation discussed in this paper unfolds. In the context of mindmapping, it readily and independently gives rise to empirical predictions. But there is no natural understanding of a language exhibiting this same pattern. Any empirically equivalent linguistic mindreading hypothesis must be produced by constructing a language in an ad hoc fashion to fit the pattern maps produce. It will give rise to certain predictions only because it has been designed to give rise to precisely those predictions, and has no independent predictive power. Being ad hoc and lacking in predictive power are both theoretical vices. So, even if there were a situation in which the evidence did not conclusively tell between a mindmapping hypothesis and a restricted linguistic mindreading hypothesis, it might nevertheless be reasonable to favour one over the other.

5 Conclusion

In this paper, I have been arguing that mindreaders might vary with respect to the format they take representational states to have. This is significant, since the format one takes mental states to have imposes substantial constraints on one’s mindreading abilities. I have illustrated this by considering the possibility of ‘mindmapping’ – that is, taking mental states to have the format of a basic map. I showed that this would result in a form of mindreading which was constrained by the expressive power of a basic map. In brief, it would provide an effective way to represent mental states whose contents concern the spatial arrangement of objects and properties, but would not support the ascription of mental states concerning non-spatial properties, bare existential quantification, or identity. The same idea might be applied more generally: for any format, we can describe a characteristic form of mindreading which is the result of taking mental states to have that format. So, the argument here reveals a vast space for theorising about how forms of mindreading might differ.

As well as highlighting an interesting sense in which our own mindreading may be pluralistic, this has concrete significance for the emerging picture of great ape mindreading – since it demonstrates that, from the fact that great apes pass false belief tests, we cannot conclude that great ape mindreading and human mindreading are substantially the same. Consistently with what we know, great apes could be mindmappers. If they are, then their mindreading abilities will be considerably less powerful than our own, in a systematic and predictable way. I have not, however, argued that great apes are mindmappers; rather, I have been arguing that the evidence underdetermines what format they take mental states to have. Further empirical work could support taking a stand one way or the other on this question. This work, I have argued, is worth undertaking. Given that one’s mindreading abilities are in part determined by the format one takes mental states to have, any story about animal mindreading which is silent on that question will be incomplete.