Elsevier

Cognition

Volume 114, Issue 1, January 2010, Pages 72-88

The embodied nature of spatial perspective taking: Embodied transformation versus sensorimotor interference

https://doi.org/10.1016/j.cognition.2009.08.015

Abstract

Humans are able to mentally adopt the spatial perspective of others and understand the world from their point of view. We propose that spatial perspective taking (SPT) could have developed from the physical alignment of perspectives. This would support the notion that others have put forward claiming that SPT is an embodied cognitive process. We investigated this issue by contrasting several accounts in terms of the assumed processes and the nature of the embodiment. In a series of four experiments we found substantial evidence that the transformations during SPT comprise large parts of the body schema, which we did not observe for object rotation. We further conclude that the embodiment of SPT is best conceptualised as the self-initiated emulation of a body movement, supporting the notion of endogenous motoric embodiment. Overall our results are much more in agreement with an ‘embodied’ transformation account than with the notion of sensorimotor interference. Finally we discuss our findings in terms of SPT as a possible evolutionary stepping stone towards more complex alignments of socio-cognitive perspectives.

Introduction

As a social species, humans are highly skilled in the perception and representation of their conspecifics. This encompasses understanding of simple actions and body postures, such as a hand outstretched for greeting, but also more sophisticated understanding of intentions, such as determining whether somebody is lying or telling the truth. While the former processes have been associated with automatic matching mechanisms without awareness, the latter processes are usually subsumed under the label of “theory of mind” and require conscious understanding of others (see Frith & Frith, 2007, for a recent review).

In this research we investigated how humans mentally adopt someone else’s spatial perspective. While this is a conscious and deliberate process, it is still quite a basic form of inferring other people’s representations of the world. Nevertheless it could be an important stepping stone from automatic and unaware perception of others towards more sophisticated forms of ‘mind reading’. For instance, similar expressions in several languages use spatial perspective taking as a metaphor for more sophisticated socio-cognitive perspective sharing, e.g. “I understand your point of view”, “Put yourself in my position”, etc. While this potentially important role in our individual and cultural development remains speculative at this stage, spatial perspective taking (SPT) is an essential process in everyday communication and cognition. Consider the following example where we are facing a friend and would like to tell her that there is an eyelash on one of her cheeks (e.g. her left, which would be right from our viewpoint). If we wish to make it easy for our friend then we would mentally place ourselves in her perspective to tell her on which side the eyelash is (“left” in this case). But how do we accomplish such understanding? How do we overcome the differences in body orientations and related perspectives of the world?

In fact most people find it quite hard to mentally adopt another viewpoint and research over the past decades has shown that the speed (and accuracy) of SPT decreases with the angular disparity between the egocentric and the target viewpoint (Huttenlocher and Presson, 1973, Kozhevnikov and Hegarty, 2001, Levine et al., 1982, Zacks and Michelon, 2005, for a recent review). Accordingly, it has been suggested that SPT is subserved by a mental rotation of the self (e.g. Graf, 1994, Keehner et al., 2006, Kessler, 2000, May, 2004, Wraga et al., 2005, Zacks and Michelon, 2005). In contrast to the ability to mentally rotate objects (OR) (Shepard & Metzler, 1971), humans seem to adopt somebody else’s spatial perspective by mentally rotating themselves into their orientation, which seems to involve a different cognitive operation than object rotation (Hegarty and Waller, 2004, Kozhevnikov and Hegarty, 2001, Kozhevnikov et al., 2006, Zacks and Michelon, 2005). Kozhevnikov et al. (2006) showed that SPT but not OR performance predicted navigational skills that involved self-to-object relations (e.g. finding short-cuts and pointing to occluded objects). Kozhevnikov and Hegarty (2001) reported a dissociation between the mental abilities for rotating objects versus adopting someone else’s perspective although the two processes seemed to be correlated in their setup (also Hegarty & Waller, 2004). Mental self-rotation has been repeatedly reported to be less effortful (faster/more accurate) than object rotation (OR) within the ground plane (Keehner et al., 2006, Wraga et al., 1999, for a review; Wraga et al., 2005, Zacks and Michelon, 2005, for a review), and discontinuities have been observed with SPT but not with OR. That is, processing time for SPT remains fairly constant at low angles but there is a ‘jump’ around 60°–90° angular disparity where reaction times suddenly start to increase with angle (e.g. Graf, 1994, Keehner et al., 2006, Kozhevnikov and Hegarty, 2001, Michelon and Zacks, 2006). In contrast, OR shows a continuous increase already at low angular disparities (e.g. Graf, 1994, Keehner et al., 2006, Michelon and Zacks, 2006, Shepard and Metzler, 1971) but in return seems to depend less on the plane of rotation (e.g. Zacks & Michelon, 2005).
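The contrast between the two reaction time profiles described above can be made concrete with a small toy sketch. It is purely illustrative and is not a model proposed in the cited studies: the function names, the baseline, the slopes and the 60° onset are hypothetical placeholders chosen only to reproduce the qualitative shapes (flat, then rising for SPT; continuously rising for OR).

```python
# Toy sketch of the qualitative reaction time (RT) profiles described in the text.
# All parameter values are hypothetical placeholders, not estimates from any data.

def rt_spt(disparity_deg, base_ms=700.0, onset_deg=60.0, slope_ms_per_deg=4.0):
    """SPT: roughly constant RT at low angles, linear increase beyond an onset angle."""
    return base_ms + slope_ms_per_deg * max(0.0, disparity_deg - onset_deg)


def rt_or(disparity_deg, base_ms=700.0, slope_ms_per_deg=2.5):
    """OR: RT increases continuously, starting already at low angles."""
    return base_ms + slope_ms_per_deg * disparity_deg


if __name__ == "__main__":
    for angle in (0, 40, 80, 120, 160):
        print(f"{angle:>3} deg  SPT: {rt_spt(angle):7.1f} ms  OR: {rt_or(angle):7.1f} ms")
```

In this sketch the only structural difference between the two functions is the onset term: below the assumed onset angle the SPT prediction does not change with disparity, whereas the OR prediction grows from 0° onwards.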

This difference in susceptibility to the plane of rotation suggests that the two processes could be related to different spatial frames of reference. While SPT relies on an egocentric frame, OR implies an allocentric or intrinsic referential frame (Kozhevnikov and Hegarty, 2001, Kozhevnikov et al., 2006, Wraga et al., 1999). The former encodes object locations in relation to the observer’s body orientation, while the latter encodes objects in relation to the environment, i.e. to other objects (and potentially to their intrinsic orientation, e.g. Levelt, 1996). Egocentric encoding could be a first hint towards embodied representations, since the egocentric system has been suggested to be responsible for guiding body movements in space, hence, providing an embodied frame of reference for mental transformations (Kozhevnikov et al., 2006).

If it was indeed the case that SPT involves some sort of “rotation of the self” then it would be essential to understand what this “self” actually entails. For one branch of the involved research it seems to refer to the transformation of an abstract coordinate system where the observer is basically the point of origin, usually termed “origo” in linguistics and computational linguistics (e.g. Grabowski and Miller, 2000, Graf, 1994, Levelt, 1996, Moratz and Tenbrink, 2006, Retz-Schmidt, 1988, for a general overview), while on the other side of the spectrum researchers assume that ‘mental rotation of the self’ involves transformations of the internal representations that the observers possess of themselves (e.g. Arzy, Thut, Mohr, Michel, & Blanke, 2006; Blanke et al., 2005, Farrell and Thomson, 1999, Kozhevnikov et al., 2006, May, 2004, Presson and Montello, 1994, Rieser, 1989). This latter research assumes that SPT is grounded in the internal representations of our body (i.e. body schema) and that the required cognitive transformations are therefore ‘embodied’. Note that in the context of SPT adopting another perspective is sometimes termed “disembodiment” since participants have to imagine themselves outside their own body (e.g. Blanke et al., 2005, Klatzky et al., 1998, Tversky and Hard, 2009). Here we generally term SPT as being embodied, also when adopting another viewpoint, in the sense that we claim (and provide evidence) that SPT is heavily rooted in representations of the body and its movement repertoire. We use the term “embodied” in analogy to “embodied perception” and “embodied semantics” associated with representations partially implemented by the motor and somatosensory system (e.g. Fischer & Zwaan, 2008).

With respect to embodiment, OR has been shown to be modulated by concurrent movements of the hands (Wohlschlager & Wohlschlager, 1998). With congruent movements OR is processed faster than with incongruent movements, suggesting an overlap between object transformations and action-related representations of hands. Sack, Lindner, and Linden (2007) reported even stronger embodiment of OR when body parts (hands) had to be mentally rotated. This is in line with the so-called direct-matching hypothesis (Wohlschlager, Gattis, & Bekkering, 2003) and its assumed implementation by the mirror neuron system (e.g. di Pellegrino et al., 1992, Kessler et al., 2006, Keysers and Perrett, 2004, Rizzolatti and Craighero, 2004; but see Jonas et al., 2007), which proposes a direct activation of the observer’s motor repertoire by the mere observation of an action. For OR this is supported by neuroimaging results where motor areas of the brain were found to be involved during both types of OR, but more strongly during rotations of hands than during rotations of abstract cubes (e.g. Kosslyn et al., 1998, Wraga et al., 2003).

Amorim, Isableu, and Jarraya (2006) went a step further in their behavioural experiments and compared OR of abstract cube configurations (cf. Shepard & Metzler, 1971) to OR of full bodies in various postures. Based on their results Amorim et al. (2006) suggested the notion of motoric embodiment as an integral part of the mental rotation of objects that happen to be bodies. Such motoric embodiment enables a smooth mental rotation of a visually perceived body by emulating the transformation/rotation of the perceived body within the sensorimotor system of the observer. This is in agreement with the direct-matching hypothesis and explains why rotations of bodies are significantly more efficient than rotations of the classic S–M cubes and, importantly, why bodies displaying impossible postures lose this advantage (Amorim et al., 2006).

However, to be able to embody a displayed body posture for rotating it into a target posture one would have to mentally adopt the starting posture to begin with. Amorim et al. (2006, p. 344) indeed hint at this pre-stage by stating that the starting posture would have to be emulated (motorically embodied) to begin the rotation process. Such posture emulation, however, has been suggested as a form of SPT (cf. Zacks, Mires, Tversky, & Hazeltine, 2000) where observers mentally rotate/transform their body into the target posture. We therefore expected that SPT in general would incorporate elements of motoric embodiment. This assumption is supported by neuroimaging results that implicated motor and motor-related areas as an integral part of processing during SPT. While Zacks and Michelon (2005) concluded that posterior frontal motor areas are involved in both object- and self-rotation (see Vogeley et al., 2004, for similar findings regarding SPT), Wraga et al. (2005) suggested that object rotation was based on motor-representations that reflected manipulation (pre- and primary motor areas), whereas self-rotation was rather based on proprioceptive and perceptual information (fusiform gyrus, insula). Nevertheless, Wraga et al. (2005) also reported supplementary motor area activation during self-rotation, which suggests a certain amount of motor involvement during SPT. Note that while these neuroimaging results reveal task-related activation changes in sensorimotor brain areas, the exact role of such activations during the process of SPT is unclear. Therefore, the embodied nature of SPT still remains speculative and evidence for a direct link between SPT and own and perceived body postures and movements is still largely missing. We aimed at closing this gap by means of the series of behavioural experiments presented here.

In particular we hypothesised that the postulated motoric embodiment of SPT would involve different body representations than OR, which we tested by comparing Experiments 2 (SPT) and 3 (OR). OR seems to be either related to the internal representation of the hands that humans usually employ to manipulate objects (Carpenter et al., 1999, Kosslyn et al., 1998, Sack et al., 2007), or in the case of bodies and body parts OR seems to be related to the corresponding posture and movement representations ‘mirrored’ in the observer (Amorim et al., 2006, Kosslyn et al., 1998, Sack et al., 2007, Wraga et al., 2003). SPT on the other hand could be related to body representations that are employed during physical alignment of perspectives, i.e. when we actually move/rotate into another point of view. Especially at higher angular disparities such physical perspective changes involve a turn of the whole body and we expected these parts of the body schema to be the basis of SPT.

This latter consideration also suggests that the notion of posture emulation as the primary embodied mechanism of SPT (as discussed above) could be too closely related to the direct-matching hypothesis, where a visually perceived action or posture is directly emulated within the observer. Such a conception would always rely on exogenous visual input to resonate with the observer’s action and posture repertoire. We therefore suggest referring to this form as ‘exogenous’ motoric embodiment. In contrast we claim that conscious and intentional cognitive processing can rely on embodied transformations that are self-initiated. This could be the emulation of a movement that is already within the repertoire – like rotating the body into a new orientation – which could directly support the cognitive process in question. We propose to refer to this form as ‘endogenous’ motoric embodiment and suggest that it is the emulation of a movement in contrast to the more perceptually-based ‘exogenous’ motoric embodiment referring to the emulation of a visually perceived posture. We further expected SPT to strongly rely on endogenous motoric embodiment since we propose that SPT is the emulation of a body rotation to physically align perspectives.

In the context of the spatial updating research the assumption that the body schema is largely involved in SPT has recently even led to a re-interpretation of angular disparity effects in terms of sensorimotor interference (e.g. May, 2004, Riecke et al., 2007, Wang, 2005, Wraga, 2003). According to this account disparity effects do not occur because of an increased cognitive effort of the mental transformation, but instead are induced by an increasing conflict between the mentally rotated head direction and the available contradictory proprioceptive information (May, 2004). Several findings have been reported to support this notion: Firstly, the updating effort is much reduced if blindfolded participants actually move/rotate into their new orientation and not only imagine the perspective change (Farrell and Thomson, 1999, May and Wartenberg, 1995, Presson and Montello, 1994, Rieser, 1989; but see Wraga, 2003), thus suggesting a process that strongly relies on proprioceptive information and on automatic embodied updating (Riecke et al., 2007). Secondly, disorienting participants by turning them in circles until they lose their orientation in relation to the environment improves pointing speed and accuracy, suggesting that disorientation relieves participants from interference between imagined and actual orientation (May, 1996).

While these two findings generally support an involvement of sensorimotor representations, a third result poses a more direct challenge to the transformation account. May (2004) and Wang (2005) employed a spatial updating task where they provided participants in advance with the information about the required perspective change and with enough time for the participants to mentally adopt this perspective prior to the target object being disclosed (to which they had to point from their new perspective). The crucial challenge for the transformation account was that preparation time did not obliterate the effect of angular disparity, which should have been the case as participants were given the time to calculate the transformation in advance, hence, leaving only sensorimotor interference as a possible explanation (May, 2004, Wang, 2005). Although the experimental manipulations are elegant and the conclusions compelling, we would like to point out that the cognitive load introduced by the number of potential targets in the object arrays has been neglected so far. Our point is that the difficulty for updating an object array is a direct function of the number of objects (Wang et al., 2006). May (2004) and Wang (2005) used quite complex arrays consisting of 4 and 5 objects respectively. If participants had used their extra time to mentally rotate themselves AND update the object array before knowing the target object they would have had to maintain all 4/5 objects and their updated locations in relation to the rotated self within working memory – which is costly, especially as one must assume that the orientation of the rotated self is maintained in working memory as well. We propose that it was much easier for the participants to either ‘do nothing’ or conduct SPT only (without updating the 4 or 5 object locations), wait until the target object was indicated, and then update the representation of this specific object. This particular issue can only be resolved by manipulating the number of objects in addition to providing preparation time.

Here we employed a setup with only 2 objects and we manipulated the body schema itself, which allowed comparing the predictions of the transformation and the interference accounts without the potential confound of enhanced working memory load. In contrast to the effective but somewhat coarse disorientation approach (May, 1996) we used different body postures to systematically vary the amount of sensorimotor congruence or conflict in addition to mere angular disparity (Fig. 1B). Since the general evidence for embodiment of SPT is compelling, a ‘pure’ transformation account in the form of an abstract coordinate system transformation (e.g. Retz-Schmidt, 1988) is highly unlikely to be the appropriate approach. However, if one assumes that the mental self-rotation entails a transformation of parts of the body schema into a virtual body posture in the form of a movement emulation (see above), then sensorimotor information should have an influence in addition to a cognitive effort that increases with angular disparity. Accordingly, if SPT primarily transforms body schema representations, then a physical body posture that is already congruent with the direction of mental rotation provides the transformation process with a computational ‘head-start’ as it is already turned into the correct direction (compare Fig. 1B).

The difference between the two accounts (sensorimotor interference vs. embodied transformation) now lies in their predictions of how an embodiment effect would change with increasing angular disparity. The embodied transformation account assumes that the congruent body posture provides a ‘head-start’ which remains constant over angles. That is, the body is already partially turned in the correct direction, thus decreasing the amount of necessary movement emulation. Since the angle of the participant’s physical posture change was constant in all our experiments this head-start or directional priming should always be the same, regardless of the angular disparity for SPT.

In contrast, the interference account predicts a ‘best match’ effect where the angular disparity that provides the ‘best match’ between proprioceptive information and mentally transformed perspective should reveal the most efficient processing. In fact the difference between the two accounts boils down to whether sensorimotor congruence/conflict is expected to have a stronger impact than pure angular disparity (sensorimotor interference) or vice versa (embodied transformation) and whether one expects a sensorimotor conflict at the beginning of SPT (embodied transformation) or after (sensorimotor interference).
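To make the contrast between the two sets of predictions easier to follow, the following toy sketch spells them out numerically. It is a deliberately simplified illustration under stated assumptions, not the analysis reported in this paper: the disparity values, the size of the body turn (`POSTURE_TURN_DEG`), the baseline, the slopes and the head-start in milliseconds are all hypothetical, and the interference account is reduced here to a cost proportional to the mismatch between imagined and physical orientation.

```python
# Toy illustration of the two accounts. Every value is a hypothetical placeholder.
ANGLES_DEG = [40, 80, 120, 160]   # illustrative angular disparities
POSTURE_TURN_DEG = 80             # assumed size of the physical body turn


def rt_embodied_transformation(disparity_deg, congruent,
                               base_ms=600.0, slope_ms_per_deg=3.0,
                               head_start_ms=50.0):
    """Effort grows with disparity; a congruent posture gives a constant head-start,
    an incongruent posture a constant cost of the same (assumed) size."""
    offset = -head_start_ms if congruent else head_start_ms
    return base_ms + slope_ms_per_deg * disparity_deg + offset


def rt_sensorimotor_interference(disparity_deg, congruent,
                                 base_ms=600.0, cost_ms_per_deg=3.0):
    """Effort grows with the conflict between the imagined orientation and the
    physical body orientation, so the 'best match' angle is processed fastest."""
    body_deg = POSTURE_TURN_DEG if congruent else -POSTURE_TURN_DEG
    conflict_deg = abs(disparity_deg - body_deg)
    return base_ms + cost_ms_per_deg * conflict_deg


if __name__ == "__main__":
    for angle in ANGLES_DEG:
        print(angle,
              rt_embodied_transformation(angle, True),
              rt_embodied_transformation(angle, False),
              rt_sensorimotor_interference(angle, True),
              rt_sensorimotor_interference(angle, False))
```

Under these assumptions the embodied transformation sketch yields the same posture effect at every disparity and a congruent-posture reaction time that still rises with angle, whereas the interference sketch yields its fastest congruent-posture response at the disparity that matches the assumed body turn rather than at the smallest disparity.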

If SPT was indeed the endogenous emulation of a body rotation then we would expect body posture effects (congruent vs. incongruent) to be optimally revealed when the process of mental self-rotation is actually employed. This seems to be the case when the mental effort for SPT abruptly starts to increase at higher angular disparities. Specifically, Kessler (2000) suggested, in concordance with the discontinuities around 60°–90° (e.g. Graf, 1994, Keehner et al., 2006, Kozhevnikov and Hegarty, 2001, Michelon and Zacks, 2006), that a simple visual matching process could be performed at low angles, while actual mental self-rotation commences at angles above 60°–90°. This is congruent with Kozhevnikov and Hegarty’s (2001) report that for angles below 100° participants seemed to employ a different processing strategy than SPT, which was reflected by the observation that participants sometimes turned their head to “get a better view” while avoiding mentally rotating themselves. A visual matching process can be conducted at low angles because the target perspective is still largely aligned with the egocentric perspective. Especially left/right judgements can usually be performed quite easily this way because the target’s left and right still largely overlap with the observer’s left and right – as can be seen in Fig. 1A at 40° angular disparity, where the flower is still clearly left of the gun without a mental self-rotation being necessary. Since we expected that motoric embodiment of SPT would be directly related to the process of mental self-rotation in the form of endogenous movement emulation, body posture effects should therefore only appear at higher angles. This still leaves the question open whether sensorimotor congruence/incongruence would have a stronger impact than angular disparity (sensorimotor interference account) or vice versa (embodied transformation account) during mental self-rotation.

In a series of four experiments we aimed to reveal whether SPT relies on motoric embodiment. Furthermore we wanted to understand how these results would relate to OR and we expected qualitatively different embodiment patterns for the two processes. We also investigated whether the angular disparity effects in SPT were due to sensorimotor interference (e.g. May, 2004, Riecke et al., 2007, Wang, 2005, Wraga, 2003) or due to the increasing effort for embodied transformations. We tested an amended form of the basic transformation account which assumes that parts of the body schema serve as the representational basis for the transformation (i.e. embodied transformation account), which in turn is best conceptualised as the self-initiated emulation of a body rotation. In this context we expected motoric embodiment effects to appear at higher angular disparities, strongly depending on whether the process of mental self-rotation would actually be employed to solve the task. Finally we investigated whether SPT would incorporate exogenously triggered posture emulation in addition to self-initiated movement emulation.

Experiment 1

We aimed to unravel the embodied nature of SPT. To this end we took pictures of an avatar sitting at a round table at various degrees of angular disparity (Fig. 1A). Participants were instructed to adopt the spatial perspective of the avatar and make an object selection from that viewpoint. So far this was a classical setup for a perspective alignment task, where we expected reaction times to increase more strongly at angles over 60°–90° (e.g. Graf, 1994, Keehner et al., 2006, Kozhevnikov and

Experiment 2

In this second experiment we removed the avatar from the scene, replacing it with an empty chair (see Fig. 3). An emulation of a visually perceived body posture was no longer possible. Previous research has clearly shown that SPT can be performed without an avatar being present (e.g. May, 2004, Michelon and Zacks, 2006), but crucially, would the embodiment effect also persist? If this was the case we would gain novel insights into the nature of the motoric embodiment of SPT. Firstly, it would

Experiment 3

We aimed to show that the observed motoric embodiment effect is SPT specific and does not occur in the same form in relation to mental object rotations (Shepard & Metzler, 1971). That is, while OR seems to involve representations of hands which humans usually employ to manipulate objects (Sack et al., 2007, Wohlschlager and Wohlschlager, 1998), we claim that SPT involves whole body representations that are involved in posture changes to physically align viewpoints.

To investigate OR we employed

Experiment 4

As discussed in the context of Experiment 2, we were able to show that motoric embodiment persists in the absence of an avatar, i.e. without the option to match a perceived body onto the internal body schema. We therefore concluded that a large part of the embodiment effect could be related to action emulation (endogenous), but we also pointed out that an additional exogenously triggered effect that would generate a direct match between the perceived body posture and the repertoire of the

Low versus high rotation angles: two mechanisms for SPT

First of all we were able to replicate previous findings showing an increase in the cognitive effort for performing SPT with increasing angular deviation between the egocentric and the target perspective. We also replicated the classic pattern for object rotation (OR) with a continuous increase of processing time with angular deviation. However, the increase for SPT was not monotonic as effort started to augment significantly above 40° or even 80°, which is also in agreement with previous

Acknowledgements

We would like to thank an anonymous reviewer, Maria Kozhevnikov, and Jeff Zacks for essential comments on an earlier version of the manuscript. We would also like to thank Clare Allely and William J. Corral for their help with data collection. This research was supported by ESRC/MRC funding (RES-060-25-0010) to KK.

References (68)

  • B. Tversky et al.

    Embodied and disembodied cognition: Spatial perspective-taking

    Cognition

    (2009)
  • M. Wraga et al.

    The influence of spatial reference frames on imagined object- and viewer rotations

    Acta Psychologica (Amsterdam)

    (1999)
  • M. Wraga et al.

    Imagined rotations of self versus objects: An fMRI study

    Neuropsychologia

    (2005)
  • M. Wraga et al.

    Implicit transfer of motor strategies in mental rotation

    Brain and Cognition

    (2003)
  • M.A. Amorim et al.

    Embodied spatial transformations: “Body analogy” for the mental rotation of objects

    Journal of Experimental Psychology: General

    (2006)
  • S. Arzy et al.

    Neural basis of embodiment: Distinct contributions of temporoparietal junction and extrastriate body area

    Journal of Neuroscience

    (2006)
  • A.P. Bayliss et al.

    Predictive gaze cues and personality judgments: Should eye trust you?

    Psychological Science

    (2006)
  • O. Blanke et al.

    Linking out-of-body experience and self processing to mental own-body imagery at the temporoparietal junction

    Journal of Neuroscience

    (2005)
  • J. Brauer et al.

    All great ape species follow gaze to distant locations and around barriers

    Journal of Comparative Psychology

    (2005)
  • J. Brauer et al.

    Making inferences about the location of hidden food: Social dog, causal ape

    Journal of Comparative Psychology

    (2006)
  • J. Call et al.

    Domestic dogs (Canis familiaris) are sensitive to the attentional state of humans

    Journal of Comparative Psychology

    (2003)
  • P.A. Carpenter et al.

    Graded functional activation in the visuospatial system with the amount of task demand

    Journal of Cognitive Neuroscience

    (1999)
  • S.H. Chatterjee et al.

    Configural processing in the perception of apparent biological motion

    Journal of Experimental Psychology – Human Perception and Performance

    (1996)
  • K.R. Coventry et al.

    Saying, seeing and acting: The psychological semantics of spatial prepositions

    (2004)
  • M.L. Davidson

    Univariate versus multivariate tests in repeated-measures experiments

    Psychological Bulletin

    (1972)
  • G. di Pellegrino et al.

    Understanding motor events: A neurophysiological study

    Experimental Brain Research

    (1992)
  • M.J. Farrell et al.

    On-line updating of spatial information during locomotion without vision

    Journal of Motor Behavior

    (1999)
  • Fischer, M. H., & Zwaan, R. A. (2008). Embodied language: A review of the role of the motor system in language...
  • J. Grabowski et al.

    Factors affecting the use of dimensional prepositions in German and American English: Object orientation, social context, and prepositional pattern

    Journal of Psycholinguistic Research

    (2000)
  • R. Graf

    Self-rotation and spatial reference: The psychology of partner-centred localisations

    (1994)
  • M. Jonas et al.

    Do simple intransitive finger movements consistently activate frontoparietal mirror neuron areas in humans?

    Neuroimage

    (2007)
  • K. Kessler

    Spatial cognition and verbal localisations: A connectionist model for the interpretation of spatial prepositions

    (2000)
  • R.L. Klatzky et al.

    Spatial updating of self-position and orientation during real, imagined, and virtual locomotion

    Psychological Science

    (1998)
  • G. Knoblich et al.

    The social nature of perception and action

    Current Directions in Psychological Science

    (2006)