Cognition, Volume 70, Issue 2, 1 March 1999, Pages 191-210

Active and passive scene recognition across views

https://doi.org/10.1016/S0010-0277(99)00012-8

Abstract

Recent evidence suggests that scene recognition across views is impaired when an array of objects rotates relative to a stationary observer, but not when the observer moves relative to a stationary display [Simons, D.J., Wang, R.F., 1998. Perceiving real-world viewpoint changes. Psychological Science 9, 315–320]. The experiments in this report examine whether the relatively poorer performance by stationary observers across view changes results from a lack of perceptual information for the rotation or from the lack of active control of the perspective change, both of which are present for viewpoint changes. Three experiments compared performance when observers passively experienced the view change and when they actively caused the change. Even with visual information and active control over the display rotation, change detection performance was still worse for orientation changes than for viewpoint changes. These findings suggest that observers can update a viewer-centered representation of a scene when they move to a different viewing position, but such updating does not occur during display rotations even with visual and motor information for the magnitude of the change. This experimental approach, using arrays of real objects rather than computer displays of isolated individual objects, can shed light on mechanisms that allow accurate recognition despite changes in the observer's position and orientation.

Introduction

Real-world object and scene recognition faces a fundamental problem: the retinal projection of the environment changes whenever the observer or objects in the environment move. Changes to the relative positions of the observer and objects can lead to size and orientation changes in the retinal projection of the environment. Yet our visual system somehow finds stability in these changing images. Two distinct approaches to achieving stability across view changes have been proposed in the literature. The system may selectively encode features of the scene that are invariant to perspective changes and use those features in object and scene recognition. For example, we may represent the object-centered spatial relationships among the parts of an object. Alternatively, the system may employ transformation rules to compensate for changes in the retinal projection, thereby providing a common basis for comparing two different views. For example, we may mentally rotate an object until it is aligned with a previous representation, or we may interpolate between two or more views to recognize objects from new perspectives.

Research on object recognition across views has provided some support for each of these possibilities. For example, Biederman and colleagues (Ellis et al., 1989; Biederman and Cooper, 1991, 1992; Cooper et al., 1992; see also Bartram, 1976) used a priming paradigm and measured the response latency to name line drawings of familiar objects. In their studies, the amount of priming was unaffected by changes in the retinal size of the object from study to test (scaling invariance). Furthermore, naming latency was impervious to changes to the position of the object in the visual field and to the object's orientation in depth. Biederman and Gerhardstein (1993) showed similar orientation invariance when observers were asked to match individual shapes (Geons), name familiar objects, and classify unfamiliar objects.

In contrast, many other studies suggest that object recognition performance is view-dependent; recognition accuracy declines and latency increases as test views deviate from the studied view (e.g. Shepard and Metzler, 1971; Shepard and Cooper, 1982; Rock et al., 1989). With wire-frame or blob-like objects in same-different judgment tasks (Bülthoff and Edelman, 1992; Tarr, 1995; Tarr et al., 1997), subjects typically show fast, accurate recognition for test views within a small distance of the studied view and impaired performance for novel views. Furthermore, the impairment seems to be systematically related to the magnitude of the difference between studied and tested views, particularly for changes to the in-depth orientation of an object. The greater the rotation in depth away from the studied view, the longer the response latency (see also Tarr and Pinker, 1989). Such findings imply that object representations are viewer-centered.
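As a purely illustrative summary of this view-dependence pattern (a descriptive sketch, not a fit to any dataset reviewed here), the latency results are often characterized as increasing roughly linearly with the angular disparity between studied and tested views:

$$\mathrm{RT}(\theta) \;\approx\; \mathrm{RT}_{0} + k\,\theta,$$

where $\theta$ is the rotation in depth away from the studied view, $\mathrm{RT}_{0}$ is the latency for the studied view itself, and $k$ is an empirical slope; both parameters are placeholders rather than values reported in the studies cited above.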

Another critical piece of evidence in support of viewer-centered representations is that when two or more views of the same object are provided at study, subjects subsequently generalize to intermediate views but not to other views (Bülthoff and Edelman, 1992; Kourtzi and Shiffrar, 1997). A number of models of object recognition have attempted to account for this finding by positing mechanisms that operate on viewer-centered representations. For example, linear combinations of 2D views (Ullman and Basri, 1991) and view approximation (Poggio and Edelman, 1990; Vetter et al., 1995) are both consistent with these data. However, in order to interpolate between two or more views, the initial views must first be linked to the same object. That is, subjects must recognize that the same object is being viewed in the first and second studied views even though those views differ. It is unclear from these models how this initial matching is accomplished, particularly if the views are relatively far apart and the object is not symmetrical (see Vetter and Poggio, 1994). Although these models may not fully account for the nature of object recognition for novel views, the empirical data seem to support the claim that representations of individual objects are view-dependent.
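To make the interpolation idea concrete, here is a rough sketch of the linear-combination account (assuming rigid objects under orthographic projection; the coefficients $a_{i}$ and $b_{i}$ are placeholders determined by the novel viewpoint, not quantities taken from the studies above). The image coordinates $(x'', y'')$ of an object point in a novel view can be written in terms of its coordinates $(x, y)$ in one stored view and $x'$ in a second stored view:

$$x'' = a_{1}x + a_{2}y + a_{3}x' + a_{4}, \qquad y'' = b_{1}x + b_{2}y + b_{3}x' + b_{4}.$$

The sketch also makes the matching problem explicit: the combination is only defined once corresponding points in the two stored views have been linked to the same object, which is precisely the step these models leave unspecified.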

Both view-independent and view-dependent models of object recognition seem to capture some aspects of how the visual system accommodates view changes. For example, when the learning period is relatively long and the object is relatively complicated and difficult to name, recognition may rely on viewer-centered representations. On the other hand, when objects are made of distinct parts whose spatial relationships can be coded easily, and when the task concerns more abstract knowledge such as naming or classification, recognition may rely on view-independent representations. Nevertheless, studies comparing these models typically test recognition for isolated objects, ignoring extra-retinal information that is available in real-world object recognition. Thus, neither model is likely to explain all aspects of object representation.

Section snippets

Recognition of object arrays

Recently, several laboratories have begun to consider the recognition of more complex, naturalistic displays (e.g. spatial layouts of objects) across views. Spatial layout representations are important for a number of reasons. First, most real-world object recognition occurs in the context of other objects rather than in isolation. Thus, it seems reasonable to study spatial layout representations to gain a clearer picture of the sorts of representations we might need from one view to the next.

A hint from spatial reasoning studies

Although studies of spatial layout recognition are closer to real-world recognition, most have neglected an important source of information that may be central to real-world object and scene recognition. In real environments, observers have available many sources of information in addition to the retinal projection of the scene. For example, they have visual, vestibular, and proprioceptive information for their own movements. Such extra-retinal information may specify the magnitude of a change

Scene recognition in the real world

Despite evidence that imagined observer and display rotations lead to differences in performance, only recently has work in object and scene recognition considered this difference. Studies of object recognition have relied exclusively on display rotations to study view changes. This neglect of observer movement can be traced to the assumption that equivalent retinal projection changes should produce equivalent mental transformations of the visual representation. Because the retinal projection

Mechanisms of updating

Studies of navigation have shown that extra-retinal information can be used in updating one's own position. Spatial representations of position and orientation rely on vestibular signals (e.g. Israel et al., 1996), proprioceptive and kinesthetic cues (e.g. Loomis et al., 1993; Berthoz et al., 1995), optical flow (Ronacher and Wehner, 1995; Srinivasan et al., 1996), magnetic fields (Frier et al., 1996), and energy expenditure (Kirchner and Braun, 1994). By using one or more of these sources of

Experiment 1

This experiment served as a replication of earlier work comparing orientation and viewpoint changes (Simons and Wang, 1998), and tested the possibility that the availability of additional visual information would allow updating during orientation changes. Observers viewed layouts of real objects on a rotating table and were asked to detect changes to the position of one of the objects. We examined performance on this task across both shifts in the observer viewing position and rotations of the

Method

The apparatus was the same as in Experiment 1. Eleven undergraduates participated in the study in exchange for $7 compensation. Unlike Experiment 1, observers remained at the same viewing position for all 40 trials of this experiment. On each trial, they viewed the array for 3 s (Study period) and then lowered the curtain. During the 7 s delay interval, the table rotated by 40 degrees. For half of the trials, the experimenter rotated the table (as in Experiment 1) and for the other half, the

Experiment 3

In this experiment, observers sat on a wheeled chair and were rolled by an experimenter from the Study position to the Test position. If updating of the viewer-centered representation requires active control over the viewpoint change, observers should be less accurate when they are passively moved. By comparing performance in this experiment to the corresponding active-movement condition in Experiment 1, we can assess the effect of active movement on the updating process.

General discussion

When observers remain in the same position throughout a trial, they are better able to detect changes when they receive the same view at study and test. In striking contrast, when observers move to a novel viewing position during a trial, they detect changes more effectively when they receive the corresponding novel view than the studied view. That is, they are better able to detect changes when the orientation of the table is constant throughout a trial, even if that means they will experience

Acknowledgements

The authors contributed equally to this research and authorship order was determined arbitrarily. Thanks to Daniel Tristan and Chris Russell for help collecting the data and to M.J. Wraga for comments on an earlier version of the paper. Some of this research was presented at ARVO 1998.

References (59)

  • Rock, I., et al. (1989). Can we imagine how objects look from other viewpoints? Cognitive Psychology.
  • Tarr, M.J., et al. (1989). Mental rotation and orientation-dependence in shape recognition. Cognitive Psychology.
  • Thinus-Blanc, C., et al. (1992). The spatial parameters encoded by hamsters during exploration: a further study. Behavioural Processes.
  • Xu, F., et al. (1996). Infants' metaphysics: the case of numerical identity. Cognitive Psychology.
  • Acredolo, C., et al. (1984). The role of self-produced movement and visual tracking in infant spatial orientation. Journal of Experimental Child Psychology.
  • Amorim, M.A., et al. (1997). Updating an object's orientation and location during nonvisual navigation: a comparison between two processing modes. Perception and Psychophysics.
  • Bartram, D.J. (1976). Levels of coding in picture-picture comparison tasks. Memory and Cognition.
  • Benson, J.B., et al. (1985). Effect of self-initiated locomotion on infant search activity. Developmental Psychology.
  • Berthoz, A., et al. (1995). Spatial memory of body linear displacement: what is being stored? Science.
  • Biederman, I., et al. (1992). Size invariance in visual object priming. Journal of Experimental Psychology: Human Perception and Performance.
  • Biederman, I., et al. (1993). Recognizing depth-rotated objects: evidence and conditions for three-dimensional viewpoint invariance. Journal of Experimental Psychology: Human Perception and Performance.
  • Bülthoff, H.H., et al. (1992). Psychophysical support for a two-dimensional view interpolation theory of object recognition. Proceedings of the National Academy of Sciences of the United States of America.
  • Christou, C.G., Bülthoff, H.H. (1997). View-Direction Specificity in Scene Recognition After Active and Passive...
  • Cooper, E.E., et al. (1992). Metric invariance in object recognition: a review and further evidence. Canadian Journal of Psychology.
  • Diwadkar, V.A., et al. (1997). Viewpoint dependence in scene recognition. Psychological Science.
  • Ellis, R., et al. (1989). Varieties of object constancy. Quarterly Journal of Experimental Psychology.
  • Farrell, M.J., et al. (1998). Mental rotation and the automatic updating of body-centered spatial relationships. Journal of Experimental Psychology: Learning, Memory, and Cognition.
  • Frier, H.J., et al. (1996). Magnetic compass cues and visual pattern learning in honeybees. Journal of Experimental Biology.
  • Held, R., et al. (1961). Neonatal deprivation and adult rearrangement: complementary techniques for analyzing plastic sensory-motor coordinations. Journal of Comparative and Physiological Psychology.