Review
Situating vision in the world

https://doi.org/10.1016/S1364-6613(00)01477-7

Abstract

Recently, there has been a great deal of interest in what has been called ‘situated cognition’, which has included claims that certain forms of representation are inadequate for modeling active organisms or agents such as humans and robots. In this article, I suggest that a weakness in classical theories of visual representation is the way in which representations connect with the real world, which may account for many of the concerns expressed by the situated cognition community. Specifically, I claim that what current theories lack is any provision for a certain form of direct, preconceptual connection between objects in the visual world (visual objects or proto-objects) and their representations in the visual system. This type of connection is akin to what philosophers and semanticists have referred to as an ‘indexical’ or ‘demonstrative’ reference and what some cognitive scientists have referred to as ‘deictic pointers’. I explain why such a mechanism is needed and suggest that many workers have, in fact, been studying precisely this under the term ‘visual index’. The visual index hypothesis is illustrated with the results of some relevant experiments, including multiple object tracking, visual routines and subset-selected visual searches. Indexing theory provides a synthesis that has profound implications for explaining a wide range of psychophysical findings, certain results in infant cognitive development and also some ancient problems in the philosophy of mind.

Section snippets

Demonstrative reference

Using what I have been referring to as demonstrative references avoids the need to encode a scene exhaustively in terms of absolute or global properties; instead, the representation can refer to certain relations between the objects and the perceiver/actor. This simplifies certain kinds of planning by providing information in an optimal form for making decisions about actions. Box 1 illustrates three possible ways in which a robot might represent an environment through which it must navigate. It demonstrates …
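
As a rough illustration of the contrast that Box 1 draws, the sketch below compares an exhaustive, allocentric encoding of a scene (absolute coordinates for every object) with a deictic encoding that binds only a few indexical labels such as ‘obstacle-ahead’ to particular objects. This is a toy analogy written for this review, not the content of Box 1 itself; the class names, the decision rule and the geometry test are all hypothetical.

from dataclasses import dataclass

# Hypothetical toy scene: every object described by absolute (allocentric) coordinates.
@dataclass
class SceneObject:
    name: str
    x: float
    y: float

def _near_segment(a, b, p, tolerance):
    # Crude geometry helper: is point p within `tolerance` of the segment a->b?
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5 <= tolerance
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    cx, cy = ax + t * dx, ay + t * dy
    return ((px - cx) ** 2 + (py - cy) ** 2) ** 0.5 <= tolerance

def allocentric_plan(objects, robot_xy, goal_xy):
    # The planner must consult the stored coordinates of everything in the scene.
    blockers = [o for o in objects
                if _near_segment(robot_xy, goal_xy, (o.x, o.y), tolerance=0.5)]
    return "detour" if blockers else "go-straight"

def deictic_plan(bindings):
    # The decision rule mentions objects only through deictic labels that were
    # bound to them earlier; no coordinates or descriptions are consulted.
    return "detour" if "obstacle-ahead" in bindings else "go-straight"

scene = [SceneObject("chair", 1.0, 0.1), SceneObject("table", 5.0, 3.0)]
print(allocentric_plan(scene, robot_xy=(0.0, 0.0), goal_xy=(2.0, 0.0)))  # detour
print(deictic_plan(bindings={"obstacle-ahead": scene[0]}))               # detour

The point of the contrast is only that the second planner's rule is stated over references to particular objects rather than over a description of the whole scene.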

Multiple object tracking

Experimental research on visual indexes started with the following experiment (Fig. 1). Eight identical circles appear on a screen and four of them flicker briefly. Subsequently, all eight circles begin to move randomly on the screen and continue to do so for about ten seconds, after which they stop moving. The observer’s task is to keep track of the four circles that initially flickered (but are now identical to the other circles) and to identify them at the end of the trial. The only special …
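
To make the structure of the task concrete, here is a minimal toy simulation of such a display (of the stimulus, not of the human tracking mechanism): eight otherwise identical items, four designated as targets at the start, a period of random motion, and a final report phase. The parameters n_items and n_targets follow the description above; the frame count, step size and display size are arbitrary assumptions.

import random

def run_mot_trial(n_items=8, n_targets=4, n_frames=100, step=5.0, size=400.0):
    """Toy multiple-object-tracking display: identical items, a designated
    target subset, random motion, then a report phase."""
    rng = random.Random(0)
    # Items differ only in their index; their appearance is identical, so
    # nothing but continuity of motion distinguishes targets from non-targets.
    positions = [[rng.uniform(0, size), rng.uniform(0, size)]
                 for _ in range(n_items)]
    targets = set(range(n_targets))        # the items that 'flicker' at the start

    for _ in range(n_frames):              # independent random motion
        for pos in positions:
            pos[0] = min(size, max(0.0, pos[0] + rng.uniform(-step, step)))
            pos[1] = min(size, max(0.0, pos[1] + rng.uniform(-step, step)))

    # At the end of the trial the observer must report which items were targets.
    return positions, targets

positions, targets = run_mot_trial()
print(f"report {len(targets)} targets among {len(positions)} identical items")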

Other evidence of visual indexes

A basic assumption of visual indexing theory is that the visual system has a way of selecting and accessing a small number of visual objects without having to use a description. If this is true, then establishing indexes for several objects should allow an observer to select them rapidly by following the pointers provided by the indexes, without having to search for an object that fits a description. In addition to the multiple object tracking studies described above, two lines of evidence from my …
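
The claim can be pictured as the difference between following a few stored pointers and scanning every item for a match to a description. The sketch below is only a data-access analogy with hypothetical names; it is not a process model of the subset-search experiments.

# Illustrative analogy: a 'visual index' as a stored reference to a particular item.
items = [{"id": i, "color": c, "shape": s}
         for i, (c, s) in enumerate([("red", "circle"), ("green", "square"),
                                     ("red", "square"), ("green", "circle"),
                                     ("red", "circle"), ("green", "square")])]

# Index-based access: pointers bound earlier are simply dereferenced; no item's
# properties need to be examined at selection time.
indexes = [items[0], items[3], items[4]]
selected_by_index = list(indexes)

# Description-based access: every item must be checked against the description.
selected_by_description = [it for it in items
                           if it["color"] == "red" and it["shape"] == "circle"]

print([it["id"] for it in selected_by_index])        # [0, 3, 4]
print([it["id"] for it in selected_by_description])  # [0, 4]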

Object-file theory

Daniel Kahneman and co-workers [37] showed in several experiments that individual objects (as opposed to their locations or other properties) provide the locus for storing and accessing various properties associated with those objects. They made use of the well-known ‘priming effect’, whereby the prior occurrence of a particular letter decreases the recognition time for that letter. Kahneman et al. [37] showed that the priming effect for a letter traveled with the box in which it had occurred (Fig. 3) …
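
The object-file idea can be sketched as a small record that is bound to an object token and accumulates properties as the object moves. The sketch below is a minimal data-structure illustration under that reading; the class and function names are hypothetical and the priming check is reduced to a simple match against the file's contents.

from dataclasses import dataclass, field

@dataclass
class ObjectFile:
    """Toy 'object file': a record bound to one visual object that accumulates
    the properties encountered on that object, wherever the object moves."""
    token: int                        # identity of the object (e.g. a moving box)
    properties: dict = field(default_factory=dict)

    def note(self, **props):
        self.properties.update(props)

def primed(object_file, letter):
    # Object-specific priming: recognition is facilitated when the letter matches
    # what is already stored in the SAME object's file, not the same location.
    return object_file.properties.get("letter") == letter

box = ObjectFile(token=1)
box.note(letter="A", location=(0, 0))   # the letter appears in the box
box.note(location=(120, 40))            # the box, and its file, move elsewhere

print(primed(box, "A"))   # True: the priming travels with the object
print(primed(box, "B"))   # False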

Conclusions

I have argued that the visual system (and perhaps also the cognitive system) needs a special kind of direct reference mechanism to refer to objects without having to encode their properties. Thus, on initial contact, objects are not interpreted as belonging to a certain type or having certain properties; in other words, objects are initially detected without being conceptualized. This kind of direct reference is provided by what is referred to as a demonstrative or, more generally, an indexical reference …

Acknowledgements

I wish to thank Jerry Fodor and Brian Scholl for their contributions to the ideas contained in this article. The research reported here was supported in part by NIMH Grant 1R01-MH60924 awarded to the author.

References (54)

  • C.R. Sears et al., Multiple object tracking and attentional processes, Can. J. Exp. Psychol. (2000)
  • J. Burkell et al., Searching through subsets: a test of the visual indexing hypothesis, Spat. Vis. (1997)
  • D.G. Watson et al., Visual marking: prioritizing selection for new objects by top-down attentional inhibition of old objects, Psychol. Rev. (1997)
  • Z.W. Pylyshyn et al., Developing a network model of multiple visual indexing, Invest. Ophthalmol. Vis. Sci. (1994)
  • J.M. Henderson et al., Roles of object-file review and type priming in visual identification within and across eye fixations, J. Exp. Psychol. Hum. Percept. Perform. (1994)
  • F. Xu et al., Infants’ metaphysics: the case of numerical identity, Cognit. Psychol. (1996)
  • T.S. Horowitz et al., Visual search has no memory, Nature (1998)
  • A. Clark, An embodied cognitive science?, Trends Cognit. Sci. (1999)
  • P.E. Agre et al., Pengi: an implementation of a theory of activity
  • H.A. Simon, The Sciences of the Artificial (1969)
  • K. VanLehn et al., Goal reconstruction: how TETON blends situated action and planned action
  • J.K. O’Regan, Solving the ‘real’ mysteries of visual perception: the world as an outside memory, Can. J. Psychol. (1992)
  • Y. Lespérance et al., Indexical knowledge and robot action: a logical account, Artif. Intell. (1995)
  • Z.W. Pylyshyn, The role of location indexes in spatial perception: a sketch of the FINST spatial-index model, Cognition (1989)
  • Z.W. Pylyshyn, Some primitive mechanisms of spatial attention, Cognition (1994)
  • R. Reynolds, Perception of an illusory contour as a function of processing time, Perception (1981)
  • A.B. Sekuler et al., Visual completion of partly occluded objects: a microgenetic analysis, J. Exp. Psychol. Gen. (1992)