Research Focus
After the viewpoint debate: where next in object recognition?

https://doi.org/10.1016/j.tics.2003.08.004

Abstract

A debate on whether object representations capture isolated viewpoints or ranges of views has dominated research in object recognition in recent years, but now seems to be waning. Rather than assume a narrow model in which either structural or view information is used to recognize an object, researchers have begun to examine how these properties might be used cooperatively. A recent paper by Foster and Gilson confirms sensitivity to both types of information, which combine in an additive framework to predict recognition performance.

Section snippets

The viewpoint debate

In the 1990s, two patterns of behavioural performance emerged in studies of object recognition, each of which suggested support for one of the theoretical alternatives. One set of studies found invariance across image manipulations such as scale and contour deletion (e.g. [2]) and later across changes in viewpoint [3], [4]. These studies supported structural description models of object recognition. On the other hand, a variety of researchers found that recognition of objects seemed to depend upon …

Incorporating structure and view information in one model

Foster and Gilson created a set of simple objects composed of cylinders connected end-to-end (see Fig. 1). The objects were differentiated from each other on the basis of one of four properties: number of parts, part curvature, part length, and angle of join between parts. The first property is non-accidental, and thus provides information about object structure; the other properties, however, are view-specific, in that their depiction in an image will change radically across views. The …
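The additive framework described above can be sketched in a few lines. This is a hypothetical illustration, not Foster and Gilson's published model: the function name, weights, and difference values are all invented for the example. The point it demonstrates is additivity — structural (non-accidental) and view-specific (metric) differences each make an independent contribution to predicted discrimination performance.

```python
# Hypothetical sketch of an additive combination of structural and
# view-specific information, in the spirit of Foster and Gilson's
# framework. All names, weights, and inputs are illustrative.

def predicted_discriminability(structural_diff, view_diff,
                               w_structural=1.0, w_view=1.0):
    """Predict discrimination performance as a weighted sum of the
    structural (non-accidental) and view-specific (metric) differences
    between two objects."""
    return w_structural * structural_diff + w_view * view_diff

# Two objects differing only in a view-specific property (e.g. part length):
view_only = predicted_discriminability(structural_diff=0.0, view_diff=0.5)

# Two objects differing in both part number (structural) and part length:
both_differ = predicted_discriminability(structural_diff=1.0, view_diff=0.5)

# Additivity: the structural cue's contribution is the same whether or
# not a view-specific difference is also present.
assert both_differ - view_only == predicted_discriminability(1.0, 0.0)
```

Under this kind of model, neither cue gates the other: removing the structural difference lowers predicted performance by a fixed amount rather than abolishing view sensitivity, which is the signature pattern an additive account predicts.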

Post-debate studies of viewpoint

Foster and Gilson are not alone in considering the relative contributions of structural and view-specific information to the object recognition process. In particular, Stankiewicz [12] has recently shown that particular 3-D shape properties (specifically, axis curvature and aspect ratio) and an object's viewpoint can be estimated independently of each other. Although Stankiewicz argues for a different relationship between these properties, the fact that these studies and others (e.g. [17]) …

References (24)

  • M.J. Tarr et al., Is human object recognition better described by geon-structural-descriptions or by multiple-views? J. Exp. Psychol. Hum. Percept. Perform. (1995)
  • I. Biederman et al., Viewpoint-dependent mechanisms in visual object recognition: Reply to Tarr and Bülthoff (1995). J. Exp. Psychol. Hum. Percept. Perform. (1995)