Elsevier

Cognition

Volume 63, Issue 1, 1 April 1997, Pages 29-78
Salience of visual parts

https://doi.org/10.1016/S0010-0277(96)00791-3

Abstract

Many objects have component parts, and these parts often differ in their visual salience. In this paper we present a theory of part salience. The theory builds on the minima rule for defining part boundaries. According to this rule, human vision defines part boundaries at negative minima of curvature on silhouettes, and along negative minima of the principal curvatures on surfaces. We propose that the salience of a part depends on (at least) three factors: its size relative to the whole object, the degree to which it protrudes, and the strength of its boundaries. We present evidence that these factors influence visual processes which determine the choice of figure and ground. We give quantitative definitions for the factors, visual demonstrations of their effects, and results of psychophysical experiments. © 1997 Elsevier Science B.V.

Introduction

When you glance at a fan or a futon and recognize it, you do so with apparent ease. In fact, however, during that glance billions of neurons labor in concert to transform, step by step, the shower of photons hitting each eye into recognized objects such as fans and futons. The ease of recognition, like the ease of an Olympic skater, is deceptive. Recognizing futons from photons is no small task.

Indeed it is not one task, but many. Color, shading, shape, motion, texture, and context are all typically used in the process. It is natural then for the theorist, faced with this complexity, to choose a strategy of divide and conquer. And fortunately, as Fig. 1 shows, there are natural ways to divide the problem of recognition. Notice that the silhouettes in this figure are easily recognized. But there is no color, shading, motion, or texture. Nor is context of any help; your present location, the time of day, and other contextual factors could not help you to predict what is in the figure. The figure contains only shape, and that of a restricted type, namely silhouettes.

Thus, in many cases, shape alone permits successful recognition of objects. Indeed we can recognize thousands of objects entirely by their shapes. These large numbers raise a major obstacle to successful recognition, namely indexing—efficiently searching one's memory of familiar objects to seek a best match to a given image. What Fig. 1 shows is that shape by itself is enough, in many cases, to index with success. Somehow human vision can represent silhouettes in a way that provides a useful first index into its memory of shapes. This first index is computed “bottom up” and might not be right on target, but it must be close enough so that any top-down searches it triggers can converge quickly to the right answer.

In light of these remarks we restrict attention, throughout this paper, to recognition by shape.

How is shape represented to provide a useful first index? Research to date yields few firm conclusions. However, there is growing consensus that representing shapes in terms of their parts may aid the recognition process in human vision (Baylis and Driver, 1995a, 1995b; Bennett and Hoffman, 1987; Beusmans et al., 1987; Biederman, 1987; Biederman and Cooper, 1991; Braunstein et al., 1989; Driver and Baylis, 1995; Hoffman, 1983a, 1983b; Hoffman and Richards, 1984; Marr, 1977, 1982; Marr and Nishihara, 1978; Palmer, 1975, 1977; Siddiqi et al., 1996; Stevens and Brookes, 1988; Todd et al., 1995; Tversky and Hemenway, 1984; but see Cave and Kosslyn, 1993). Parts may aid as well in computer vision (Binford, 1971; Brooks, 1981; Dickinson et al., 1992; Guzman, 1971; Pentland, 1986; Siddiqi and Kimia, 1995; Terzopoulos et al., 1987; Winston, 1975). The idea is that for you to recognize some shape in an image as, say, a cat, you must first decompose the shape into parts, for three reasons. First, cats are opaque, and second, cats can hide behind other opaque objects, as when a cat peeks from behind a chair. For both reasons you can't see all of a cat in a single glance. Thus to recognize a cat you must find and represent its parts that are visible in your image. The visible parts permit a first index into your catalogue of shapes, starting further routines which result in recognizing the cat. But the cat poses another problem. It walks, thereby moving its body nonrigidly. Here again, parts can come to the rescue. If you find the right parts of the cat, say such rigidly moving subshapes as its legs and feet, and represent them and their (changing) spatial relationships, again you might just recognize the cat. So the opacity of most objects and the nonrigidity of some make parts a useful, perhaps essential, approach to recognition.

But one might ask, Which parts? How shall I find a cat's tail if I don't yet have a cat? We can solve this problem in two ways. We can define a priori a set of basic shapes that are the possible parts. Our task is then to find these basic shapes in images. Or we can instead define, by means of general computational rules, the boundaries between parts, that is, those points on a shape where one part ends and the next begins. Our task is then to find these boundaries in images.

Proponents of basic shapes have studied many alternatives: polyhedra (Roberts, 1965; Waltz, 1975; Winston, 1975), generalized cones and cylinders (Binford, 1971; Brooks, 1981; Marr and Nishihara, 1978), geons (Biederman, 1987), and superquadrics (Pentland, 1986). In each case the basic shapes reveal two drawbacks. They are ad hoc in origin and limited in scope. They are limited in scope since many objects are not composed solely of geons, polyhedra, superquadrics, generalized cylinders, or some combination: consider, for instance, a face or a shoe. They are ad hoc in origin because either (1) they are not derived from first principles or (2) they are derived from first principles but are not the entire set of basic shapes that follow from these principles. Polyhedra, superquadrics, and generalized cylinders make no appeal to first principles. Geons, in contrast, do appeal to the principle of “nonaccidental properties” (Witkin and Tenenbaum, 1983; Lowe, 1985). They are defined using three-dimensional (3D) features which, generically, survive under projection. Some examples are features like straight versus curved (only by an accident of view could a curve in 3D project to a straight line in an image) and parallel versus nonparallel (only by an accident of view could lines not parallel in 3D look parallel in the image). However, geons are not the entire set of primitives that follow from the principle of nonaccidental properties. Geons can only end, for instance, either in points, like a sharpened pencil, or in truncations, like a new unsharpened pencil (Biederman, 1987). They don't have rounded tips, although the distinction between tips that are rounded, pointed, and truncated is one that survives projection, and is critical to the proper recognition of fingers, toes, and peeled bananas (which have rounded tips that are not, even roughly, pointed or truncated). Thus every collection of basic shapes that has heretofore been proposed has in fact been ad hoc. 
Nevertheless they may be useful as qualitative descriptors of parts, rather than as an algorithm for parsing objects into parts.

Proponents of boundaries have also studied many alternatives. Rules for defining part boundaries include “deep concavities” (Marr and Nishihara, 1978), “sharp concavities” (Biederman, 1987), “concave regions” (Biederman, 1987), “limbs and necks” (Siddiqi and Kimia, 1995), and the “minima rule” (Hoffman and Richards, 1984). According to the minima rule, human vision defines part boundaries at negative minima of curvature on silhouettes, and along negative minima of the principal curvatures on surfaces. The other rules (except for limbs and necks, which we mention later) are similar in spirit but weaker in precision. The minima rule states precisely what they try to capture. Moreover, converging experimental evidence suggests that human vision does in fact break shapes into parts as per the minima rule (Baylis and Driver, 1995a, 1995b; Braunstein et al., 1989; Driver and Baylis, 1995; Hoffman, 1983a, 1983b; Hoffman and Richards, 1984), and that it finds parts preattentively (Baylis and Driver, 1995a, 1995b; Driver and Baylis, 1995). You find them early and you can't stop yourself. Therefore in what follows we build on the minima rule.
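For silhouettes, the rule stated above can be sketched numerically. The Python fragment below is a minimal illustration, not the authors' procedure. It assumes the outline is sampled as a closed polygon traversed counterclockwise (so that convex bends have positive curvature), estimates curvature at each vertex as the turning angle divided by the mean length of the two adjacent edges, and reports the vertices that are local minima with negative curvature. The function names and the two-lobed test shape r = 1 + 0.5 cos 2θ are hypothetical choices made for this example.

```python
import math

def signed_curvatures(points):
    """Discrete signed curvature at each vertex of a closed polygon.

    Assumes counterclockwise traversal: convex bends come out positive,
    concave bends negative.
    """
    n = len(points)
    curvatures = []
    for i in range(n):
        x0, y0 = points[i - 1]            # previous vertex (wraps around)
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]      # next vertex (wraps around)
        ax, ay = x1 - x0, y1 - y0         # incoming edge
        bx, by = x2 - x1, y2 - y1         # outgoing edge
        # Signed angle from the incoming edge to the outgoing edge.
        turning = math.atan2(ax * by - ay * bx, ax * bx + ay * by)
        mean_len = 0.5 * (math.hypot(ax, ay) + math.hypot(bx, by))
        curvatures.append(turning / mean_len)
    return curvatures

def negative_minima(points):
    """Indices of vertices that are strict local minima of curvature
    and whose curvature is negative (candidate part boundaries)."""
    k = signed_curvatures(points)
    n = len(k)
    return [i for i in range(n)
            if k[i] < 0 and k[i] < k[i - 1] and k[i] < k[(i + 1) % n]]

def two_lobed_outline(n=360, a=0.5):
    """Hypothetical test silhouette: polar curve r = 1 + a*cos(2*theta)."""
    outline = []
    for i in range(n):
        theta = 2.0 * math.pi * i / n
        r = 1.0 + a * math.cos(2.0 * theta)
        outline.append((r * math.cos(theta), r * math.sin(theta)))
    return outline

boundaries = negative_minima(two_lobed_outline())
print(boundaries)  # sample indices of the candidate part boundaries
```

For this outline the sketch marks the two points of the waist, one on each side, which is where one would intuitively divide the shape into two lobes. Positive minima of curvature, which also occur on the outline, are deliberately ignored.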

Although the minima rule gives precise points for carving shapes into parts, it neither describes the resulting parts nor compares their visual salience (Hoffman and Richards, 1984). In this paper we study salience. Our plan, in brief, is as follows. Since the minima rule is our point of departure, we first review its definition. This keeps the paper self-contained. We then develop rules for part salience: first in 2D, then in 3D.

What do we mean by the salience of a part? As we said before, parts help us index our memory of shapes. Their salience determines, in part, their efficacy as an index. This efficacy might be measured in reaction times, error rates, confidence ratings, and judgments of figure and ground. For instance, as we demonstrate in Section 8, a low salience part sometimes has no efficacy as an index. This can happen because human vision prefers, ceteris paribus, to choose figure and ground so that figure has the more salient parts. Thus a less salient part is not even used as an index whenever it loses in this figure–ground competition. Our argument here follows a pattern seen many times before in the literature. It has been shown that “good” parts provide better retrieval cues for recalling shapes (Bower and Glass, 1976), that “good” parts are themselves better recalled (Palmer, 1977), and that they are more easily identified in mental images (Reed, 1974). The geometrical theory of part salience developed here provides a de facto starting definition of part salience, to be refined in the light of psychophysical experiments.

Our goal here differs in two respects from the description of shape provided by codons (Richards and Hoffman, 1985; Richards et al., 1986). A codon is a segment of a silhouette's outline which is (1) bounded on either side by a minimum of curvature and (2) assigned to one of six classes based on its maxima and inflections. A minimum of curvature can of course be either positive or negative in sign. So codons are not, in general, bounded by two negative minima of curvature and they do not, in general, correspond to parts of a shape. This is the first respect in which codons are not relevant to our undertaking here; we are studying parts, and codons do not, in general, correspond to parts. The second respect is this: the codon description is qualitative, assigning segments to one of six categories, whereas we here seek a quantitative account of the geometric factors affecting part salience.

Our goal here also differs from the interesting project of describing silhouettes in terms of causal processes (Leyton, 1987, 1988, 1989, 1992). This project uses curvature extrema and symmetry to infer a description in terms of four processes: protrusion, indentation, squashing, and internal resistance. It provides an account of the perceived genesis of shapes. We, however, seek instead quantitative factors that determine part salience.

Finally, our goal differs from the three-dimensional interpretations of silhouettes provided by Richards et al. (1987). To obtain these interpretations they use theorems regarding the geometry of smooth surfaces and their image projections, together with assumptions regarding general position and generic surfaces. Given a silhouette they find a small set of three-dimensional interpretations and decompositions into parts. However, they do not describe the salience of these parts, which is our concern here.

We view this paper as follows. We are searching a space of hypotheses about geometric factors which affect part salience. Part salience is a subject where there is as yet little empirical work to guide us. Thus our intention is to survey the territory, provide an initial map, and point to directions we think interesting. Some of the hypotheses are, we suspect, less plausible than others, and we indicate this when we discuss them. But we include them for completeness in our survey of theoretical possibilities. We state the hypotheses precisely enough that they can be tested by psychophysical experiment. When we are done with our survey, we then select those hypotheses that we find most plausible, and present them as our theory of part salience. This theory suggests intriguing visual effects, and we close by demonstrating these effects and presenting psychophysical experiments.

The minima rule

Any subset of a shape could be considered one of its parts. But for the task of recognition not just any parts will do. The parts must satisfy certain principles (Hoffman and Richards, 1984; Marr and Nishihara, 1978; Sutherland, 1968). They must be computable from images (else they cannot be obtained), defined on any shape (else they will not help us recognize some shapes), and invariant under generic perturbations of viewpoint (else we shall see new parts each time we move, which would defeat

Salience in 2D: the case of cusps

Having reviewed the minima rule, we now study part salience. We start, for simplicity, with silhouettes. Moreover, to simplify even further, we assume that the only part boundaries are cusps (i.e., sharp points). Such silhouettes can be represented by plane curves that are closed (i.e., no gaps or loose ends), simple (i.e., no self intersections), oriented (i.e., traced in a definite direction), and whose only negative minima of curvature are concave cusps (which have curvature of infinite

Salience in 2D: smooth boundaries

The definitions of protrusion and relative area do not change if the part boundaries are smooth. However, the analysis of boundary strength does change, and to this we now turn our attention.

For a silhouette, each negative minimum of curvature on its bounding curve is, by the minima rule, a part boundary. The primary difference between a smooth boundary and cusp boundary is that the cusp has two normals, whereas the smooth has, by definition, only one. Therefore the hypothesis of turning

Salience in 3D: the case of concave creases

We now consider the salience of parts of 3D objects. Many ideas discussed for silhouettes still apply, but with added complexity. In this section we consider parts whose boundaries are concave creases. Later we consider parts whose boundaries are smooth.

Salience in 3D: smooth surfaces

We now consider smooth surfaces in 3D. There is not much new to be said about them over what has already been said in the other cases. The theoretical measures of protrusion and size in this case are identical to those for crease boundaries. And the theoretical measures of boundary salience follow straightforwardly from the corresponding smooth silhouette measures (viz., normalized curvature and locale turning) by extending them to the one-dimensional smooth boundaries of the 3D case. The only

A theory of part salience

We have considered some factors affecting part salience, given precise definitions of these factors, and cast them as hypotheses. We now propose a concrete theory of part salience by stating which of these hypotheses we think best (i.e., most plausible as a psychological account). The theory describes how relative size, protrusion, and boundary strength affect perceived salience. This gives a clear target for theorists to refine and experimentalists to test.
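To make two of these factors concrete, here is a rough Python sketch for a polygonal silhouette. The measures below are stand-ins assumed for illustration only and are not presented as the paper's definitions: relative size is taken as part area over whole area, and protrusion as the length of the part's outline (excluding its base) divided by the length of the base, where the base is the straight cut joining the part's two boundary points. Boundary strength is omitted. All function names and the example shape (a square body with a rectangular tab) are hypothetical.

```python
import math

def polygon_area(vertices):
    """Unsigned area of a simple polygon (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

def path_length(vertices):
    """Length of an open polyline through the given vertices."""
    return sum(math.dist(p, q) for p, q in zip(vertices, vertices[1:]))

# Whole silhouette: a 4 x 4 square with a 1 x 2 tab on its top edge.
whole = [(0, 0), (4, 0), (4, 4), (2.5, 4), (2.5, 6), (1.5, 6), (1.5, 4), (0, 4)]

# Outline of the tab, running from one concave corner to the other.
# The straight cut between its endpoints is the assumed base.
part_outline = [(1.5, 4), (1.5, 6), (2.5, 6), (2.5, 4)]

# Assumed measure 1: area of the part relative to the whole object.
relative_area = polygon_area(part_outline) / polygon_area(whole)

# Assumed measure 2: outline length (base excluded) over base length.
base_length = math.dist(part_outline[0], part_outline[-1])
protrusion = path_length(part_outline) / base_length

print(f"relative area = {relative_area:.3f}, protrusion = {protrusion:.1f}")
```

Under these assumed measures, lengthening the tab raises both numbers, while widening its base raises relative area but lowers protrusion, so the two factors can be varied separately in a stimulus.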

The theory has two parts: a theory

Visual demonstrations and psychophysical experiments

We have proposed a theory of the geometric factors that influence judgments of part salience. Its plausibility can be suggested by demonstrations but determined, of course, only by experiments. To such experiments and demonstrations we now turn.

EXPERIMENT 1

Consider the demonstration shown in Fig. 24(a). It consists of two variations of the Schroeder staircase shown earlier in Fig. 3. Recall that the Schroeder staircase can be seen in two different interpretations, either as a normal ascending

Subjects

The subjects were 10 students from the University of California, Irvine. The subjects were volunteers naive to the purposes of the experiment. All had normal or corrected to normal acuity.

Stimuli

The stimuli were the standard Schroeder staircase (Fig. 3) and its two modifications (Fig. 24). The staircases were viewed at a distance of about 1 m and subtended about 7 degrees of visual angle.
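For readers who wish to reproduce the viewing conditions, the physical size of a stimulus follows from its visual angle and the viewing distance. The short Python sketch below assumes a flat stimulus centered on the line of sight and perpendicular to it; the function name is our own, and the resulting sizes are approximate because the distance and angles are reported only as "about" 1 m, 7 degrees, and 9 degrees.

```python
import math

def stimulus_extent(distance_m, angle_deg):
    """Linear extent (in meters) that subtends angle_deg at distance_m,
    for a flat stimulus centered on and perpendicular to the line of sight."""
    return 2.0 * distance_m * math.tan(math.radians(angle_deg) / 2.0)

staircase_m = stimulus_extent(1.0, 7.0)   # staircase stimuli, about 7 degrees
face_vase_m = stimulus_extent(1.0, 9.0)   # face-vase stimuli, about 9 degrees

print(f"{staircase_m * 100:.1f} cm, {face_vase_m * 100:.1f} cm")
```

At 1 m, then, the staircases would have spanned roughly 12 cm and the face–vase illustrations roughly 16 cm.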

Design

Each staircase was shown an equal number of times upright and inverted, and with dots as depicted in Fig. 3, Fig. 24

Results and discussion

Consider again Fig. 24(b). If a subject responds “different” to this figure then we know that the subject sees the most salient negative minima as step boundaries. Thus we record the response as “most salient”. If a subject responds “same” to this figure then we record the response as “least salient”. For the version of Fig. 24(b) in which the dots are shifted by one step face, we record just the opposite: a response of “different” is recorded as “least salient” and a response of “same” as

Subjects

The subjects were 10 students from the University of California, Irvine. The subjects were volunteers naive to the purposes of the experiment. All had normal or corrected to normal acuity.

Stimuli

The stimuli were the face–vase illustrations of Fig. 25. On each trial one such illustration was shown. The face–vase illustrations were viewed at a distance of about 1 m and each subtended about 9 degrees of visual angle.

Design

There were three face–vase illustrations. Each was shown 16 times for a total of 48

Results and discussion

The percentage, averaged over 10 subjects, of “vase” responses was 44.4% for Fig. 25(a), 25% for Fig. 25(b), and 74.4% for Fig. 25(c). (Data from an eleventh subject, who never reported the vases interpretation for Fig. 25(a), was excluded from the analysis.) A one-factor ANOVA of these means shows a significant effect of boundary strength in the predicted direction: F(2, 18)=11.57, p<.001. This supports the hypothesis that subjects choose figure and ground so that figure has the more salient
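As a consistency check on the reported statistic, the tail probability of an F distribution with 2 numerator degrees of freedom has a simple closed form, P(F > f) = (d / (d + 2f))^(d/2), where d is the denominator degrees of freedom. The Python sketch below applies it to F(2, 18) = 11.57. It assumes only that the statistic follows the standard F distribution under the null hypothesis; the function name is hypothetical, and the formula does not hold for other numerator degrees of freedom.

```python
def f_tail_two_numerator_df(f_value, denom_df):
    """P(F > f_value) for an F distribution with 2 and denom_df
    degrees of freedom (closed form valid only for numerator df = 2)."""
    return (denom_df / (denom_df + 2.0 * f_value)) ** (denom_df / 2.0)

p_value = f_tail_two_numerator_df(11.57, 18)
print(f"p = {p_value:.5f}")
```

The result is a little under .0006, in agreement with the reported p<.001.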

Subjects

The subjects were 10 students from the University of California, Irvine. The subjects were volunteers naive to the purposes of the experiment. All had normal or corrected to normal acuity.

Stimuli

The stimuli were three fish-scale patterns of the type shown in Fig. 26, each with its own turning ratio and area ratio. The three turning ratios were (in degrees) 120 : 120 (as in Fig. 26(a)), 120 : 90, and 120 : 60 (as in Fig. 26(b)) with corresponding area ratios of 1 : 1, 1.25 : 1, and 1.67 : 1. On each trial one such

Results and discussion

Consider again Fig. 26(b). If a subject responds that the dot is “on” the scale then we know that the subject sees the most salient negative minima as scale boundaries. Thus we record the response as “most salient”. If a subject responds “off” to this figure then we record the response as “least salient”. Similar interpretations hold, mutatis mutandis, for the versions of Fig. 26(b) with the other dot placements. We did the same for responses to the 120 : 90 figure, and for Fig. 26(a), although

Concluding remarks

We have proposed that the salience of a part depends on (at least) three factors: its size relative to the whole object, the degree to which it protrudes, and the strength of its boundaries. We have considered precise definitions for these factors, and presented visual demonstrations and psychophysical tests of their effects. And in Experiment 1 we have given evidence that part salience affects the early visual processing of figure and ground.

Assessing the salience of visual parts is one small

Acknowledgements

For discussions and suggestions, we thank Marc Albert, Bruce Bennett, Myron Braunstein, Mike D'Zmura, Ki-Ho Jeon, Jin Kim, Jeff Liter, Scott Richman, Asad Saidpour, and Jessica Turner. We especially thank Marc Albert for helping us refine our definition of part protrusion. And we thank three anonymous reviewers for helpful comments. This work was supported by National Science Foundation grant DIR-9014278.

References (90)

  • Attneave, F. (1971). Multistability in perception. Scientific American, 225(6),...
  • Bahnsen, P. (1928). Eine untersuchung ueber symmetrie und asymmetrie bei visuellen wahrnehmungen. Zeitschrift für...
  • Baylis, G.C., & Driver, J. (1995a). One-sided edge assignment in vision. 1. Figure–ground segmentation and attention to...
  • Baylis, G.C., & Driver, J. (1995b). Obligatory edge assignment in vision: The role of figure and part segmentation in...
  • Bennett, B.M., & Hoffman, D.D. (1987). Shape decompositions for visual shape recognition: The role of transversality....
  • Beusmans, J., Hoffman, D.D., & Bennett, B.M. (1987). Description of solid shape and its inference from occluding...
  • Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94,...
  • Biederman, I., & Cooper, E.E. (1991). Priming contour-deleted images: Evidence for intermediate representations in...
  • Binford, T.O. (1971, December). Visual perception by computer. IEEE Systems Science and Cybernetics Conference, Miami,...
  • Blake, A. (1984). Reconstructing a visible surface. In Proceedings of AAAI Conference 1984 (pp. 362–365). Los Altos,...
  • Blake, A., & Zisserman, A. (1986). Invariant surface reconstruction using weak continuity constraints. In Proceedings...
  • Blake, A., & Zisserman, A. (1987). Visual reconstruction. Cambridge, MA: MIT...
  • Bower, G.H., & Glass, A.L. (1976). Structural units and the redintegrative power of picture fragments. Journal of...
  • Brady, M., & Asada, H. (1984). Smoothed local symmetries and their implementation. International Journal of Robotics...
  • Braunstein, M.L., Hoffman, D.D., & Saidpour, A. (1989). Parts of visual objects: An experimental test of the minima...
  • Brooks, R.A. (1981). Symbolic reasoning among 3-D models and 2-D images. Artificial Intelligence, 17,...
  • Cave, C.B., & Kosslyn, S.M. (1993). The role of parts and spatial relations in object identification. Perception, 22,...
  • Cornilleau-Peres, V., & Droulez, J. (1989). Visual perception of surface curvature: Psychophysics of curvature...
  • Dickinson, S.J., Pentland, A.P., & Rosenfeld, A. (1992). From volumes to views: An approach to 3D object recognition....
  • Do Carmo, M. (1976). Differential geometry of curves and surfaces. Englewood Cliffs, NJ:...
  • Driver, J., & Baylis, G.C. (1995). One-sided edge assignment in vision. 2. Part decomposition, shape description, and...
  • Enns, J.T., & Rensink, R.A. (1991). Preattentive recovery of three-dimensional orientation from line drawings....
  • Enns, J.T., & Rensink, R.A. (1992). A model for the rapid interpretation of line drawings in early vision. In D. Brogan...
  • Enns, J.T., & Rensink, R.A. (1995). Preemption effects in visual search. Psychological Review, 102,...
  • Grimson, W.E.L. (1981). From images to surfaces: A computational study of the human early visual system. Cambridge, MA:...
  • Guillemin, V., & Pollack, A. (1974). Differential topology. Englewood Cliffs, NJ:...
  • Guzman, A. (1971). Analysis of curved line drawings using context and global information. In Machine intelligence (Vol....
  • Halmos, P.R. (1950). Measure theory. New York:...
  • Hochberg, J. (1964). Perception. Englewood Cliffs, NJ:...
  • Hoffman, D.D. (1983a). Representing shapes for visual recognition. PhD thesis, Massachusetts Institute of Technology,...
  • Hoffman, D.D. (1983b). The interpretation of visual illusions. Scientific American, 249(6),...
  • Hoffman, D.D., & Richards, W.A. (1982). Representing smooth plane curves for recognition: Implications for...
  • Hoffman, D.D., & Richards, W.A. (1984). Parts of recognition. Cognition, 18,...
  • Kanizsa, G., & Gerbino, W. (1976). Convexity and symmetry in figure–ground organization. In M. Henle (Ed.), Art and...
  • Kimia, B.B., Tannenbaum, A.R., & Zucker, S.W. (1990). Toward a computational theory of shape: An overview. Lecture...
  • Kimia, B.B., Tannenbaum, A.R., & Zucker, S.W. (1991). Entropy scale-space (pp. 333–344). New York: Plenum...
  • Kimia, B.B., Tannenbaum, A.R., & Zucker, S.W. (1992). The shape triangle: Parts, protrusions, and bends (Tech. Rep. No....
  • Koenderink, J.J. (1984). The structure of images. Biological Cybernetics, 50,...
  • Koenderink, J.J. (1986). Optic flow. Vision Research, 26,...
  • Koffka, K. (1935). Principles of gestalt psychology. New York: Harcourt, Brace and...
  • Kurbat, M.A. (1994). Structural description theories: Is RBC/JIM a general-purpose theory of human entry-level object...
  • Leyton, M. (1987). Symmetry–curvature duality. Computer Vision, Graphics, and Image Processing, 38,...
  • Leyton, M. (1988). A process grammar for shape. Artificial Intelligence, 34,...
  • Leyton, M. (1989). Inferring causal history from shape. Cognitive Science, 13,...
  • Leyton, M. (1992). Symmetry, causality, mind. Cambridge, MA: MIT...