This paper explores the difference between Connectionist proposals for cognitive architecture and the sorts of models that have traditionally been assumed in cognitive science. We claim that the major distinction is that, while both Connectionist and Classical architectures postulate representational mental states, the latter but not the former are committed to a symbol-level of representation, or to a ‘language of thought’: i.e., to representational states that have combinatorial syntactic and semantic structure. Several arguments for combinatorial structure in mental representations are then reviewed. These include arguments based on the ‘systematicity’ of mental representation: i.e., on the fact that cognitive capacities always exhibit certain symmetries, so that the ability to entertain a given thought implies the ability to entertain thoughts with semantically related contents. We claim that such arguments make a powerful case that mind/brain architecture is not Connectionist at the cognitive level. We then consider the possibility that Connectionism may provide an account of the neural (or ‘abstract neurological’) structures in which Classical cognitive architecture is implemented. We survey a number of the standard arguments that have been offered in favor of Connectionism, and conclude that they are coherent only on this interpretation.
The computational view of mind rests on certain intuitions regarding the fundamental similarity between computation and cognition. We examine some of these intuitions and suggest that they derive from the fact that computers and human organisms are both physical systems whose behavior is correctly described as being governed by rules acting on symbolic representations. Some of the implications of this view are discussed. It is suggested that a fundamental hypothesis of this approach is that there is a natural domain of human functioning that can be addressed exclusively in terms of a formal symbolic or algorithmic vocabulary or level of analysis. Much of the paper elaborates various conditions that need to be met if a literal view of mental activity as computation is to serve as the basis for explanatory theories. The coherence of such a view depends on there being a principled distinction between functions whose explanation requires that we posit internal representations and those that we can appropriately describe as merely instantiating causal physical or biological laws. In this paper the distinction is empirically grounded in a methodological criterion called the "cognitive impenetrability condition." Functions are said to be cognitively impenetrable if they cannot be influenced by such purely cognitive factors as goals, beliefs, inferences, tacit knowledge, and so on. Such a criterion makes it possible to empirically separate the fixed capacities of mind from the particular representations and algorithms used on specific occasions. In order for computational theories to avoid being ad hoc, they must deal effectively with the "degrees of freedom" problem by constraining the extent to which they can be arbitrarily adjusted post hoc to fit some particular set of observations. This in turn requires that the fixed architectural function and the algorithms be independently validated.
It is argued that the architectural assumptions implicit in many contemporary models run afoul of the cognitive impenetrability condition, since the required fixed functions are demonstrably sensitive to tacit knowledge and goals. The paper concludes with some tactical suggestions for the development of computational cognitive theories.
Although the study of visual perception has made more progress in the past 40 years than any other area of cognitive science, there remain major disagreements as to how closely vision is tied to general cognition. This paper sets out some of the arguments for both sides and defends the position that an important part of visual perception, which may be called early vision or just vision, is prohibited from accessing relevant expectations, knowledge and utilities - in other words, it is cognitively impenetrable. That part of vision is complex and articulated and provides a representation of the 3-D surfaces of objects sufficient to serve as an index into memory, with somewhat different outputs being made available to other systems such as those dealing with motor control. The paper also addresses certain conceptual and methodological issues, including the use of signal detection theory and event-related potentials to assess cognitive penetration of vision. A distinction is made among several stages in visual processing. These include, in addition to the inflexible early-vision stage, a pre-perceptual attention allocation stage and a post-perceptual evaluation, memory-accessing, and inference stage, which provide several different highly constrained ways in which cognition can affect the outcome of visual perception. The paper discusses arguments that have been presented in both computer vision and psychology showing that vision is "intelligent" and involves elements of "problem solving". It is suggested that these cases do not show cognitive penetration, but rather they show that certain natural constraints on interpretation, concerned primarily with optical and geometrical properties of the world, have been compiled into the visual system. The paper also examines a number of examples where instructions and "hints" are alleged to affect ...
In "Things and Places," Zenon Pylyshyn argues that the process of incrementally constructing perceptual representations, solving the binding problem (determining which properties go together), and, more generally, grounding perceptual ...
In three experiments, subjects attempted to track multiple items as they moved independently and unpredictably about a display. Performance was not impaired when the items were briefly (but completely) occluded at various times during their motion, suggesting that occlusion is taken into account when computing enduring perceptual objecthood. Unimpaired performance required the presence of accretion and deletion cues along fixed contours at the occluding boundaries. Performance was impaired when items were present on the visual field at the same times and to the same degrees as in the occlusion conditions, but disappeared and reappeared in ways which did not implicate the presence of occluding surfaces (e.g. by imploding and exploding into and out of existence, instead of accreting and deleting along a fixed contour). Unimpaired performance did not require visible occluders (i.e. Michotte’s tunnel effect) or globally consistent occluder positions. We discuss implications of these results for theories of objecthood in visual attention.
This paper argues that a theory of situated vision, suited for the dual purposes of object recognition and the control of action, will have to provide something more than a system that constructs a conceptual representation from visual stimuli: it will also need to provide a special kind of direct (preconceptual, unmediated) connection between elements of a visual representation and certain elements in the world. Like natural language demonstratives (such as ‘this’ or ‘that’) this direct connection allows entities to be referred to without being categorized or conceptualized. Several reasons are given for why we need such a preconceptual mechanism which individuates and keeps track of several individual objects in the world. One is that early vision must pick out and compute the relation among several individual objects while ignoring their properties. Another is that incrementally computing and updating representations of a dynamic scene requires keeping track of token individuals despite changes in their properties or locations. It is then noted that a mechanism meeting these requirements has already been proposed in order to account for a number of disparate empirical phenomena, including subitizing, search-subset selection and multiple object tracking (Pylyshyn et al., Canadian Journal of Experimental Psychology 48(2) (1994) 260). This mechanism, called a visual index or FINST, is briefly ...
It is generally accepted that there is something special about reasoning by using mental images. The question of how it is special, however, has never been satisfactorily spelled out, despite more than thirty years of research in the post-behaviorist tradition. This article considers some of the general motivation for the assumption that entertaining mental images involves inspecting a picture-like object. It sets out a distinction between phenomena attributable to the nature of mind, that is, to what is called the cognitive architecture, and ones that are attributable to tacit knowledge used to simulate what would happen in a visual situation. With this distinction in mind, the paper then considers in detail the widely held assumption that in some important sense images are spatially displayed or are depictive, and that examining images uses the same mechanisms that are deployed in visual perception. I argue that the assumption of the spatial or depictive nature of images is only explanatory if taken literally, as a claim about how images are physically instantiated in the brain, and that the literal view fails for a number of empirical reasons, for example, because of the cognitive penetrability of the phenomena cited in its favor. Similarly, while it is arguably the case that imagery and vision involve some of the same mechanisms, this tells us very little about the nature of mental imagery and does not support claims about the pictorial nature of mental images. Finally, I consider whether recent neuroscience evidence clarifies the debate over the nature of mental images. I claim that when such questions as whether images are depictive or spatial are formulated more clearly, the evidence does not provide support for the picture-theory over a symbol-structure theory of mental imagery.
Even if all the empirical claims were true, they would not warrant the conclusion that many people have drawn from them: that mental images are depictive or are displayed in some (possibly cortical) space. Such a conclusion is incompatible with what is known about how images function in thought. We are then left with the provisional counterintuitive conclusion that the available evidence does not support rejection of what I call the “null hypothesis”; namely, that reasoning with mental images involves the same form of representation and the same processes as that of reasoning in general, except that the content or subject matter of thoughts experienced as images includes information about how things would look.
Marr (1982) may have been one of the first vision researchers to insist that in modeling vision it is important to separate the location of visual features from their type. He argued that in early stages of visual processing there must be “place tokens” that enable subsequent stages of the visual system to treat locations independent of what specific feature type was at that location. Thus, in certain respects a collinear array of diverse features could still be perceived as a line, and under certain conditions could function as such in perceptual phenomena like the Poggendorff illusion.
6. Seeing With the Mind’s Eye 1: The Puzzle of Mental Imagery. 6.1 What is the puzzle about mental imagery? 6.2 Content, form and substance of representations. 6.3 What is responsible for the pattern of results obtained in imagery studies? ...
In the past decade there has been renewed interest in the study of mental imagery. Emboldened by new findings from neuroscience, many people have revived the idea that mental imagery involves a special format of thought, one that is pictorial in nature. But the evidence and the arguments that exposed deep conceptual and empirical problems in the picture theory over the past 300 years have not gone away. I argue that the new evidence from neural imaging and clinical neuropsychology does little to justify this recidivism because it does not address the format of mental images. I also discuss some reasons why the picture theory is so resistant to counterarguments and suggest ways in which non-pictorial theories might account for the apparent spatial nature of images.
It is argued that the traditional distinction between artificial intelligence and cognitive simulation amounts to little more than a difference in style of research - a different ordering in goal priorities and different methodological allegiances. Both enterprises are constrained by empirical considerations and both are directed at understanding classes of tasks that are defined by essentially psychological criteria. Because of the different ordering of priorities, however, they occasionally take somewhat different stands on such issues as the power/generality trade-off and on the relevance of the sort of data collected in experimental psychology laboratories.
The task of tracking a small number (about four or five) of visual targets within a larger set of identical items, each of which moves randomly and independently, has been used extensively to study object-based attention. Analysis of this multiple object tracking (MOT) task shows that it logically entails solving the correspondence problem for each target over time, and thus that the individuality of each of the targets must be tracked. This suggests that when successfully tracking objects, observers must also keep track of them as unique individuals. Yet in the present studies we show that observers are poor at recalling the identity of successfully tracked objects (as specified by a unique identifier associated with each target, such as a number or starting location). Studies also show that the identity of targets tends to be lost when they come close together and that this tendency is greater between pairs of targets than between targets and nontargets. The significance of this finding in relation to the multiple object tracking paradigm is discussed.
I recently discovered that work I was doing in the laboratory and in theoretical writings was implicitly taking a position on a set of questions that philosophers had been worrying about for much of the past 30 or more years. My clandestine involvement in philosophical issues began when a computer science colleague and I were trying to build a model of geometrical reasoning that would draw a diagram and notice things in the diagram as it drew it (Pylyshyn, Elcock, Marmor, & Sander, 1978). One problem we found we had to face was that if the system discovered a right angle it had no way to tell whether this was the intersection of certain lines it had drawn earlier while constructing a certain figure, and if so which particular lines they were. Moreover, the model had no way of telling whether this particular right angle was identical to some bit of drawing it had earlier encountered and represented as, say, the base of a particular triangle. There was, in other words, no way to determine the identity of an element (I use the term “element” when referring to a graphical unit such as used in experiments. Otherwise when speaking informally I use the term “thing” on the grounds that nobody would mistake that term for a technical theoretical construct. Eventually I end up calling them “Visual Objects” to conform to usage in psychology) at two different times if it was represented differently at those times. This led to some speculation about the need for what we called a “finger” that could be placed at a particular element of interest and that could be used to identify it as a particular token thing (the way you might identify a particular feature on paper by labeling it). In general we needed something like a finger that would stay attached to a particular element and could be used to maintain a correspondence between the individual element that was just noticed now and one that had been represented in some fashion at an earlier time.
The idea of such fingers (which ...
One of the principal characteristics that distinguishes Cognitive Science from more traditional studies of cognition within Psychology is the extent to which it has been influenced by both the ideas and the techniques of computing. It may come as a surprise to the outsider, then, to discover that there is no unanimity within the discipline on either (a) the nature (and in some cases the desirability) of the influence or (b) what computing is - or at least on its ...
We present three studies examining whether multiple-object tracking (MOT) benefits from the active inhibition of nontargets, as proposed in Pylyshyn (2004). Using a probe-dot technique, the first study showed poorer probe detection on nontargets than on either the targets being tracked or in the empty space between objects. The second study used a matching nontracking task to control for possible masking of probes, independent of target tracking. The third study examined how localized the inhibition is to individual nontargets. The results of these three studies led to the conclusion that nontargets are subject to a highly localized object-based inhibition. Implications of this finding for the FINST visual index theory are discussed. We suggest that we need to distinguish between the differentiation (or individuation) of enduring token objects and the process of making the objects accessible through indexes, with only the latter being limited to 4 or 5 objects.
Using a novel enumeration task, we examined the encoding of spatial information during subitizing. Observers were shown masked presentations of randomly-placed discs on a screen and were required to mark the perceived locations of these discs on a subsequent blank screen. This provided a measure of recall for object locations and an indirect measure of display numerosity. Observers were tested on three stimulus durations and eight numerosities. Enumeration performance was high for displays containing up to six discs, a higher subitizing range than reported in previous studies. Error in the location data was measured as the distance between corresponding stimulus and response discs. Overall, location errors increased in magnitude with larger numerosities and shorter display durations. When errors were computed as disc distance from display centroid, the results suggested a compressed representation by observers. Additionally, enumeration and localization accuracy increased with display regularity.
The target article claimed that although visual apprehension involves all of general cognition, a significant component of vision (referred to as early vision) works independently of cognition and yet is able to provide a surprisingly high level interpretation of visual inputs, roughly up to identifying general shape-classes. The commentators were largely sympathetic, but frequently disagreed on how to draw the boundary, on exactly what early vision delivers, on the role that attention plays, and on how to interpret the neurophysiological data showing top-down effects. A significant number simply asserted that they were not willing to accept any distinction between vision and cognition, and a surprising number even felt that we could never tell for sure, so why bother? Among the topics covered were the relation of cognition and consciousness, the relation of early vision to other modules such as face recognition and language, and the role of natural constraints.
I’m one of those who is awed and impressed by the potential of this field and have devoted some part of my energy to persuading people that it is a positive force. I have done so largely on the grounds of its economic benefits and its potential for making the fruits of computer technology more generally available to the public: for example, to help the overworked physician; to search for oil and minerals and help manage our valuable resources; to explore, mine, and experiment in dangerous environments; to allow the non-computing public access to vast libraries of important information and even advice, and in the process give real meaning to the ...
... called Cognitive Science was to bring back scientific realism. This may strike you as a very odd claim, for one does not usually think of science as needing to be talked into scientific realism. Science is, after all, the study of reality by the most precise instruments of measurement and ...
The target article proposes that visual experience arises when sensorimotor contingencies are exploited in perception. This novel analysis of visual experience fares no better than the other proposals that the article rightly dismisses, and for the same reasons. Extracting invariants may be needed for recognition, but it is neither necessary nor sufficient for having a visual experience. While the idea that vision involves the active extraction of sensorimotor invariants has merit, it does not replace the need for perceptual representations. Vision is not just for the immediate controlling of action; it is also for finding out about the world, from which inferences may be drawn and beliefs changed.
You might reasonably surmise from the title of this paper that I will be discussing a theory of vision. After all, what is a theory of vision but a theory of how the world is connected to our visual representations? Theories of visual perception universally attempt to give an account of how a proximal stimulus (presumably a pattern impinging on the retina) can lead to a rich representation of a three-dimensional world and thence to either the recognition of known objects or to the coordination of actions with visual information. Such theories typically provide an effective (i.e., computable) mapping from a 2D pattern to a representation of a 3D scene, usually in the form of a symbol structure. But such a mapping, though undoubtedly the essential purpose of a theory of vision, leaves at least one serious problem that I intend to discuss here. It is this problem, rather than a theory of vision itself, that is the subject of this talk.
We previously reported that in the Multiple Object Tracking (MOT) task, which requires tracking several identical targets moving unpredictably among identical nontargets, the nontargets appear to be inhibited, as measured by a probe-dot detection method. The inhibition appears to be local to nontargets and does not extend to the space between objects, dropping off very rapidly away from targets and nontargets. In the present three experiments we show that (1) nontargets that are identical to targets but remain in a fixed location are not inhibited, (2) moving objects that have a different shape from targets are inhibited as much as same-shape nontargets, and (3) nontargets that are on a different depth plane and so are easily filtered out are not inhibited. This is consistent with a task-dependent view of item inhibition wherein nontargets are inhibited if (and only if) they are likely to be mistaken for targets.
In Multiple Object Tracking (MOT), an observer is able to track 4-5 objects in a group of otherwise indistinguishable objects that move independently and unpredictably about a display. According to the Visual Indexing Theory (Pylyshyn, 1989), successful tracking requires that target objects be indexed while they are distinct, that is, before tracking begins. In the typical MOT task, the target objects are briefly flashed, resulting in the automatic assignment of indexes. The question arises whether indexes are only assigned automatically or whether they can be assigned voluntarily in a top-down manner. This study compares several ways of specifying which of 8 items are the targets to be tracked. In the Flash condition the target items were flashed, in the Nonflash condition the targets were the items not flashed, and in the Number condition the targets were specified by number (e.g., items numbered 1-4). The results showed no difference between the three conditions, suggesting that tracking was possible with either voluntary or involuntary indexing. The second experiment tested the hypothesis that voluntary indexing is possible only if the target items are visited serially. The conditions were the same as in Experiment 1 except that the time available for index assignment was too short to allow targets to be visited serially. In this experiment, targets flashed only once (or, in the Number condition, remained visible for about 400 ms). The results showed a decrease in tracking performance for the Number condition, but the Flash and the Nonflash conditions did not differ, suggesting that as long as the designation of targets was done rapidly, the observer did not have to visit each target serially in order to index it. These results suggest that indexing can occur both automatically and voluntarily, and without the items being visited serially, so long as they are successfully specified.
This article defends the claim that a significant part of visual perception (called “early vision”) is impervious to the influence of beliefs, expectations or knowledge. We examine a wide range of empirical evidence that has been cited in support of the continuity of vision and cognition and argue that the evidence either shows within-vision top-down effects, or else the extra-visual effects that are demonstrated occur before the operation of the autonomous early vision system (through the allocation of focal attention) or after the visual system has produced its 3D shape-description (through the intervention of post-visual decision processes).
1. Background: Representation in language and vision. 2. Some parallels between the study of vision and language ...