Intelligent systems are faced with the problem of securing a principled (ideally, veridical) relationship between the world and its internal representation. I propose a unified approach to visual representation, addressing both the needs of superordinate and basic-level categorization and of identification of specific instances of familiar categories. According to the proposed theory, a shape is represented by its similarity to a number of reference shapes, measured in a high-dimensional space of elementary features. This amounts to embedding the stimulus in a low-dimensional proximal shape space. That space turns out to support representation of distal shape similarities which is veridical in the sense of Shepard's (1968) notion of second-order isomorphism (i.e., correspondence between distal and proximal similarities among shapes, rather than between distal shapes and their proximal representations). Furthermore, a general expression for similarity between two stimuli, based on comparisons to reference shapes, can be used to derive models of perceived similarity ranging from continuous, symmetric, and hierarchical, as in the multidimensional scaling models (Shepard, 1980), to discrete and non-hierarchical, as in the general contrast models (Tversky, 1977; Shepard and Arabie, 1979).
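The idea of representing a shape by its similarities to reference shapes can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the Gaussian similarity function, the `sigma` parameter, the six-dimensional feature vectors, and the names `similarity`, `embed`, and `distance`.

```python
import math


def similarity(x, y, sigma=1.0):
    """Gaussian similarity between two feature vectors (an assumed choice)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def embed(stimulus, references, sigma=1.0):
    """Represent a stimulus by its similarities to the reference shapes.

    The result has one coordinate per reference shape, so it lives in a
    space of lower dimension than the feature space whenever there are
    fewer references than features.
    """
    return [similarity(stimulus, r, sigma) for r in references]


def distance(u, v):
    """Euclidean distance between two embedded stimuli."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical data: three reference shapes in a 6-dimensional feature space.
references = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
]
shape_a = [0.9, 0.1, 0.0, 0.0, 0.0, 0.0]
shape_b = [0.8, 0.2, 0.0, 0.0, 0.0, 0.0]  # close to shape_a in feature space
shape_c = [0.0, 0.1, 0.9, 0.0, 0.0, 0.0]  # far from shape_a in feature space

emb_a = embed(shape_a, references)
emb_b = embed(shape_b, references)
emb_c = embed(shape_c, references)
```

In this toy setting, shapes that are close in the feature space stay close in the three-dimensional embedding and distant shapes stay distant, which is the kind of similarity preservation the abstract refers to as second-order isomorphism.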
It is proposed to conceive of representation as an emergent phenomenon that is supervenient on patterns of activity of coarsely tuned and highly redundant feature detectors. The computational underpinnings of the outlined concept of representation are (1) the properties of collections of overlapping graded receptive fields, as in the biological perceptual systems that exhibit hyperacuity-level performance, and (2) the sufficiency of a set of proximal distances between stimulus representations for the recovery of the corresponding distal contrasts between stimuli, as in multidimensional scaling. The present preliminary study appears to indicate that this concept of representation is computationally viable, and is compatible with psychological and neurobiological data.
A computational theory of consciousness should include a quantitative measure of consciousness, or MoC, that (i) would reveal to what extent a given system is conscious, (ii) would make it possible to compare not only different systems, but also the same system at different times, and (iii) would be graded, because so is consciousness. However, unless its design is properly constrained, such an MoC gives rise to what we call the boundary problem: an MoC that labels a system as conscious will do so for some – perhaps most – of its subsystems, as well as for irrelevantly extended systems (e.g., the original system augmented with physical appendages that contribute nothing to the properties supposedly supporting consciousness), and for aggregates of individually conscious systems (e.g., groups of people). This problem suggests that the properties that are being measured are epiphenomenal to consciousness, or else it implies a bizarre proliferation of minds. We propose that a solution to the boundary problem can be found by identifying properties that are intrinsic or systemic: properties that clearly differentiate between systems whose existence is a matter of fact, as opposed to those whose existence is a matter of interpretation (in the eye of the beholder). We argue that if a putative MoC can be shown to be systemic, this ipso facto resolves any associated boundary issues. As test cases, we analyze two recent theories of consciousness in light of our definitions: the Integrated Information Theory and the Geometric Theory of consciousness.
Are minds really dynamical or are they really symbolic? Because minds are bundles of computations, and because computation is always a matter of interpretation of one system by another, minds are necessarily symbolic. Because minds, along with everything else in the universe, are physical, and insofar as the laws of physics are dynamical, minds are necessarily dynamical systems. Thus, the short answer to the opening question is “yes.” It makes sense to ask further whether some of the computations that constitute a human mind are constrained by functional, algorithmic, or implementational factors to be essentially of the discrete symbolic variety (even if they supervene on an apparently continuous dynamical substrate). I suggest that here too the answer is “yes” and discuss the need for such discrete, symbolic cognitive computations in communication-related tasks.
We introduce a set of biologically and computationally motivated design choices for modeling the learning of language, or of other types of sequential, hierarchically structured experience and behavior, and describe an implemented system that conforms to these choices and is capable of unsupervised learning from raw natural-language corpora. Given a stream of linguistic input, our model incrementally learns a grammar that captures its statistical patterns, which can then be used to parse or generate new data. The grammar constructed in this manner takes the form of a directed weighted graph, whose nodes are recursively (hierarchically) defined patterns over the elements of the input stream. We evaluated the model in seventeen experiments, grouped into five studies, which examined, respectively, (a) the generative ability of the grammar learned from a corpus of natural language, (b) the characteristics of the learned representation, (c) sequence segmentation and chunking, (d) artificial grammar learning, and (e) certain types of structure dependence. The model's performance largely vindicates our design choices, suggesting that progress in modeling language acquisition can be made on a broad front—ranging from issues of generativity to the replication of human experimental findings—by bringing biological and computational considerations, as well as lessons from prior efforts, to bear on the modeling approach.
A standing challenge for the science of mind is to account for the datum that every mind faces in the most immediate – that is, unmediated – fashion: its phenomenal experience. The complementary tasks of explaining what it means for a system to give rise to experience and what constitutes the content of experience (qualia) in computational terms are particularly challenging, given the multiple realizability of computation. In this paper, we identify a set of conditions that a computational theory must satisfy for it to constitute not just a sufficient but a necessary, and therefore naturalistic and intrinsic, explanation of qualia. We show that a common assumption behind many neurocomputational theories of the mind, according to which mind states can be formalized solely in terms of instantaneous vectors of activities of representational units such as neurons, does not meet the requisite conditions, in part because it relies on inactive units to shape presently experienced qualia and implies a homogeneous representation space, which is devoid of intrinsic structure. We then sketch a naturalistic computational theory of qualia, which posits that experience is realized by dynamical activity-space trajectories (rather than points) and that its richness is measured by the representational capacity of the trajectory space in which it unfolds.
Scientific theories of consciousness identify its contents with the spatiotemporal structure of neural population activity. We follow up on this approach by stating and motivating Dynamical Emergence Theory (DET), which defines the amount and structure of experience in terms of the intrinsic topology and geometry of a physical system’s collective dynamics. Specifically, we posit that distinct perceptual states correspond to coarse-grained macrostates reflecting an optimal partitioning of the system’s state space—a notion that aligns with several ideas and results from computational neuroscience and cognitive psychology. We relate DET to existing work, offer predictions for empirical studies, and outline future research directions.
The problem of representing the spatial structure of images, which arises in visual object processing, is commonly described using terminology borrowed from propositional theories of cognition, notably, the concept of compositionality. The classical propositional stance mandates representations composed of symbols, which stand for atomic or composite entities and enter into arbitrarily nested relationships. We argue that the main desiderata of a representational system — productivity and systematicity — can (indeed, for a number of reasons, should) be achieved without recourse to the classical, proposition-like compositionality. We show how this can be done, by describing a systematic and productive model of the representation of visual structure, which relies on static rather than dynamic binding and uses coarsely coded rather than atomic shape primitives.
We compare our model of unsupervised learning of linguistic structures, ADIOS [1, 2, 3], to some recent work in computational linguistics and in grammar theory. Our approach resembles the Construction Grammar in its general philosophy (e.g., in its reliance on structural generalizations rather than on syntax projected by the lexicon, as in the current generative theories), and the Tree Adjoining Grammar in its computational characteristics (e.g., in its apparent affinity with Mildly Context Sensitive Languages). The representations learned by our algorithm are truly emergent from the (unannotated) corpus data, whereas those found in published works on cognitive and construction grammars and on TAGs are hand-tailored. Thus, our results complement and extend both the computational and the more linguistically oriented research into language acquisition. We conclude by suggesting how empirical and formal study of language can be best integrated.
An image of a face depends not only on its shape, but also on the viewpoint, illumination conditions, and facial expression. A face recognition system must overcome the changes in face appearance induced by these factors. This paper investigates two related questions: the capacity of the human visual system to generalize the recognition of faces to novel images, and the level at which this generalization occurs. We approach this problem by comparing the identification and generalization capacity for upright and inverted faces. For upright faces, we found remarkably good generalization to novel conditions. For inverted faces, the generalization to novel views was significantly worse for both new illumination and viewpoint, although the performance on the training images was similar to the upright condition. Our results indicate that at least some of the processes that support generalization across viewpoint and illumination are neither universal (because subjects did not generalize as easily for inverted faces as for upright ones), nor strictly object-specific (because in upright faces nearly perfect generalization was possible from a single view, by itself insufficient for building a complete object-specific model). We propose that generalization in face recognition occurs at an intermediate level that is applicable to a class of objects, and that at this level upright and inverted faces initially constitute distinct object classes.
The distributional principle according to which morphemes that occur in identical contexts belong, in some sense, to the same category has been advanced as a means for extracting syntactic structures from corpus data. We extend this principle by applying it recursively, and by using mutual information for estimating category coherence. The resulting model learns, in an unsupervised fashion, highly structured, distributed representations of syntactic knowledge from corpora. It also exhibits promising behavior in tasks usually thought to require representations anchored in a grammar, such as systematicity.
Lasnik’s review of the Minimalist program in syntax offers cognitive scientists help in navigating some of the arcana of the current theoretical thinking in transformational generative grammar. One may observe, however, that this journey is more like a taxi ride gone bad than a free tour: it is the driver who decides on the itinerary, and questioning his choice may get you kicked out. Meanwhile, the meter in the cab of the generative theory of grammar is running, and has been since the publication of Chomsky’s Syntactic Structures in 1957. The fare that it ran up is none the less daunting for the detours made in his Aspects of the Theory of Syntax in 1965, Government and Binding in 1981, and now The Minimalist Program, in 1995. Paraphrasing Winston Churchill, it seems like never in the field of cognitive science was so much owed by so many of us to so few. For most of us in the cognitive sciences this situation will appear quite benign, if we realize that it is the generative linguists who should by rights be paying this bill. The reason for that is simple and is well known in the philosophy of science: putting forward a theory is like taking out a loan, to be repaid by gleaning an empirical basis for it; theories that fail to do so are declared bankrupt. In the sciences of the mind, this maxim translates into the need to demonstrate the psychological, and, eventually, the neurobiological, reality of the theoretical constructs. Many examples of this process can be found in the study of human vision, where, as in language, direct observation of the underlying mechanisms is difficult; for instance, the concept of multiple parallel spatial frequency channels, introduced in the late 1960s, was completely vindicated by purely behavioral means over the following decade. In linguistics, the nature of the requisite evidence is well described by Townsend and Bever: “What do we test today if we want to explore the behavioral implications of syntax?
The proponents of machine consciousness predicate the mental life of a machine, if any, exclusively on its formal, organizational structure, rather than on its physical composition. Given that matter is organized on a range of levels in time and space, this generic stance must be further constrained by a principled choice of levels on which the posited structure is supposed to reside. Indeed, not only must the formal structure fit well the physical system that realizes it, but it must do so in a manner that is determined by the system itself, simply because the mental life of a machine cannot be up to an external observer. To illustrate just how tall this order is, we carefully analyze the scenario in which a digital computer simulates a network of neurons. We show that the formal correspondence between the two systems thereby established is at best partial, and, furthermore, that it is fundamentally incapable of realizing both some of the essential properties of actual neuronal systems and some of the fundamental properties of experience. Our analysis suggests that, if machine consciousness is at all possible, conscious experience can only be instantiated in a class of machines that are entirely different from digital computers, namely, time-continuous, open, analog dynamical systems.
To learn a visual code in an unsupervised manner, one may attempt to capture those features of the stimulus set that would contribute significantly to a statistically efficient representation. Paradoxically, all the candidate features in this approach need to be known before statistics over them can be computed. This paradox may be circumvented by confining the repertoire of candidate features to actual scene fragments, which resemble the “what+where” receptive fields found in the ventral visual stream in primates. We describe a single-layer network that learns such fragments from unsegmented raw images of structured objects. The learning method combines fast imprinting in the feedforward stream with lateral interactions to achieve single-epoch unsupervised acquisition of spatially localized features that can support systematic treatment of structured objects.
We describe a linguistic pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of corpus data. This is achieved by compactly coding recursively structured constituent patterns, and by placing strings that have an identical backbone and similar context structure into the same equivalence class. The resulting representations constitute an efficient encoding of linguistic knowledge and support systematic generalization to unseen sentences.
We report a quantitative analysis of the cross-utterance coordination observed in child-directed language, where successive utterances often overlap in a manner that makes their constituent structure more prominent, and describe the application of a recently published unsupervised algorithm for grammar induction to the largest available corpus of such language, producing a grammar capable of accepting and generating novel well-formed sentences. We also introduce a new corpus-based method for assessing the precision and recall of an automatically acquired generative grammar without recourse to human judgment. The present work sets the stage for the eventual development of more powerful unsupervised algorithms for language acquisition, which would make use of the coordination structures present in natural child-directed speech.
We describe a pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of linguistic structures from a plain natural-language corpus. This paper addresses the issues of learning structured knowledge from a large-scale natural language data set, and of generalization to unseen text. The implemented algorithm represents sentences as paths on a graph whose vertices are words. Significant patterns, determined by recursive context-sensitive statistical inference, form new vertices. Linguistic constructions are represented by trees composed of significant patterns and their associated equivalence classes. An input module allows the algorithm to be subjected to a standard test of English as a Second Language proficiency. The results are encouraging: the model attains a level of performance considered to be “intermediate” for 9th-grade students, despite having been trained on a corpus containing transcribed speech of parents directed to small children.
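The first step described above, representing sentences as paths on a graph whose vertices are words, can be sketched as follows. This is only an illustration of the data structure: the `<begin>` and `<end>` markers, the whitespace tokenization, the edge counts, and the names `path_of` and `build_graph` are assumptions, and the statistical test that promotes significant patterns to new vertices is not attempted here.

```python
from collections import defaultdict


def path_of(sentence):
    """Turn a sentence into a path of vertices (assumed begin/end markers)."""
    return ["<begin>"] + sentence.split() + ["<end>"]


def build_graph(sentences):
    """Build a directed graph whose vertices are words.

    Each edge weight counts how many sentence paths traverse that edge.
    """
    edges = defaultdict(int)
    for sentence in sentences:
        path = path_of(sentence)
        for src, dst in zip(path, path[1:]):
            edges[(src, dst)] += 1
    return dict(edges)


# Hypothetical toy corpus.
corpus = ["the cat sat", "the dog sat", "the cat ran"]
graph = build_graph(corpus)
```

Sub-paths shared by many sentences show up as heavily weighted edges (here, `<begin>` to `the`), which is the kind of regularity a pattern-finding step could then examine.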
We describe a unified framework for the understanding of structure representation in primate vision. A model derived from this framework is shown to be effectively systematic in that it has the ability to interpret and associate together objects that are related through a rearrangement of common “middle-scale” parts, represented as image fragments. The model addresses the same concerns as previous work on compositional representation through the use of what+where receptive fields and attentional gain modulation. It does not require prior exposure to the individual parts, and avoids the need for abstract symbolic binding.
Language is a rewarding field if you are in the prediction business. A reader who is fluent in English and who knows how academic papers are typically structured will readily come up with several possible guesses as to where the title of this section could have gone, had it not been cut short by the ellipsis. Indeed, in the more natural setting of spoken language, anticipatory processing is a must: performance of machine systems for speech interpretation depends critically on the availability of a good predictive model of how utterances unfold in time (Baker, 1975; Jelinek, 1990; Goodman, 2001), and there is strong evidence that prospective uncertainty affects human sentence processing too (Jurafsky, 2003; Hale, 2006; Levy, 2008). The human ability to predict where the current utterance is likely to be going is just another adaptation to the general pressure to anticipate the future (Hume, 1748; Dewey, 1910; Craik, 1943), be it in perception, thinking, or action, which is exerted on all cognitive systems by evolution (Dennett, 2003). Look-ahead in language is, however, special in one key respect: language is a medium for communication, and in communication the most interesting (that is, informative) parts of the utterance that the speaker is working through are those that cannot be predicted by the listener ahead of time.
A view is put forward, according to which various aspects of the structure of the world as internalized by the brain take the form of “neural spaces,” a concrete counterpart for Shepard's “abstract” ones. Neural spaces may help us understand better both the representational substrate of cognition and the processes that operate on it. [Shepard].
Although computational considerations suggest that a resource-limited memory system may have to trade off capacity for generalization ability, such a trade-off has not been demonstrated in the past. We describe a simple model of memory that exhibits this trade-off and describe its performance in a variety of tasks.
The publication in 1982 of David Marr’s Vision has delivered a singular boost and a course correction to the science of vision. Thirty years later, cognitive science is being transformed by the new ways of thinking about what it is that the brain computes, how it does that, and, most importantly, why cognition requires these computations and not others. This ongoing process still owes much of its impetus and direction to the sound methodology, engaging style, and unique voice of Marr’s Vision.
The statistical structure of a class of objects such as human faces can be exploited to recognize familiar faces from novel viewpoints and under variable illumination conditions. We present computational and psychophysical data concerning the extent to which class-based learning transfers or generalizes within the class of faces. We first examine the computational prerequisite for generalization across views of novel faces, namely, the similarity of different faces to each other. We next describe two computational models which exploit the similarity structure of the class of faces. The performance of these models constrains hypotheses about the nature of face representation in human vision, and supports the notion that human face processing operates in a class-based fashion. Finally, we relate the computational data to well-established findings in the human memory literature concerning the relationship between the typicality and recognizability of faces.
Visual objects can be represented by their similarities to a small number of reference shapes or prototypes. This method yields low-dimensional (and therefore computationally tractable) representations, which support both the recognition of familiar shapes and the categorization of novel ones. In this note, we show how such representations can be used in a variety of tasks involving novel objects: viewpoint-invariant recognition, recovery of a canonical view, estimation of pose, and prediction of an arbitrary view. The unifying principle in all these cases is the representation of the view space of the novel object as an interpolation of the view spaces of the reference shapes.
Computer vision systems are, on most counts, poor performers, when compared to their biological counterparts. The reason for this may be that computer vision is handicapped by an unreasonable assumption regarding what it means to see, which became prevalent as the notions of intrinsic images and of representation by reconstruction took over the field in the late 1970’s. Learning from biological vision may help us to overcome this handicap.
Converging findings from English, Mandarin, and other languages suggest that observed may be algorithmic. First, computational principles behind recently developed algorithms that acquire productive constructions from raw texts or transcribed child-directed speech impose family resemblance on learnable languages. Second, child-directed speech is particularly rich in statistical (and social) cues that facilitate learning of certain types of structures.
Construction-based approaches to syntax (Croft, 2001; Goldberg, 2003) posit a lexicon populated by units of various sizes, as envisaged by Langacker (1987). Constructions may be specified completely, as in the case of simple morphemes or idioms such as take it to the bank, or partially, as in the expression what’s X doing Y?, where X and Y are slots that admit fillers of particular types (Kay and Fillmore, 1999). Constructions offer an intriguing alternative to traditional rule-based syntax by hinting at the extent to which the complexity of language can stem from a rich repertoire of stored, more or less entrenched (Harris, 1998) representations that address both syntactic and semantic issues, and encompass, in addition to general rules, “totally idiosyncratic forms and patterns of all intermediate degrees of generality” (Langacker, 1987, p.46). Because constructions are by their very nature language-specific, the question of acquisition in Construction Grammar is especially poignant. We address this issue by offering an unsupervised algorithm that learns constructions from raw corpora.
Nearest-neighbor correlation-based similarity computation in the space of outputs of complex-type receptive fields can support robust recognition of 3D objects. Our experiments with four collections of objects resulted in mean recognition rates between 84% and 94%, over a 40° × 40° range of viewpoints, centered on a stored canonical view and related to it by rotations in depth. This result has interesting implications for the design of a front end to an artificial object recognition system, and for the understanding of the faculty of object recognition in primate vision.
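Nearest-neighbor recognition by correlation can be sketched in a few lines. The sketch assumes that each stored view and each probe has already been reduced to a vector of receptive-field outputs; the four-element vectors, the object labels, and the names `correlation` and `recognize` are hypothetical, and the abstract does not specify how the authors computed the outputs or the correlation.

```python
import math


def correlation(x, y):
    """Pearson correlation between two equal-length response vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0  # a constant vector carries no pattern to correlate
    return cov / (sx * sy)


def recognize(probe, stored):
    """Return the label of the stored view most correlated with the probe."""
    return max(stored, key=lambda label: correlation(probe, stored[label]))


# Hypothetical receptive-field outputs for one stored view per object.
stored_views = {
    "object_A": [0.9, 0.1, 0.4, 0.7],
    "object_B": [0.2, 0.8, 0.6, 0.1],
}
probe = [0.8, 0.2, 0.5, 0.6]  # e.g., object_A seen from a nearby viewpoint

best_match = recognize(probe, stored_views)
```

Correlation ignores overall gain and offset in the response vector, which is one plausible reason to prefer it over raw Euclidean distance when illumination or contrast varies between views.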
What insights does comparative biology provide for furthering scientific understanding of the evolution of dynamic coordination? Our discussions covered three major themes: (a) the fundamental unity in functional aspects of neurons, neural circuits, and neural computations across the animal kingdom; (b) brain organization–behavior relationships across animal taxa; and (c) the need for broadly comparative studies of the relationship of neural structures, neural functions, and behavioral coordination. Below we present an overview of neural machinery and computations that are shared by all nervous systems across the animal kingdom, and the related fact that there really are no “simple” relationships in coordination between nervous systems and the behavior they produce. The simplest relationships seen in living organisms are already fairly complex by computational standards. These realizations led us to think about ways that brain similarities and differences could be used to produce new insights into complex brain–behavior phenomena (including a critical appraisal of the roles of cortical and noncortical structures in mammalian behavior), and to think briefly about how future studies could best exploit comparative methods to elucidate better general principles underlying the neural mechanisms associated with behavioral coordination. In our view, it is unlikely that the intricacies interrelating neural and behavioral coordination are due to one particular manifestation (such as neural oscillation or the possession of a six-layered cortex). Instead of considering the human cortex to be the standard against which all things are measured (and thus something to crow about), both broad and focused comparative studies on behavioral similarities and differences will be necessary to elucidate the fundamental principles underlying dynamic coordination.
Shanahan’s eloquently argued version of the global workspace theory fits well into the emerging understanding of consciousness as a computational phenomenon. His disinclination toward metaphysics notwithstanding, Shanahan’s book can also be seen as supportive of a particular metaphysical stance on consciousness — the computational identity theory.
We describe a method for automatic word sense disambiguation using a text corpus and a machine-readable dictionary (MRD). The method is based on word similarity and context similarity measures. Words are considered similar if they appear in similar contexts; contexts are similar if they contain similar words. The circularity of this definition is resolved by an iterative, converging process, in which the system learns from the corpus a set of typical usages for each of the senses of the polysemous word listed in the MRD. A new instance of a polysemous word is assigned the sense associated with the typical usage most similar to its context. Experiments show that this method performs well, and can learn even from very sparse training data.
differentially rated pairwise similarity when confronted with two pairs of objects, each revolving in a separate window on a computer screen. Subject data were pooled using individually weighted MDS (ref. 11; in all the experiments, the solutions were consistent among subjects). In each trial, the subject had to select among two pairs of shapes the one consisting of the most similar shapes. The subjects were allowed to respond at will; most responded within 10 sec. Proximity (that is, perceived similarity) tables derived from the judgments were processed to verify their degree of transitivity (4% of all triplets were found intransitive) and then submitted to MDS. In the long-term memory (LTM) variant of this experiment, the subjects were first trained to associate a label (a three-letter nonsensical string, such as "BON" or "POM") with each object and then carried out the pairs of pairs comparison task from memory, prompted by the object labels rather than by the objects themselves. Six subjects participated in each of the two LTM experiments (Star and Triangle). The subjects were taught each shape in a separate session and had to discriminate between that shape and six similar nontargets from various viewpoints. Training continued until the recognition rate reached 90%, over a period of several days. The subjects were never exposed to more than one target in one session and were not told the ultimate purpose of the experiment. After 2 to 3 days of rest, they were tested with questions such as: "is the BON more similar to POM than TOC to ROX?", for all pairs of pairs of stimuli. In the LTM experiments, 8% of the..
Idealized models of receptive fields (RFs) can be used as building blocks for the creation of powerful distributed computation systems. The present report concentrates on investigating the utility of collections of RFs in representing 3D objects under changing viewing conditions. The main requirement in this task is that the pattern of activity of RFs vary as little as possible when the object and the camera move relative to each other. I propose a method for representing objects by RF activities, based on the observation that, in the case of rotation around a fixed axis, differences of activities of RFs that are properly situated with respect to that axis remain invariant. Results of computational experiments suggest that a representation scheme based on this algorithm for the choice of stable pairs of RFs would perform consistently better than a scheme involving random sets of RFs. The proposed scheme may be useful under object or camera rotation, both for ideal Lambertian objects, and for real-world objects such as human faces.