It is proposed to conceive of representation as an emergent phenomenon that is supervenient on patterns of activity of coarsely tuned and highly redundant feature detectors. The computational underpinnings of the outlined concept of representation are (1) the properties of collections of overlapping graded receptive fields, as in the biological perceptual systems that exhibit hyperacuity-level performance, and (2) the sufficiency of a set of proximal distances between stimulus representations for the recovery of the corresponding distal contrasts between stimuli, as in multidimensional scaling. The present preliminary study appears to indicate that this concept of representation is computationally viable, and is compatible with psychological and neurobiological data.
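Both computational ingredients can be sketched in a few lines. In the toy example below (the units, tuning widths, and stimulus values are all invented for illustration, not taken from the study), overlapping Gaussian receptive fields encode a one-dimensional stimulus continuum, and distances between the resulting activity vectors preserve the ordering of the distal contrasts, which is the property that MDS-style recovery depends on:

```python
import numpy as np

def rf_responses(stimulus, centers, sigma):
    # Graded, overlapping tuning curves: each unit's response falls off
    # smoothly with the distance of the stimulus from its preferred value.
    return np.exp(-((stimulus - centers) ** 2) / (2.0 * sigma ** 2))

centers = np.linspace(0.0, 9.0, 10)        # 10 coarsely tuned units
stimuli = [2.0, 2.1, 5.0, 7.5]             # distal stimulus values

# Proximal representation: one activity vector per stimulus.
codes = np.stack([rf_responses(s, centers, sigma=2.0) for s in stimuli])

# Proximal distances between activity patterns track distal contrasts.
d01 = np.linalg.norm(codes[0] - codes[1])  # distal contrast 0.1
d02 = np.linalg.norm(codes[0] - codes[2])  # distal contrast 3.0
d03 = np.linalg.norm(codes[0] - codes[3])  # distal contrast 5.5
```

With this encoding, d01 < d02 < d03, so the rank order of distal differences survives in the proximal code even though no unit signals the stimulus value directly.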
We introduce a set of biologically and computationally motivated design choices for modeling the learning of language, or of other types of sequential, hierarchically structured experience and behavior, and describe an implemented system that conforms to these choices and is capable of unsupervised learning from raw natural-language corpora. Given a stream of linguistic input, our model incrementally learns a grammar that captures its statistical patterns, which can then be used to parse or generate new data. The grammar constructed in this manner takes the form of a directed weighted graph, whose nodes are recursively (hierarchically) defined patterns over the elements of the input stream. We evaluated the model in seventeen experiments, grouped into five studies, which examined, respectively, (a) the generative ability of a grammar learned from a corpus of natural language, (b) the characteristics of the learned representation, (c) sequence segmentation and chunking, (d) artificial grammar learning, and (e) certain types of structure dependence. The model's performance largely vindicates our design choices, suggesting that progress in modeling language acquisition can be made on a broad front, ranging from issues of generativity to the replication of human experimental findings, by bringing biological and computational considerations, as well as lessons from prior efforts, to bear on the modeling approach.
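The paper's algorithm builds a graph of recursively defined patterns; as a deliberately simplified stand-in (this is not the authors' method), the sketch below shows how hierarchical units can emerge from a raw token stream by repeatedly chunking the most frequent adjacent pair, so that each new pattern is defined over previously created ones:

```python
from collections import Counter

def learn_patterns(stream, n_merges=2):
    """Greedy pair-chunking: each merge turns the most frequent adjacent
    pair into a single (possibly nested) pattern node."""
    patterns = []
    for _ in range(n_merges):
        pairs = Counter(zip(stream, stream[1:]))
        (a, b), _count = pairs.most_common(1)[0]
        new = (a, b)                     # hierarchically defined pattern
        patterns.append(new)
        merged, i = [], 0
        while i < len(stream):
            if i + 1 < len(stream) and (stream[i], stream[i + 1]) == (a, b):
                merged.append(new)
                i += 2
            else:
                merged.append(stream[i])
                i += 1
        stream = merged
    return patterns, stream

tokens = list("ababxabab")              # invented toy input
patterns, stream = learn_patterns(tokens)
```

Here the second learned pattern, (('a', 'b'), ('a', 'b')), is recursively defined over the first, illustrating how a pattern vocabulary can be grounded in nothing but the raw stream.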
Intelligent systems are faced with the problem of securing a principled (ideally, veridical) relationship between the world and its internal representation. I propose a unified approach to visual representation, addressing the needs both of superordinate and basic-level categorization and of the identification of specific instances of familiar categories. According to the proposed theory, a shape is represented by its similarity to a number of reference shapes, measured in a high-dimensional space of elementary features. This amounts to embedding the stimulus in a low-dimensional proximal shape space. That space turns out to support a representation of distal shape similarities that is veridical in the sense of Shepard's (1968) notion of second-order isomorphism (i.e., correspondence between distal and proximal similarities among shapes, rather than between distal shapes and their proximal representations). Furthermore, a general expression for similarity between two stimuli, based on comparisons to reference shapes, can be used to derive models of perceived similarity ranging from continuous, symmetric, and hierarchical, as in the multidimensional scaling models (Shepard, 1980), to discrete and non-hierarchical, as in the general contrast models (Tversky, 1977; Shepard and Arabie, 1979).
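A minimal sketch of the embedding step (the data are invented and the Gaussian similarity function is an assumption made here for illustration): each shape, a point in a high-dimensional feature space, is re-represented by its vector of similarities to a handful of reference shapes, and this low-dimensional code still suffices to identify a perturbed view of a familiar shape:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 100                                # "elementary features"
references = rng.normal(size=(8, n_features))   # reference shapes

def proximal_code(shape, references, tau=200.0):
    # Similarity to each reference as a decaying (here: Gaussian)
    # function of distance in feature space.
    d2 = np.sum((references - shape) ** 2, axis=1)
    return np.exp(-d2 / tau)

shapes = rng.normal(size=(5, n_features))
codes = np.array([proximal_code(s, references) for s in shapes])

# The code has one coordinate per reference shape (8 << 100), yet a
# slightly perturbed view of shape 2 still lands nearest to its code.
probe = proximal_code(shapes[2] + 0.02 * rng.normal(size=n_features),
                      references)
nearest = int(np.argmin(np.linalg.norm(codes - probe, axis=1)))
```

The dimensionality of the proximal space is set by the number of references, not by the number of raw features, which is what makes the representation computationally tractable.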
Are minds really dynamical or are they really symbolic? Because minds are bundles of computations, and because computation is always a matter of interpretation of one system by another, minds are necessarily symbolic. Because minds, along with everything else in the universe, are physical, and insofar as the laws of physics are dynamical, minds are necessarily dynamical systems. Thus, the short answer to the opening question is “yes.” It makes sense to ask further whether some of the computations that constitute a human mind are constrained by functional, algorithmic, or implementational factors to be essentially of the discrete symbolic variety (even if they supervene on an apparently continuous dynamical substrate). I suggest that here too the answer is “yes” and discuss the need for such discrete, symbolic cognitive computations in communication-related tasks.
A standing challenge for the science of mind is to account for the datum that every mind faces in the most immediate – that is, unmediated – fashion: its phenomenal experience. The complementary tasks of explaining what it means for a system to give rise to experience and what constitutes the content of experience (qualia) in computational terms are particularly challenging, given the multiple realizability of computation. In this paper, we identify a set of conditions that a computational theory must satisfy for it to constitute not just a sufficient but a necessary, and therefore naturalistic and intrinsic, explanation of qualia. We show that a common assumption behind many neurocomputational theories of the mind, according to which mind states can be formalized solely in terms of instantaneous vectors of activities of representational units such as neurons, does not meet the requisite conditions, in part because it relies on inactive units to shape presently experienced qualia and implies a homogeneous representation space, which is devoid of intrinsic structure. We then sketch a naturalistic computational theory of qualia, which posits that experience is realized by dynamical activity-space trajectories (rather than points) and that its richness is measured by the representational capacity of the trajectory space in which it unfolds.
The problem of representing the spatial structure of images, which arises in visual object processing, is commonly described using terminology borrowed from propositional theories of cognition, notably, the concept of compositionality. The classical propositional stance mandates representations composed of symbols, which stand for atomic or composite entities and enter into arbitrarily nested relationships. We argue that the main desiderata of a representational system — productivity and systematicity — can (indeed, for a number of reasons, should) be achieved without recourse to the classical, proposition-like compositionality. We show how this can be done, by describing a systematic and productive model of the representation of visual structure, which relies on static rather than dynamic binding and uses coarsely coded rather than atomic shape primitives.
A computational theory of consciousness should include a quantitative measure of consciousness, or MoC, that (i) would reveal to what extent a given system is conscious, (ii) would make it possible to compare not only different systems, but also the same system at different times, and (iii) would be graded, because so is consciousness. However, unless its design is properly constrained, such an MoC gives rise to what we call the boundary problem: an MoC that labels a system as conscious will do so for some – perhaps most – of its subsystems, as well as for irrelevantly extended systems (e.g., the original system augmented with physical appendages that contribute nothing to the properties supposedly supporting consciousness), and for aggregates of individually conscious systems (e.g., groups of people). This problem suggests that the properties that are being measured are epiphenomenal to consciousness, or else it implies a bizarre proliferation of minds. We propose that a solution to the boundary problem can be found by identifying properties that are intrinsic or systemic: properties that clearly differentiate between systems whose existence is a matter of fact, as opposed to those whose existence is a matter of interpretation (in the eye of the beholder). We argue that if a putative MoC can be shown to be systemic, this ipso facto resolves any associated boundary issues. As test cases, we analyze two recent theories of consciousness in light of our definitions: the Integrated Information Theory and the Geometric Theory of consciousness.
Lasnik’s review of the Minimalist program in syntax offers cognitive scientists help in navigating some of the arcana of current theoretical thinking in transformational generative grammar. One may observe, however, that this journey is more like a taxi ride gone bad than a free tour: it is the driver who decides on the itinerary, and questioning his choice may get you kicked out. Meanwhile, the meter in the cab of the generative theory of grammar is running, and has been since the publication of Chomsky’s Syntactic Structures in 1957. The fare that it ran up is none the less daunting for the detours made in his Aspects of the Theory of Syntax in 1965, Government and Binding in 1981, and now The Minimalist Program, in 1995. Paraphrasing Winston Churchill, it seems that never in the field of cognitive science was so much owed by so many of us to so few. For most of us in the cognitive sciences this situation will appear quite benign, if we realize that it is the generative linguists who should by rights be paying this bill. The reason for that is simple and well known in the philosophy of science: putting forward a theory is like taking out a loan, to be repaid by gleaning an empirical basis for it; theories that fail to do so are declared bankrupt. In the sciences of the mind, this maxim translates into the need to demonstrate the psychological and, eventually, the neurobiological reality of the theoretical constructs. Many examples of this process can be found in the study of human vision, where, as in language, direct observation of the underlying mechanisms is difficult; for instance, the concept of multiple parallel spatial frequency channels, introduced in the late 1960s, was completely vindicated by purely behavioral means over the following decade. In linguistics, the nature of the requisite evidence is well described by Townsend and Bever: “What do we test today if we want to explore the behavioral implications of syntax?”
To learn a visual code in an unsupervised manner, one may attempt to capture those features of the stimulus set that would contribute significantly to a statistically efficient representation. Paradoxically, all the candidate features in this approach need to be known before statistics over them can be computed. This paradox may be circumvented by confining the repertoire of candidate features to actual scene fragments, which resemble the “what+where” receptive fields found in the ventral visual stream in primates. We describe a single-layer network that learns such fragments from unsegmented raw images of structured objects. The learning method combines fast imprinting in the feedforward stream with lateral interactions to achieve single-epoch unsupervised acquisition of spatially localized features that can support systematic treatment of structured objects.
We describe a unified framework for the understanding of structure representation in primate vision. A model derived from this framework is shown to be effectively systematic in that it has the ability to interpret and associate together objects that are related through a rearrangement of common “middle-scale” parts, represented as image fragments. The model addresses the same concerns as previous work on compositional representation through the use of what+where receptive fields and attentional gain modulation. It does not require prior exposure to the individual parts, and avoids the need for abstract symbolic binding.
A view is put forward, according to which various aspects of the structure of the world as internalized by the brain take the form of “neural spaces,” a concrete counterpart for Shepard's “abstract” ones. Neural spaces may help us understand better both the representational substrate of cognition and the processes that operate on it.
Construction-based approaches to syntax (Croft, 2001; Goldberg, 2003) posit a lexicon populated by units of various sizes, as envisaged by (Langacker, 1987). Constructions may be specified completely, as in the case of simple morphemes or idioms such as take it to the bank, or partially, as in the expression what’s X doing Y?, where X and Y are slots that admit fillers of particular types (Kay and Fillmore, 1999). Constructions offer an intriguing alternative to traditional rule-based syntax by hinting at the extent to which the complexity of language can stem from a rich repertoire of stored, more or less entrenched (Harris, 1998) representations that address both syntactic and semantic issues, and encompass, in addition to general rules, “totally idiosyncratic forms and patterns of all intermediate degrees of generality” (Langacker, 1987, p.46). Because constructions are by their very nature language-specific, the question of acquisition in Construction Grammar is especially poignant. We address this issue by offering an unsupervised algorithm that learns constructions from raw corpora.
Language is a rewarding field if you are in the prediction business. A reader who is fluent in English and who knows how academic papers are typically structured will readily come up with several possible guesses as to where the title of this section could have gone, had it not been cut short by the ellipsis. Indeed, in the more natural setting of spoken language, anticipatory processing is a must: performance of machine systems for speech interpretation depends critically on the availability of a good predictive model of how utterances unfold in time (Baker, 1975; Jelinek, 1990; Goodman, 2001), and there is strong evidence that prospective uncertainty affects human sentence processing too (Jurafsky, 2003; Hale, 2006; Levy, 2008). The human ability to predict where the current utterance is likely to be going is just another adaptation to the general pressure to anticipate the future (Hume, 1748; Dewey, 1910; Craik, 1943), be it in perception, thinking, or action, which is exerted on all cognitive systems by evolution (Dennett, 2003). Look-ahead in language is, however, special in one key respect: language is a medium for communication, and in communication the most interesting (that is, informative) parts of the utterance that the speaker is working through are those that cannot be predicted by the listener ahead of time.
Nearest-neighbor correlation-based similarity computation in the space of outputs of complex-type receptive fields can support robust recognition of 3D objects. Our experiments with four collections of objects resulted in mean recognition rates between 84% and 94%, over a 40° × 40° range of viewpoints, centered on a stored canonical view and related to it by rotations in depth. This result has interesting implications for the design of a front end to an artificial object recognition system, and for the understanding of the faculty of object recognition in primate vision.
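The recognition rule itself is simple enough to sketch (the dimensions, object count, and noise model below are invented stand-ins, not the experimental setup): each view is summarized by the output vector of a bank of receptive-field-like filters, and a test view is assigned to the stored canonical view whose output vector it correlates with best:

```python
import numpy as np

rng = np.random.default_rng(1)
n_units = 200
# One stored canonical-view output vector per object (4 objects).
canonical = rng.normal(size=(4, n_units))

def recognize(test_vec, stored):
    # Pearson correlation against each stored view; return best match.
    corrs = [np.corrcoef(test_vec, s)[0, 1] for s in stored]
    return int(np.argmax(corrs))

# A rotated view is modeled here, crudely, as the canonical output
# vector plus a perturbation.
test = canonical[2] + 0.3 * rng.normal(size=n_units)
```

With this setup, recognize(test, canonical) returns 2: moderate perturbation of the filter outputs leaves the correlation with the correct canonical view well above the near-zero correlations with the others.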
Idealized models of receptive fields (RFs) can be used as building blocks for the creation of powerful distributed computation systems. The present report concentrates on investigating the utility of collections of RFs in representing 3D objects under changing viewing conditions. The main requirement in this task is that the pattern of activity of the RFs vary as little as possible when the object and the camera move relative to each other. I propose a method for representing objects by RF activities, based on the observation that, in the case of rotation around a fixed axis, differences of activities of RFs that are properly situated with respect to that axis remain invariant. Results of computational experiments suggest that a representation scheme based on this algorithm for the choice of stable pairs of RFs would perform consistently better than a scheme involving random sets of RFs. The proposed scheme may be useful under object or camera rotation, both for ideal Lambertian objects and for real-world objects such as human faces.
We describe a method for automatic word sense disambiguation using a text corpus and a machine-readable dictionary (MRD). The method is based on word similarity and context similarity measures. Words are considered similar if they appear in similar contexts; contexts are similar if they contain similar words. The circularity of this definition is resolved by an iterative, converging process, in which the system learns from the corpus a set of typical usages for each of the senses of the polysemous word listed in the MRD. A new instance of a polysemous word is assigned the sense associated with the typical usage most similar to its context. Experiments show that this method performs well, and can learn even from very sparse training data.
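A toy sketch in the spirit of the method (the full system resolves the word/context circularity by corpus-wide iteration; here a single bootstrapping step over an invented corpus and mock MRD glosses illustrates how typical usages are learned and then used to tag a new instance):

```python
# Minimal stopword list and mock MRD sense glosses for "bank";
# all data below are invented for illustration.
STOP = {"the", "a", "of", "in", "your", "we", "was", "is", "at", "on"}
glosses = {
    "bank/finance": {"money", "deposit", "interest"},
    "bank/river": {"river", "water", "shore"},
}
corpus = [
    "deposit your money in the bank",
    "the bank pays interest",
    "we walked along the river bank",
    "the bank of the river was muddy",
]

def context(sentence):
    return {w for w in sentence.split() if w != "bank" and w not in STOP}

# Seed-tag each occurrence by gloss overlap, then grow usage profiles
# from the corpus (one bootstrapping step).
profiles = {sense: set(words) for sense, words in glosses.items()}
for sentence in corpus:
    ctx = context(sentence)
    best = max(profiles, key=lambda s: len(ctx & profiles[s]))
    profiles[best] |= ctx

def disambiguate(sentence):
    # A new instance gets the sense whose learned typical usage is most
    # similar (here: largest word overlap) to its context.
    ctx = context(sentence)
    return max(profiles, key=lambda s: len(ctx & profiles[s]))
```

For example, disambiguate("the money is in the bank") yields "bank/finance", while a sentence mentioning the river resolves to "bank/river", even though neither sentence appears in the training corpus verbatim.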
Shanahan’s eloquently argued version of the global workspace theory ﬁts well into the emerging understanding of consciousness as a computational phenomenon. His disinclination toward metaphysics notwithstanding, Shanahan’s book can also be seen as supportive of a particular metaphysical stance on consciousness — the computational identity theory.
What insights does comparative biology provide for furthering scientific understanding of the evolution of dynamic coordination? Our discussions covered three major themes: (a) the fundamental unity in functional aspects of neurons, neural circuits, and neural computations across the animal kingdom; (b) brain organization–behavior relationships across animal taxa; and (c) the need for broadly comparative studies of the relationship of neural structures, neural functions, and behavioral coordination. Below we present an overview of neural machinery and computations that are shared by all nervous systems across the animal kingdom, and the related fact that there really are no “simple” relationships in coordination between nervous systems and the behavior they produce. The simplest relationships seen in living organisms are already fairly complex by computational standards. These realizations led us to think about ways that brain similarities and differences could be used to produce new insights into complex brain–behavior phenomena (including a critical appraisal of the roles of cortical and noncortical structures in mammalian behavior), and to think briefly about how future studies could best exploit comparative methods to better elucidate the general principles underlying the neural mechanisms associated with behavioral coordination. In our view, it is unlikely that the intricacies interrelating neural and behavioral coordination are due to one particular manifestation (such as neural oscillation or the possession of a six-layered cortex). Instead of considering the human cortex to be the standard against which all things are measured (and thus something to crow about), both broad and focused comparative studies on behavioral similarities and differences will be necessary to elucidate the fundamental principles underlying dynamic coordination.
…differentially rated pairwise similarity when confronted with two pairs of objects, each revolving in a separate window on a computer screen. Subject data were pooled using individually weighted MDS (ref. 11; in all the experiments, the solutions were consistent among subjects). In each trial, the subject had to select, of the two pairs of shapes, the one consisting of the more similar shapes. The subjects were allowed to respond at will; most responded within 10 sec. Proximity (that is, perceived similarity) tables derived from the judgments were processed to verify their degree of transitivity (4% of all triplets were found intransitive) and then submitted to MDS. In the long-term memory (LTM) variant of this experiment, the subjects were first trained to associate a label (a three-letter nonsense string, such as "BON" or "POM") with each object and then carried out the pairs-of-pairs comparison task from memory, prompted by the object labels rather than by the objects themselves. Six subjects participated in each of the two LTM experiments (Star and Triangle). The subjects were taught each shape in a separate session and had to discriminate between that shape and six similar nontargets from various viewpoints. Training continued until the recognition rate reached 90%, over a period of several days. The subjects were never exposed to more than one target in one session and were not told the ultimate purpose of the experiment. After 2 to 3 days of rest, they were tested with questions such as "is the BON more similar to POM than TOC to ROX?", for all pairs of pairs of stimuli. In the LTM experiments, 8% of the…
Visual objects can be represented by their similarities to a small number of reference shapes or prototypes. This method yields low-dimensional (and therefore computationally tractable) representations, which support both the recognition of familiar shapes and the categorization of novel ones. In this note, we show how such representations can be used in a variety of tasks involving novel objects: viewpoint-invariant recognition, recovery of a canonical view, estimation of pose, and prediction of an arbitrary view. The unifying principle in all these cases is the representation of the view space of the novel object as an interpolation of the view spaces of the reference shapes.
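The unifying principle can be sketched with an invented toy "view function" (image features varying smoothly with pose; none of the numbers come from the paper): the coefficients that best reconstruct the novel object's one familiar view from the reference views at the same pose are reused to predict its unseen views from the reference views at other poses:

```python
import numpy as np

poses = np.linspace(0.0, np.pi / 2, 10)

def views(shape, poses):
    # Toy view function: 3 image features per pose, parameterized by
    # two hypothetical shape parameters (a, b).
    a, b = shape
    return np.array([[a * np.cos(p), b * np.sin(p), a * b] for p in poses])

# View spaces of two reference shapes, fully known in advance.
refs = [views((1.0, 0.5), poses), views((0.6, 1.2), poses)]
# Of the novel object, only a single view (pose index 0) is given.
novel = views((0.8, 0.85), poses)

# Express the familiar view as a combination of reference views...
A = np.stack([r[0] for r in refs], axis=1)
w, *_ = np.linalg.lstsq(A, novel[0], rcond=None)
# ...then predict an unseen view (pose index 7) by applying the same
# combination to the reference view spaces at the new pose.
predicted = np.stack([r[7] for r in refs], axis=1) @ w
error = float(np.linalg.norm(predicted - novel[7]))
```

The prediction error stays small (here below 0.2, against a view-to-view change of nearly 1.0), because the novel object's view space is well approximated by the span of the reference view spaces.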
Although computational considerations suggest that a resource-limited memory system may have to trade off capacity for generalization ability, such a trade-off has not been demonstrated in the past. We describe a simple model of memory that exhibits this trade-off and report its performance in a variety of tasks.
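The trade-off can be made concrete in a toy exemplar memory (all data invented; this is not the paper's model): a store that spends its capacity on individual exemplars supports exact recall of studied items, while a capacity-one store that must collapse them into an averaged trace lands closer to the category's central tendency, and hence, in expectation, to novel category members:

```python
import numpy as np

rng = np.random.default_rng(2)
prototype = rng.normal(size=20)                    # unknown category center
train = prototype + 0.5 * rng.normal(size=(8, 20)) # studied exemplars

# High-capacity memory keeps every exemplar; a capacity-one memory
# must collapse them into a single averaged trace.
averaged = train.mean(axis=0)

def nearest_distance(memory, probe):
    return min(float(np.linalg.norm(m - probe)) for m in memory)

# Exact recall of a studied item: the full store retrieves it perfectly.
recall_full = nearest_distance(train, train[0])     # 0.0
recall_avg = nearest_distance([averaged], train[0]) # > 0

# Generalization: distance to the category center, which governs the
# expected distance to novel category members.
gen_full = float(np.linalg.norm(train - prototype, axis=1).min())
gen_avg = float(np.linalg.norm(averaged - prototype))
```

Noise averages out across the stored exemplars, so gen_avg comes out smaller than gen_full: the compressed memory generalizes better precisely because it has given up exemplar-level fidelity.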
The only advanced visual system that is at once highly sophisticated, versatile, and extensively studied, that of monkeys and humans, develops in the setting of unsupervised statistical learning. In this extended abstract, we invoke philosophical observations, computational arguments, behavioral data, and neurobiological findings to explain why computer vision researchers should care about (1) unsupervised learning, (2) statistical inference, and (3) the visual brain. We then outline a neuromorphic approach to structural primitive learning motivated by these considerations, survey a range of neurobiological findings and behavioral data consistent with it, and conclude by mentioning some of the more challenging directions for future research.
We outline an unsupervised language acquisition algorithm and offer some psycholinguistic support for a model based on it. Our approach resembles the Construction Grammar in its general philosophy, and the Tree Adjoining Grammar in its computational characteristics. The model is trained on a corpus of transcribed child-directed speech (CHILDES). The model’s ability to process novel inputs makes it capable of taking various standard tests of English that rely on forced-choice judgment and on magnitude estimation of linguistic acceptability. We report encouraging results from several such tests, and discuss the limitations revealed by other tests in our present method of dealing with novel stimuli.
We compare our model of unsupervised learning of linguistic structures, ADIOS [1, 2, 3], to some recent work in computational linguistics and in grammar theory. Our approach resembles the Construction Grammar in its general philosophy (e.g., in its reliance on structural generalizations rather than on syntax projected by the lexicon, as in the current generative theories), and the Tree Adjoining Grammar in its computational characteristics (e.g., in its apparent affinity with Mildly Context Sensitive Languages). The representations learned by our algorithm are truly emergent from the (unannotated) corpus data, whereas those found in published works on cognitive and construction grammars and on TAGs are hand-tailored. Thus, our results complement and extend both the computational and the more linguistically oriented research into language acquisition. We conclude by suggesting how empirical and formal study of language can be best integrated.
Beer’s paper devotes much energy to buttressing the walls of Castle Dynamic and dredging its moat in the face of what some of its dwellers perceive as a besieging army chanting “no cognition without representation”. The divide is real, as attested by the contrast between titles such as “Intelligence without representation” and “In defense of representation”, to pick just one example from each side. It is, however, not too late for people from both sides of the moat to meet on the drawbridge and see if all that energy can be put to a better use. The parley can be organized around an attempt to identify those attributes of representations that both sides may find useful. In my view, one such attribute is the capacity for hierarchical abstraction, an aspect of the representational approach not actually mentioned by Beer. The capacity for abstraction is gained automatically by those theoretical frameworks that opt for the explanatory benefits of representation in understanding cognition. Moreover, without appropriately structured mediating states, a cognitive system would be incapable of dealing with complex reality. Thus, hierarchical abstraction is useful not just for cognitive scientists in their attempts to scale up their understanding of cognition: it is also indispensable for cognitive systems that aspire to scale up their understanding of the world.
A metaphor that has dominated linguistics for the entire duration of its existence as a discipline views sentences as edifices consisting of Lego-like building blocks. It is assumed that each sentence is constructed (and, on the receiving end, parsed) de novo, starting (or, in parsing, ending) with atomic constituents and proceeding to logical semantic specifications, in a recursive process governed by a few precise algebraic rules. The assumptions underlying the Lego metaphor, as it is expressed in generative grammar theories, are: (1) perfect regularity of what Saussure called langue, (2) infinite potential recursivity of syntactic structures, (3) unlimited human capacity for linguistic creativity, (4) the impossibility of acquiring structural knowledge from examples, and (5) the impossibility of such knowledge being stored in a memory-intensive form (ensembles of exemplars).
We examined the role of fitness, commonly assumed without proof to be conferred by the mastery of language, in shaping the dynamics of language evolution. To that end, we introduced island migration (a concept borrowed from population genetics) into the shared lexicon model of communication (Nowak et al., 1999). The effect of fitness linear in language coherence was compared to a control condition of neutral drift. We found that in the neutral condition (no coherence-dependent fitness) even a small migration rate – less than 1% – suffices for one language to become dominant, albeit after a long time. In comparison, when fitness-based selection is introduced, the subpopulations stabilize quite rapidly to form several distinct languages. Our findings support the notion that language confers increased fitness. The possibility that a shared language evolved as a result of neutral drift appears less likely, unless migration rates over evolutionary times were extremely small.
(a) Learn a grammar G_A for the source language A.
(b) Estimate a structural statistical language model SSLM_A for A.

Given a grammar (consisting of terminals and nonterminals) and a partial sentence (a sequence of terminals t_1 ... t_i), an SSLM assigns probabilities to the possible choices of the next terminal t_{i+1}.
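The probability-assignment contract of an SSLM can be illustrated with a deliberately crude stand-in (the real model is grammar-based; the bigram conditioning and the training sentences below are invented for illustration only):

```python
from collections import Counter, defaultdict

# Toy training data (invented).
sentences = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]

# Count terminal bigrams, with "<s>" as the start-of-sentence marker.
bigrams = defaultdict(Counter)
for sent in sentences:
    for prev, nxt in zip(["<s>"] + sent, sent):
        bigrams[prev][nxt] += 1

def sslm_next(prefix):
    """Return P(t_{i+1} | t_1 ... t_i), here conditioning only on the
    last terminal (a bigram simplification of the SSLM interface)."""
    prev = prefix[-1] if prefix else "<s>"
    counts = bigrams[prev]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

probs = sslm_next(["the"])
# "the" is followed by "cat" twice and "dog" once in the toy corpus,
# so P(cat | the) = 2/3 and P(dog | the) = 1/3.
```

A grammar-based SSLM would condition on the parse of the whole prefix rather than on the last terminal, but its interface, mapping a partial sentence to a distribution over next terminals, is the same.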
An image of a face depends not only on its shape, but also on the viewpoint, illumination conditions, and facial expression. A face recognition system must overcome the changes in face appearance induced by these factors. This paper investigates two related questions: the capacity of the human visual system to generalize the recognition of faces to novel images, and the level at which this generalization occurs. We approach these problems by comparing the identification and generalization capacity for upright and inverted faces. For upright faces, we found remarkably good generalization to novel conditions. For inverted faces, the generalization to novel views was significantly worse for both new illumination and viewpoint, although the performance on the training images was similar to the upright condition. Our results indicate that at least some of the processes that support generalization across viewpoint and illumination are neither universal (because subjects did not generalize as easily for inverted faces as for upright ones), nor strictly object-specific (because in upright faces nearly perfect generalization was possible from a single view, by itself insufficient for building a complete object-specific model). We propose that generalization in face recognition occurs at an intermediate level that is applicable to a class of objects, and that at this level upright and inverted faces initially constitute distinct object classes.
The computational program for theoretical neuroscience initiated by Marr and Poggio (1977) calls for a study of biological information processing on several distinct levels of abstraction. At each of these levels — computational (defining the problems and considering possible solutions), algorithmic (specifying the sequence of operations leading to a solution) and implementational — significant progress has been made in the understanding of cognition. In the past three decades, computational principles have been discovered that are common to a wide range of functions in perception (vision, hearing, olfaction) and action (motor control). More recently, these principles have been applied to the analysis of cognitive tasks that require dealing with structured information, such as visual scene understanding and analogical reasoning. Insofar as language relies on cognition-general principles and mechanisms, it should be possible to capitalize on the recent advances in the computational study of cognition by extending its methods to linguistics.