Features, Objects, and other Things: Ontological Distinctions in the Geographic Domain

David M. Mark (1), André Skupin (2) and Barry Smith (3)

(1) Department of Geography, NCGIA and Center for Cognitive Science, University at Buffalo, Buffalo, NY 14261. Email: dmark@geog.buffalo.edu
(2) Department of Geography, University of New Orleans, New Orleans, LA 70148. Email: askupin@uno.edu
(3) Department of Philosophy, NCGIA and Center for Cognitive Science, University at Buffalo, Buffalo, NY 14260. Email: phismith@buffalo.edu

Abstract. Two hundred and sixty-three subjects each gave examples for one of five geographic categories: geographic features, geographic objects, geographic concepts, something geographic, and something that could be portrayed on a map. The frequencies of various responses were significantly different, indicating that the basic ontological terms feature, object, etc., are not interchangeable but carry different meanings when combined with adjectives indicating geographic or mappable. For all of the test phrases involving geographic, responses were predominantly natural features such as mountain, river, lake, ocean, hill. Artificial geographic features such as town and city were listed hardly at all for geographic categories, an outcome that contrasts sharply with the disciplinary self-understanding of academic geography. However, geographic artifacts and fiat objects, such as roads, cities, boundaries, countries, and states, were frequently listed by the subjects responding to the phrase something that could be portrayed on a map. In this paper, we present the results of these experiments in visual form, and provide interpretations and implications for further research.

Keywords. Geographic ontology, geographic categories, prototypes, spatial cognition, mereotopology, human subjects testing, spatialization, self-organizing maps, geographic information systems, GIS.

Introduction

In a paper presented at COSIT '99, Mark et al.
(1999) reported results of an experiment in which subjects were asked to list examples of "a kind of geographic feature" under controlled conditions. As the authors pointed out, the results of this experiment were somewhat puzzling: "'geographical feature' elicited solely natural and not artificial geographic features." Mountain, river, lake, ocean, hill were the examples most often listed, town and city being listed hardly at all. These results contrast sharply with the disciplinary self-understanding of academic geography, which has for more than a century been at least as much concerned with accounting for the results of human activities on geographic scales as with the natural, physical environment.

The possibility was raised during discussion at the COSIT meeting that the puzzling result might be due to the choice of the word "feature," rather than to any beliefs concerning the nature of the geographic domain on the part of our subjects. In an attempt to separate the effects of the two terms "feature" and "geographic," we designed a series of new experiments, employing not only the stimulus "geographic feature" but also "geographic object", "geographic concept", "something geographic", and "something that could be portrayed on a map". The results of these experiments were themselves no less surprising, and are reported in detail in this paper.

Preprint version of paper in Daniel Montello (ed.), Spatial Information Theory. Foundations of Geographic Information Science, Proceedings of COSIT 2001 (Lecture Notes in Computer Science 2205), Berlin/New York: Springer, 2001, 488–502.

The Geographic Domain

The principles of topology, geometry, and mereology by which the geographic domain is primarily structured are principles which operate independently of scale and size.
On the other hand, converging lines of reasoning and observation suggest that geographic and non-geographic entities are ontologically distinct in a number of ways, and that these differences are not just a matter of scale (Smith and Mark 1998). That these ontological differences between tabletop and geographic phenomena are also cognitively salient has long been known in work on spatial cognition. Downs and Stea (1977) drew a distinction between 'small-scale' or 'perceptual' spaces such as those on tabletops or within rooms, and 'large-scale' or 'geographic' spaces, which they also called 'transperceptual' spaces in order to emphasize that they are known by integrating across direct perceptual experiences. Mark and Freundschuh (1995) surveyed published arguments in favor of the existence of a multiplicity of cognitive scales, and Montello (1993) provided a solid review of scale-based kinds of spaces and spatial knowledge.

The term "geographic" seems, on the basis of these arguments, to select for a coherent subdomain of reality – and this thesis receives strong prima facie support from the existence of the academic discipline of geography, a "science which has for its object the description of the earth's surface, treating of its form and physical features, its natural and political divisions, the climate, productions, population, etc., of the various countries" (OED, 1989). For the existence of a scientific discipline has been held since Aristotle to require the existence of a corresponding, coherent domain of study: a somehow homogeneous or unified domain of objects. Our experiments on "geographic feature" were designed to test the degree to which such a domain is coded for in the conceptual systems of ordinary people.
We now realize that there is a problem in carrying out such experiments, turning on the fact that there is no noun in English covering all and only the phenomena in the geographic domain (in the way in which, for example, the term 'chemical' picks out objects in the domain of chemistry or 'organism' picks out objects in the domain of biology). Because the geographic domain is captured in English by an adjective, problems arise in experimental design because, for purposes of testing human subjects, it is necessary that this adjective be part of a noun phrase. Further complicating the matter, it is not just any common or garden type of noun to which the modifier 'geographic' needs to be attached, but in fact the noun must be a noun of an ontological sort, a noun – like 'object' or 'thing' or 'entity' – which, as philosophers interested in ontology have long recognized, is likely to be far from innocent.

These considerations were only implicit in the thinking that led us to test "a kind of geographic feature". Our experimental results then told us that when the domain of the geographic is coded in just this way, then it cannot be identified simply as the domain of entities on or near the surface of the earth which are just like non-geographic things except larger. For there are entire families of large and highly salient geospatial entities which our subjects simply did not list.

Superordinate Terms

"Geographic feature" is a node in our conceptual system which in a sense has two superordinates from which it may inherit characteristics: "geographic" and "feature". We have been interested in inferring the common-sense understanding of the domain of the "geographic" through an elicitation of examples, and we chose "feature" to be the accompanying noun without much thought.
But because of the unexpected result that almost all examples elicited by "kind of geographic feature" were natural, we decided to explore further the possible effect of the superordinate "feature" on the responses elicited from our subjects by employing alternative superordinate terms. The experiment comparing examples for these different superordinates was expected to confirm that the choice of superordinate term would have little or no influence on the geographic examples listed. We decided to test:

• a kind of geographic feature
• a kind of geographic object
• a geographic concept
• something geographic
• something that could be portrayed on a map

(We plan in the future to carry out the same tests using entity and phenomenon.) As is reported below, the intuition that the ontological noun would have little or no influence on subjects' responses turned out to be incorrect.

Differences Related to Superordinate Terms: Basic Geographic Examples

A major thrust of our geospatial ontology research project involves the collection of norms (which is to say, the most common examples) of geographic categories. In order to obtain these norms, we employed an elicitation protocol used by Battig and Montague (1968). Two hundred and sixty-three subjects from an introductory course on world civilization were each asked to give examples of 9 categories, with 30 seconds allowed per category. The survey came in five versions that differed in the wording of the questions, and each version was given to roughly one fifth of the subjects. (For more details on the experimental design and subjects, see Smith and Mark, 2001). The most frequently listed examples for the basic geographic question are given in Table 1. A chi-squared test revealed that response frequencies were significantly different for the various phrasings.
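As a rough illustration of this kind of test, the sketch below computes a Pearson chi-squared statistic for five rows of Table 1. It is not the authors' analysis: the paper does not say how the test was set up, the choice of rows here is arbitrary, and because each subject could list several terms the counts are not strictly independent. The names chi_squared and CRITICAL_05_DF16 are introduced only for this example.

```python
# Five rows of Table 1; columns are feature, object, something, concept, map.
COUNTS = {
    "mountain": [48, 23, 32, 23, 25],
    "river": [35, 18, 26, 19, 31],
    "country": [2, 6, 8, 4, 23],
    "city": [1, 4, 5, 0, 30],
    "map": [0, 17, 11, 7, 0],
}


def chi_squared(table):
    """Return (Pearson chi-squared statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if term and phrasing were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


statistic, dof = chi_squared(list(COUNTS.values()))
CRITICAL_05_DF16 = 26.296  # standard table value for 16 df at the 0.05 level
print(f"chi-squared = {statistic:.1f}, df = {dof}")
print("exceeds 0.05 critical value:", statistic > CRITICAL_05_DF16)
```

For this subset the statistic lies well above the critical value, driven largely by cells such as city under the map phrasing, where the observed count is far above what the row and column totals alone would predict.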
Visualizing Subject Test Results

In Smith and Mark (2001) we presented an analysis of the relative frequencies of subjects' responses to these different phrasings on the basis of a series of tables constructed according to which terms achieved the highest frequencies of elicitation under each phrasing. Greater insight can be achieved, however, if some method can be found to arrange all the elicited terms (mountain, river, etc.) within a single visual space in a way that makes apparent the relative degree to which subjects associate them with our different phrasings.

Table 1: Frequency of Terms Elicited by Five Phrasings (1)

Term        feature  object  something  concept  map  Total
N subjects       54      56         51       51   51    263
mountain         48      23         32       23   25    151
river            35      18         26       19   31    129
lake             33      13         25       10   21    102
ocean            27      16         18       16   18     95
hill             20       9         11        3    0     43
country           2       6          8        4   23     43
sea               9       8          9       11    5     42
city              1       4          5        0   30     40
continent         1      10          8        9   12     40
valley           21       7          4        7    0     39
plain            19       6          5        4    1     35
plateau          17       4          6        8    0     35
map               0      17         11        7    0     35
road              1       2          3        1   27     34
island            8       7          7        7    3     32
peninsula         8      10          5        6    1     30
desert           14       6          6        4    0     30
state             0       5          3        1   15     24
volcano          10       4          5        3    0     22
forest            6       4          5        1    3     19
land              2       6          6        5    0     19
globe             0      11          4        0    0     15
town              0       5          2        0    8     15
stream            6       2          2        3    1     14
rock              1       6          3        2    0     12
delta             4       1          0        6    0     11
compass           0       8          0        1    2     11
street            0       1          1        1    8     11
atlas             0       6          2        2    0     10

(1) Within each row, the column having the highest value is indicated in bold type.

Self-Organizing Maps

Among techniques for the exploratory analysis of multidimensional data, self-organizing maps (SOMs) have received particular attention in recent years. They are also known as Kohonen maps, in recognition of their inventor, Teuvo Kohonen (1989, 1995). In their original and most widely used form, SOMs are created through an unsupervised artificial neural network procedure. The inputs to the procedure are a multivariate data set and a regularly tessellated, two-dimensional field of what are called 'neurons,' which is to say: nodes of an artificial neural network (Kohonen 1995).
While squares and hexagons are common two-dimensional shapes, hexagons are generally preferred, as they display less directional bias (Kohonen 1995). In attribute space, each neuron is associated with a multivariate vector of weights, one for each of the input variables; the set of weights that all neurons hold for a single input variable is referred to as a component plane. In this study, we have five component planes, one for each of the five different phrasings of our test question. Once the data have been entered into the system, the network of artificial neurons is trained by providing information about inputs, in this case the data from our human subjects. During the training stage, the neuron weights for the input variables are gradually adjusted, in an attempt to preserve neighborhood relationships that exist within the input data set.

The results can be displayed in a number of ways. Visualization of all the neuron weights for a particular input variable, i.e. component plane, such as frequency of subject responses to "a kind of geographic feature", can reveal information about the relative importance of a variable within the data set as well as about its relationship to other variables; the diagrams in Figures 1, 2, and 4 are based on this approach. In the case of truly high-dimensional data sets with several hundred input dimensions only the most dominant component planes can be visually inspected; but since our data involve only five input variables, all can be displayed and compared rather easily. Exploration and understanding of complex data can also be enhanced by visualizing statistical clusters within the SOM, based on weight vectors of all neurons; this is how Figures 3 and 5 were constructed. The visualizations shown in this paper were produced using Viscovery SOMine, for SOM training and initial labeling of neurons, and ArcView, for processing of base map configurations and two-dimensional interpolation.
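The training procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the procedure implemented in Viscovery SOMine: it uses a small rectangular grid rather than hexagons, a Gaussian neighborhood, a linearly decaying learning rate and radius, and six terms from Table 1 with each count divided by the term's row total. All function names and parameter values are hypothetical.

```python
import math
import random

# Raw counts from Table 1 (feature, object, something, concept, map).
COUNTS = {
    "mountain": [48, 23, 32, 23, 25],
    "river": [35, 18, 26, 19, 31],
    "city": [1, 4, 5, 0, 30],
    "road": [1, 2, 3, 1, 27],
    "map": [0, 17, 11, 7, 0],
    "globe": [0, 11, 4, 0, 0],
}
# Input vectors: each count divided by the term's row total.
DATA = {term: [c / sum(row) for c in row] for term, row in COUNTS.items()}


def best_matching_unit(weights, sample):
    """Return the grid position whose weight vector is closest to the sample."""
    return min(
        weights,
        key=lambda pos: sum((w - s) ** 2 for w, s in zip(weights[pos], sample)),
    )


def train_som(samples, rows=4, cols=4, epochs=200, seed=0):
    """Train a rectangular SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    steps = epochs * len(samples)
    start_radius = max(rows, cols) / 2
    for step in range(steps):
        remaining = 1 - step / steps
        rate = 0.5 * remaining  # learning rate decays towards 0
        radius = 0.5 + (start_radius - 0.5) * remaining  # shrinks to 0.5
        sample = rng.choice(samples)
        bmu = best_matching_unit(weights, sample)
        for pos, vector in weights.items():
            grid_dist_sq = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
            influence = math.exp(-grid_dist_sq / (2 * radius ** 2))
            for k in range(dim):
                # Pull this neuron's weight towards the sample.
                vector[k] += rate * influence * (sample[k] - vector[k])
    return weights


def component_plane(weights, k, rows=4, cols=4):
    """Weights of all neurons for input variable k, as a rows x cols grid."""
    return [[weights[(r, c)][k] for c in range(cols)] for r in range(rows)]


som = train_som(list(DATA.values()))
labels = {term: best_matching_unit(som, vec) for term, vec in DATA.items()}
feature_plane = component_plane(som, 0)
print(labels)
```

Here labels gives the neuron that best matches each term, which plays the role of the base map position, and feature_plane holds the values that an interpolated display of the "feature" component would be drawn from.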
Visualization of the Results of the Elicitation Task Subject Test

Table 1, above, shows a portion of the data set that resulted from the elicitation test. Rows correspond to the terms (mountain, river, etc.) offered by subjects in response to the different elicitation questions. Columns represent the five different ways in which the question was phrased. Values in the table indicate the number of subjects who listed the given term as an example of that phrasing. For preprocessing, the row sum of these raw counts is computed. Since computations on terms with a small row sum are very sensitive to small variations in the subjects' choices of examples, only terms with a row sum of at least 10 are shown in Table 1 and considered for further processing. For each of the 31 terms the five raw counts are divided by the row sum. The resulting value can be interpreted as the relative proportion with which each of the five questions was positively answered for the respective term. This transformation makes it easier to perform computations involving both high-frequency terms, such as mountain (row sum: 151), and low-frequency terms, such as peninsula (row sum: 30). The preprocessed data are used to train a SOM and to label those neurons that are best matched by the 31 input terms. The two-dimensional positions of these neurons are then used as a base map onto which attributes such as the raw frequencies for "a kind of geographic object" or other stimulus phrases can be mapped. This use of a consistent geometric configuration makes it easier to interpret the ensuing visualizations.

Visualization Based on Raw Frequency Counts

The first of these visualizations uses the raw counts for each of the five phrasings to create five interpolations (Figure 1). When absolute numbers of subjects listing each term are mapped as attributes in the visualization, the overall pattern is similar for all phrasings, with an area of high frequency centered on the term mountain.
Frequencies are higher and more spatially concentrated in the visualization of the "feature" component than in the others, suggesting that "geographic feature" is more strongly associated with a clear category in the minds of the subjects than are the other phrasings, since higher intensity for a given term on the diagram indicates that a greater absolute number of subjects listed that term in response to that question. For the "feature" component, higher values are restricted to elements of the natural environment. Manipulable artifacts with a geographic purpose, such as map, globe, atlas, and compass, are concentrated in the lower left portions of the diagrams and have relatively higher frequency under the superordinate term 'object'. Artifacts of geographic scale (roads, streets, cities) plus fiat administrative things such as states and countries are found in the upper left corner of the spatialization, and have relatively high frequency for "something that can be portrayed on a map". Clearly, mappability and geographicalness are distinct concepts within our subject pool.

Visualization Based on Relative Frequencies

The second visualization (Figure 2 below) is based on the relative proportion of responses for each question, rather than on absolute frequencies. It aims at visualizing the relative dominance of the five phrasings for each of the 31 terms that make up the base map. The more strongly subjects associated a phrasing with a given term, the darker the corresponding region of the visualization. Here the 'feature' surface is the most muted, since most terms that peak as examples of a kind of geographic feature are also listed (at lower frequencies) under other phrasings. For "object" and "mappable", in contrast, there are subsets of terms that are much more concentrated under just one phrasing.

Figure 1. Experiment 1: Visualization of Raw Counts of Positive Responses.

Figure 2. Experiment 1: Visualization of Relative Proportions of Positive Responses.
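The row-sum normalization behind the relative-frequency visualization can be sketched as follows, using four rows of Table 1. The helper names relative_proportions and dominant_phrasing are hypothetical and introduced only for this illustration.

```python
PHRASINGS = ["feature", "object", "something", "concept", "map"]

# Raw counts from Table 1, in the column order given by PHRASINGS.
COUNTS = {
    "mountain": [48, 23, 32, 23, 25],
    "city": [1, 4, 5, 0, 30],
    "map": [0, 17, 11, 7, 0],
    "peninsula": [8, 10, 5, 6, 1],
}


def relative_proportions(counts):
    """Divide each raw count by the row sum."""
    total = sum(counts)
    return [count / total for count in counts]


def dominant_phrasing(counts):
    """Return the phrasing with the largest share, and that share."""
    proportions = relative_proportions(counts)
    best = max(range(len(proportions)), key=proportions.__getitem__)
    return PHRASINGS[best], proportions[best]


for term, counts in COUNTS.items():
    phrasing, share = dominant_phrasing(counts)
    print(f"{term:10s} {phrasing:10s} {share:.2f}")
```

For mountain the largest share is about 0.32 (48 of 151 listings, under "feature"), whereas for city it is 0.75 (30 of 40, under "map"). This is the sense in which terms that peak under "feature" are spread across phrasings while some "object" and "map" terms are concentrated under one.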
Hierarchical Clustering Solutions

The third visualization shows the results of a hierarchical clustering procedure for the multivariate neuron vectors of the SOM (Figure 3). Three levels of the cluster solution are shown, at 5, 10, and 15 clusters, respectively. Boundaries between clusters are delineated and individual neurons are shaded according to how dissimilar they are from the multivariate cluster centroid. This visualization makes it easier to recognize groupings of terms, with respect to the relative dominance of the five phrasings. For example, in the five-cluster solution elements of the man-made environment can be distinguished from those of the natural environment and from manipulable objects. In the ten- and fifteen-cluster solutions subgroupings are apparent, such as the more conceptual, less mappable terms, like "plateau", "plain", and "valley."

Figure 3. Experiment 1: Hierarchical Clustering Solution for SOM Neurons. Three Cluster Levels Shown (5, 10, 15 Clusters). Lighter shading indicates greater similarity to the centroid of the cluster.

Visualization of the Results of a Test Rating Goodness of Examples

It is not necessarily the case that the most frequent examples of a category are also the best examples of that category. Rosch (1973) predicted such a correlation, based on a radial model of internal category structure, and tested it for several categories. She took examples given at relatively high and low frequencies by Battig and Montague's (1968) subjects, and had subjects rate the degree to which each was a good example of its category. In addition to replicating Battig and Montague's experiments for geographic categories, Mark and Smith conducted another test based on Rosch's method. Subjects were given a list of 32 terms (such as mountain, river) and a category label (such as geographic feature, geographic object), and they were asked to estimate for each of the terms how well it belonged to the given category.
For example:

On a scale of 1 (excellent example) to 7 (poor example), please rate each of the items on the next page (listed in alphabetical order) regarding how good an example it is of "something that could be portrayed on a map".

This was repeated for the five phrasings used in the first experiment, using different but comparable subjects. Results of the second subject test were preprocessed by scaling the original 1-7 agreement values to a 0-1 range and by averaging across subjects for each term. After this rescaling, a term unanimously considered to be an excellent example would have a score of 1.00, and one unanimously considered to be a poor example would have a score of 0.00. These rescaled scores were then used to train a SOM whose five component planes again correspond to the five different questions. The neurons that best match the 32 terms are labeled by those terms. The ensuing interpolation is based on the rescaled scores (Figure 4). These five component visualizations are much simpler in structure (basically inclined planes with height increasing from lower left to upper right), and much more similar than the solutions presented in Figures 1 and 2. Evidently, when the terms are supplied to the subject, and there is no strict time limit on responses, the nouns (object, feature, etc.) in the five phrasings do not lead to great differences in the ways in which subjects rate examples for acceptability as category members.

As with the first subject test, a hierarchical clustering solution is computed for neurons of the SOM and visualized at three different cluster levels, to allow interpretation of multivariate clusters on the two-dimensional base map (Figure 5). The distinction of the built-up environment from the natural environment is again apparent, as is the clustering of terms with generally low 'geographic' association, such as "iceberg", "tree", or "cloud."

Figure 4. Experiment 2: Visualization of Agreement Measure (Scaled to 0-1 Range).
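The rescaling of the 1-7 ratings can be sketched as below. The text fixes only the endpoints (a rating of 1 maps to 1.00 and a rating of 7 maps to 0.00), so the linear mapping between them is an assumption, and the ratings shown are invented for illustration; they are not data from the experiment.

```python
from statistics import mean


def rescale(rating, best=1, worst=7):
    """Map a rating on the 1 (excellent) to 7 (poor) scale onto 1.0 to 0.0.

    Assumes a linear mapping between the two endpoints.
    """
    return (worst - rating) / (worst - best)


# Hypothetical ratings from four subjects per term (not experimental data).
hypothetical_ratings = {
    "mountain": [1, 1, 2, 1],
    "cloud": [6, 7, 5, 6],
}

# Average the rescaled ratings across subjects for each term.
scores = {
    term: mean(rescale(r) for r in ratings)
    for term, ratings in hypothetical_ratings.items()
}

for term, score in scores.items():
    print(f"{term:10s} {score:.2f}")
```

One such averaged score per term and phrasing would give the five-component input vector for the second SOM.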
Figure 5. Experiment 2: Hierarchical Clustering Solution for SOM Neurons. Three Cluster Levels Shown (5, 10, 15 Clusters). Lighter shading indicates greater similarity to the centroid of the cluster.

Conclusions

When combined with the adjective "geographic", the nouns "feature", "object", and "something" lead subjects to offer different families of examples. The phrase "something that can be portrayed on a map" produces yet another set of examples. Spatialization of the data using self-organizing maps (SOMs) assists in comprehending the different ways in which ontological specifications influence cognition of the geographic domain. The SOMs allow us to see the arrangement of geographic kinds in a space based on conceptual similarity. Within this space, "feature" produces high frequency for landforms, water bodies, watercourses, and other natural phenomena. "Ability to be portrayed on a map" leads to higher frequencies for fiat objects and elements of the built environment (so that the domain of the "mappable" in fact comes closest to what is studied by academic geographers). The term "object" appears to have such a strong connotation of manipulability that it invokes many examples of manipulable objects with a geographic purpose. Diagrams for the results of the elicitation (Figures 1 and 2) and agreement (Figure 4) tasks have very different structures, suggesting previously undetected systematic differences between salience and typicality. Future work is needed to determine whether these patterns hold for other languages and other conceptual domains.

Acknowledgments

This material is part of a project "Geographic Categories: An Ontological Investigation" supported by the U. S. National Science Foundation under Grant No. BCS-9975557. Support of the National Science Foundation is gratefully acknowledged.

References

Battig, W. F., and Montague, W. E., 1968. Category Norms for Verbal Items in 56 Categories: A replication and extension of the Connecticut Norms.
Journal of Experimental Psychology Monograph, 80: 3, Part 2, 1-46.

Downs, R. M. and Stea, D., 1977. Maps in Minds: Reflections on Cognitive Mapping. New York: Harper and Row.

Kohonen, T., 1989. Self-Organization and Associative Memory. Berlin: Springer-Verlag.

Kohonen, T., 1995. Self-Organizing Maps. Heidelberg: Springer-Verlag.

Mark, D. M., 1993. Toward a Theoretical Framework for Geographic Entity Types. In Frank, A. U., and Campari, I., editors, Spatial Information Theory: A Theoretical Basis for GIS, Berlin: Springer-Verlag, Lecture Notes in Computer Science No. 716, pp. 270-283.

Mark, D. M., and Freundschuh, S. M., 1995. Spatial Concepts and Cognitive Models for Geographic Information Use. In Nyerges, T. L., Mark, D. M., Laurini, R., and Egenhofer, M., editors, Cognitive Aspects of Human-Computer Interaction for Geographic Information Systems, Dordrecht: Kluwer Academic Publishers, 21-28.

Mark, D. M., Smith, B., and Tversky, B., 1999. Ontology and Geographic Objects: An Empirical Study of Cognitive Categorization. In Freksa, C., and Mark, D. M., editors, Spatial Information Theory: A Theoretical Basis for GIS, Berlin: Springer-Verlag, Lecture Notes in Computer Science No. 1661, 283-298.

Montello, D. R., 1993. Scale and Multiple Psychologies of Space. In Frank, A. U., and Campari, I., editors, Spatial Information Theory: A Theoretical Basis for GIS, Berlin/Heidelberg/New York, etc.: Springer, pp. 312-321.

OED, 1989. Oxford English Dictionary, Second Edition. http://dictionary.oed.com/

Rosch, E., 1973. On the internal structure of perceptual and semantic categories. In T. E. Moore (ed.), Cognitive Development and the Acquisition of Language, New York: Academic Press.

Smith, B., 1995. On Drawing Lines on a Map. In Andrew U. Frank and Werner Kuhn (eds.), Spatial Information Theory. A Theoretical Basis for GIS. Berlin/Heidelberg/New York, etc.: Springer, 475-484.

Smith, B., and Mark, D. M., 1998. Ontology and Geographic Kinds.
Proceedings, Eighth International Symposium on Spatial Data Handling, 308-320, http://www.geog.buffalo.edu/ncgia/i21/SDH98.html.

Smith, B., and Mark, D. M., 1999. Ontology with Human Subjects Testing: An Empirical Investigation of Geographic Categories. American Journal of Economics and Sociology, 58: 2, 245–272.

Smith, B., and Mark, D. M., 2001. Geographic Categories: An Ontological Investigation, International Journal of Geographical Information Science, forthcoming.