Towards a Reference Terminology for Ontology Research and Development in the Biomedical Domain

1Barry Smith, Ph.D., 2Waclaw Kusnierczyk, M.D., 3Daniel Schober, Ph.D., 1Werner Ceusters, M.D.

1Center of Excellence in Bioinformatics and Life Sciences, Buffalo NY/USA
2Department of Computer and Information Science, NTNU, Trondheim, Norway
3European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK

Ontology is a burgeoning field, involving researchers from the computer science, philosophy, data and software engineering, logic, linguistics, and terminology domains. Many ontology-related terms with precise meanings in one of these domains have different meanings in others. Our purpose here is to initiate a path towards disambiguation of such terms. We draw primarily on the literature of biomedical informatics, not least because the problems caused by unclear or ambiguous use of terms have been most thoroughly addressed there. We advance a proposal resting on a distinction of three levels too often run together in biomedical ontology research: 1. the level of reality; 2. the level of cognitive representations of this reality; 3. the level of textual and graphical artifacts. We propose a reference terminology for ontology research and development that is designed to serve as a common hub into which the several competing disciplinary terminologies can be mapped. We then justify our terminological choices through a critical treatment of the 'concept orientation' in biomedical terminology research.

PREAMBLE

Ever since the invention of the computer, scientists and engineers have been exploring ways of 'modeling' or 'representing' the entities about which machines are expected to reason. But what do 'modeling' and 'representing' mean? What is a 'conceptual model' or an 'information model' and how can they and their components be unambiguously described? Two questions here arise: To what do expressions such as 'concept', 'information', 'knowledge', etc.
precisely refer? And what is it to 'model' or 'represent' such things? If information and knowledge themselves consist in representations, then what could an information representation or a knowledge representation be? There is, to say the least, some suspicion of redundancy here.

As we have argued elsewhere, the term 'concept' is marked in a peculiarly conspicuous manner by problems in this regard.1 But the problem of multiple conflicting meanings arises also in regard to other terms, such as 'class', 'object', 'instance', 'individual', 'property', 'relation', etc., all of which have established, but unfortunately non-uniform, meanings in a range of different disciplines. Among philosophical ontologists, the term 'instance' means an individual (for example this particular dog Fido), which is an instance of a corresponding universal or kind (dog, mammal, etc.). In OWL, 'instance' means 'element' or 'member' of a class (where 'class' means 'general concept, category or classification ... that belongs to the class extension of owl:Class'2).

Standardization agencies such as ISO, CEN and W3C have been of little help in engendering cross-disciplinary uniformity in the use of such terms, since their standards are themselves directed towards specific communities. Standardization efforts under the auspices of W3C or UML or Dublin Core, too, have not addressed these problems. For while OWL-DL, for example, has a rigorously defined semantics,3 this does not by any means guarantee that an ontology formulated using OWL-DL is an error-free representation of its intended domain, nor – until the day when the use of OWL or of some successor becomes uniform common practice – will it do anything to resolve the problems of semantic ambiguity adverted to above.
In the domain of biomedical informatics a number of attempts have been made to resolve these problems4,5,6 in light of an increasing recognition that many ambitious terminological systems developed in this field are marked by a lack of clarity over what, precisely, they have been designed to achieve. Are biomedical controlled vocabularies 'concept representations' or 'knowledge models'? And if they are either of these things, how, if at all, do they relate to the reality – the tumors, diseases, treatments, chemical interactions – on the side of the patient?

KR-MED 2006 "Biomedical Ontology in Action", November 8, 2006, Baltimore, MD, CEUR, Vol. 222, 57-65

OBJECTIVES AND METHODS

The purpose of this communication is to initiate a process for resolving such problems by drawing on the best practices in ontology which are now beginning to take root through the efforts of organizations such as the National Center for Biomedical Ontology,7 the Open Biomedical Ontologies (OBO) Consortium,8 the OBO Foundry,9 and others.10 What is needed is a set of terms referring in unambiguous fashion to the different kinds of entities surveyed above, which can serve as a common target for mappings from other discipline- and computational idiom-centric terminologies, thereby mediating efficient pairwise translations between these terminologies themselves.

Our strategy is to advance precision via clear informal definitions rooted in what we assume are commonly accepted intuitions, providing references to associated formal treatments where possible. In selecting terms we have sometimes chosen expressions precisely because they have not been used by others and hence do not have established (and potentially conflicting) meanings. In other cases we have adapted existing terms to our purposes by providing them with more precise definitions or (in the case of primitive terms) elucidations.
These proposals are focused primarily on the ontology-related needs of natural science, including the clinical basic sciences, though we believe them to be of quite general applicability. We start out from a distinction of three levels of entities which have a role to play wherever ontologies are used:

• Level 1: the objects, processes, qualities, states, etc. in reality (for example on the side of the patient);
• Level 2: cognitive representations of this reality on the part of researchers and others;
• Level 3: concretizations of these cognitive representations in (for example textual or graphical) representational artifacts.

This tripartite distinction will awaken echoes of the Semantic Triangle of Ogden and Richards, to which we return in the sequel. For present purposes we note that the indispensability of Level 1 reflects the fact that even those who see themselves as building for example 'data models' in the domain of the life sciences are attempting to create thereby artifacts which stand in some representational relation to entities in the real world. Level 2 reflects the fact that a crucial role is played in ontology and terminology development by the cognitive representations of human subjects. Level 3 reflects the fact that cognitive representations can be shared, and serve scientific ends, only when they are made communicable in a form whereby they can also be subjected to criticism and correction, and also to implementation in software.

Note that the three levels overlap; thus the textual and graphical artifacts distinguished in Level 3 are themselves objects on Level 1. Our talk of 'levels' should thus be interpreted by analogy with talk of 'levels of granularity': if we have apprehended all the liquid in a vessel, then in a sense we have thereby apprehended also all the molecules.
Yet for scientific purposes molecules and liquids must be distinguished nonetheless, and the same applies, for the purposes of clarity in our thinking about ontologies, to the three levels delineated above.

FOUNDATIONS

Here we give precise definitions to a number of central terms, which will then be used in conformity thereto in the remainder of the paper. Really existing ontologies and related artifacts are typically constructed to realize a mixture of different sorts of ends (terminologies, for example, to support clinical record keeping and large-scale epidemiological studies, and to serve as controlled vocabularies for the expression of research results). Hence they typically combine the features of artifacts of different basic types. Our reference terminology is designed to reflect these basic types. Hence the definitions we propose for terms such as 'ontology' or 'class' do not imply any claim to the effect that everything called an 'ontology' or 'class' in the literature exhibits just the characteristics referred to in the definition.

An ENTITY is anything which exists, including objects, processes, qualities and states on all three levels (thus also including representations, models, beliefs, utterances, documents, observations, etc.).

A REPRESENTATION is for example an idea, image, record, or description which refers to (is of or about), or is intended to refer to, some entity or entities external to the representation. Note that a representation (e.g. a description such as 'the cat over there on the mat') can be of or about a given entity even though it leaves out many aspects of its target.

A COMPOSITE REPRESENTATION is a representation built out of constituent sub-representations as its parts, in the way in which paragraphs are built out of sentences and sentences out of words.
The smallest constituent sub-representations are called REPRESENTATIONAL UNITS; examples are: icons, names, simple word forms, or the sorts of alphanumeric identifiers we might find in patient records. Note that many images are not composite representations since they are not built out of smallest representational units in the way in which molecules are built out of atoms. (Pixels are not representational units in the sense defined.) If we take the graph-theoretic concretization of the Gene Ontology11 as our example, then the representational units here are the nodes of the graph (taken to comprehend terms and unique IDs), which are intended to refer to corresponding entities in reality. But the composite representation refers, through its graph structure, also to the relations between these entities, so that there is reference to entities in reality both at the level of single units and at the structural level.12

A COGNITIVE REPRESENTATION (Level 2) is a representation whose representational units are ideas, thoughts, or beliefs in the mind of some cognitive subject – for example a clinician engaged in applying theoretical (and practical) knowledge to the task of establishing a diagnosis.

A REPRESENTATIONAL ARTIFACT (Level 3) is a representation that is fixed in some medium in such a way that it can serve to make the cognitive representations existing in the minds of separate subjects publicly accessible in some enduring fashion. Examples are: a text, a diagram, a map legend, a list, a clinical record, or a controlled vocabulary. Clearly such artifacts can serve to convey more or less adequately the underlying cognitive representations – and can be correspondingly more or less intuitive or understandable.
Because representational artifacts such as SNOMED CT give textual form to cognitive representations which pre-exist them, some have taken this to mean that these artifacts are in fact made up of representations which refer to (are of or about) these cognitive representations (the 'concepts') from out of which the latter are held to be composed. We shall argue below that this reflects a deep confusion, and that the constituent units of representational artifacts developed for scientific purposes should more properly (and more straightforwardly) be seen as referring to the very same entities in reality – the diseases, patients, body parts, and so forth – to which the underlying cognitive representations of clinicians and others refer. Such artifacts are in this respect no different from scientific textbooks. They are windows on reality, designed to serve as a means by which representations of reality on the part of cognitive agents can be made available to other agents, both human and machine. A simple phrase, such as 'the cat over there on the mat', can be used to refer more or less successfully to what is, in reality, a portion of reality of a highly complex sort – and the same applies to all of the types of artifacts referred to above. The window on reality which each provides is, to be sure, in every case from a certain perspective and in such a way as to embody a certain granularity of focus. Yet the entities to which it refers are full-fledged entities in reality nonetheless – the very same, full-fledged entities in reality with which we are familiar also in other ways, for example because they provide us with food or companionship. 
REALITY

The clinician is concerned first and foremost with PARTICULARS in reality (Level 1; in the vernacular also called 'tokens' or 'individuals'), that is to say with individual patients, their lesions, diseases, and bodily reactions, divided into CONTINUANTS and OCCURRENTS.13 Some particulars, such as human beings, planets, ships, hurricanes, receive PROPER NAMES (they may also receive unique identifiers, such as social security numbers) which are used in representational artifacts of various sorts. But we can refer to particulars also by means of complex expressions – that man on the bench, this oophorectomy, this blood sample – involving GENERAL TERMS of different sorts, including:

i. General terms such as 'apoptosis', 'fracture', 'cat', which represent structures or characteristics in reality which are exemplified – the very same structures or characteristics, over and over again – in an open-ended collection of particulars in arbitrarily disconnected regions of space and time. Consider for example the way in which a certain DNA structure is instantiated as a transcript (RNA-structure) over and over again in cells of our body.
ii. General terms such as 'danger', 'gift', 'surprise', which draw together entities in reality which share common characteristics which are not intrinsic to the entities in question.
iii. General terms such as 'Berliner', 'Paleolithic', which relate to specific collections of particulars tied to specific regions of space and time.

General terms of the first sort refer to UNIVERSALS (in the vernacular also called 'types' or 'kinds'). A universal is something that is shared in common by all those particulars which are its INSTANCES. The universal itself then exists in Level 1 reality as a result of existing in its particular instances. When a clinician says 'A and B have the same disease', she is referring to the universal; when she says 'A's diabetes is more advanced than B's', then she is referring to the respective instances.
It is overwhelmingly universals which are the entities represented in scientific texts, and a good prima facie indication that a general term 'A' refers to a universal is that 'A' is used by scientists for purposes of classification and to make different sorts of law-like assertions about the individual instances of A with which they work in the lab or clinic.

Both particulars and universals stand to each other in various RELATIONS. Thus particulars stand to the corresponding universals in the relation of INSTANTIATION. This and other binary relations (of parthood, adjacency, derivation) used in biomedical ontologies13 can be divided into groups as in Table 1, which uses Roman for particulars, bold type for relations involving particulars, and italics for universals and for relations between universals.

<universal, universal>      nose part_of body
<particular, particular>    Mary's nose part_of Mary
<particular, universal>     Mary's nose instance_of nose

Table 1 – Three Basic Sorts of Binary Relation

A COLLECTION OF PARTICULARS (of molecules in John's body, of pieces of equipment in a certain operating theater, of operations performed in this theater over a given period of months) is a Level 1 particular comprehending other particulars as its MEMBERS.14 We note that confusion is spawned by the fact that we can use the very same general terms to refer both to universals and to collections of particulars. Consider:

• HIV is an infectious retrovirus
• HIV is spreading very rapidly through Asia

A CLASS is a collection of all and only the particulars to which a given general term applies. Where the general term in question refers to a universal, then the corresponding class, called the EXTENSION of the universal (at a given time), comprehends all and only those particulars which as a matter of fact instantiate the corresponding universal (at that time).
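The three sorts of binary relation in Table 1, and the notion of an extension, can be illustrated with a minimal Python sketch. The names `Kind`, `Entity`, `Assertion` and `extension` are hypothetical and chosen only for illustration; they are not drawn from any existing ontology toolkit. Note also that the sketch can only collect the particulars that happen to be recorded as instances, whereas the extension in the sense defined above comprehends all particulars which as a matter of fact instantiate the universal, and the time index is omitted.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    UNIVERSAL = "universal"
    PARTICULAR = "particular"


@dataclass(frozen=True)
class Entity:
    # A Level 1 entity, tagged as universal or particular (hypothetical type).
    name: str
    kind: Kind


@dataclass(frozen=True)
class Assertion:
    # One binary relational assertion, e.g. "Mary's nose instance_of nose".
    subject: Entity
    relation: str
    target: Entity

    def signature(self):
        # The <kind, kind> pair that places the assertion in a row of Table 1.
        return (self.subject.kind, self.target.kind)


def extension(universal, assertions):
    # Collect the particulars *recorded* as instances of the universal.
    # Assumption: this approximates the extension only as far as the
    # recorded assertions go; no time index is modeled.
    return {
        a.subject
        for a in assertions
        if a.relation == "instance_of" and a.target == universal
    }


nose = Entity("nose", Kind.UNIVERSAL)
body = Entity("body", Kind.UNIVERSAL)
mary = Entity("Mary", Kind.PARTICULAR)
marys_nose = Entity("Mary's nose", Kind.PARTICULAR)

# The three example rows of Table 1.
ASSERTIONS = [
    Assertion(nose, "part_of", body),
    Assertion(marys_nose, "part_of", mary),
    Assertion(marys_nose, "instance_of", nose),
]
```

Keeping the universal/particular tag on each entity, rather than on the relation name, is one way of making visible that the same relation label (here `part_of`) is used at two different levels.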
The totality of classes is wider than the totality of extensions of universals since it includes also DEFINED CLASSES, designated by terms like 'employee of Swedish bank', 'daughter of Finnish spy'. Languages like OWL are ideally suited to the formal treatment of such classes, and the popularity of OWL has encouraged the view that it is classes which are designated by the general terms in terminologies. (OWL classes are not, however, identical with classes in the usual set-theoretic sense on which we draw also here.) Some OWL classes (above all Thing and Nothing) are 'primitive' (which means: not defined), and these classes are sometimes asserted to constitute an OWL counterpart of universals ('natural kinds') in the sense here defined.15 Because OWL identifies the relation of instantiation with that of membership, however, it in effect identifies universals with their extensions.

Through relations of greater and lesser generality both classes and universals are organized into trees, the former on the basis of the subclass relation, the latter on the basis of the is_a relation (whereby, again, in the OWL framework the two relations are identified). Because the instances of more specific universals are ipso facto also instances of the corresponding more general universals, the latter hierarchy is, when viewed extensionally, a proper part of the former.

As we shall discuss further in our treatment of the argument from borderline cases below, it is difficult to draw a sharp line between terms designating universals and those designating defined classes. This does not mean, however, that the distinction is of no import. Indeed we believe that taking account of this distinction is indispensable to creating a path to improvement of ontologies.16

We use the term PORTION OF REALITY to comprehend both single universals and particulars and their more or less complex combinations.
Some portions of reality – for example single organisms, planets – reflect autonomous joints of reality (that is, they would exist as separate entities even in a world denuded of cognitive subjects). Other portions of reality are products of fiat demarcations of one or other sort,17 as when we delineate a portion of reality by focusing on some specific granular level (of molecules, or molecular processes), or on some specific family of universals (for example when we view the human beings living in a given county in light of their patterns of alcohol consumption).

A DOMAIN is a portion of reality that forms the subject-matter of a single science or technology or mode of study; for example the domain of proteomics, of radiology, of viral infections in mouse. Representational artifacts will standardly represent entities in domains delineated by level of granularity. Thus entities smaller than a given threshold value may be excluded from a domain because they are not salient to the associated scientific or clinical purposes.18

REPRESENTATIONAL ARTIFACTS

In developing theories, biomedical researchers seek representations of the universals existing in their respective domain of reality. They first develop cognitive representations, which they then transform incrementally into representational artifacts of various sorts. In developing diagnoses, and in compiling such diagnoses into clinical records, clinicians seek a representation of salient particulars (diseases, disease processes, drug effects) on the side of their patients.
Drawing on their theoretical understanding of the universals which these particulars instantiate (which in turn draws on prior representations formed in relation to earlier particulars19), they first develop a cognitive representation of what is taking place within a given collection of particulars in reality, which they then transform into representational artifacts such as clinical documents, entries in databases, and so forth, which may then foster more refined cognitive representations in the future.

The mentioned representations are typically built up out of sub-representations each of which, in the best case, mirrors a corresponding salient portion of reality. The simplest representations ('blood!') mirror universals or particulars taken singly; more complex representations – such as therapeutic schemas, diagnostic protocols, scientific texts, pathway diagrams – mirror more complex portions of reality, their constituent sub-representations being joined together in ways designed to mirror salient relations on the side of reality.

In the ideal case a representation would be such that all portions of reality salient to the purposes for which it was constructed would have exactly one corresponding unit in the representation, and every unit in the representation would correspond to exactly one salient portion of reality.19 Unfortunately, in a domain like biomedicine, the ideal case will likely remain forever beyond our grasp. Researchers working on the level of universals may fall short by creating representations which either (i) fail to include general terms for universals which are salient to their domain, or (ii) include general terms which do not in fact denote any universals at all.
Similarly, clinicians working on the level of particulars may fall short of the best case by creating misdiagnoses, either (i) by failing to acknowledge particulars which do exist and which are salient to the health of a given patient, or (ii) by using representational units assumed to refer to particulars where no such particulars exist. A TAXONOMY is a tree-form graph-theoretic representational artifact with nodes representing universals or classes and edges representing is_a or subset relations. An ONTOLOGY is a representational artifact, comprising a taxonomy as proper part, whose representational units are intended to designate some combination of universals, defined classes, and certain relations between them.13 A REALISM-BASED ONTOLOGY is built out of terms which are intended to refer exclusively to universals, and corresponds to that part of the content of a scientific theory that is captured by its constituent general terms and their interrelations. A TERMINOLOGY is a representational artifact consisting of representational units which are the general terms of some natural language used to refer to entities in some specific domain. An INVENTORY is a representational artifact built out of singular referring terms such as proper names or alphanumeric identifiers. Electronic Health Records (EHRs) incorporate inventories in this sense, including both terms denoting particulars ('patient #347', 'lung #420') and more complex expressions involving terms designating universals and defined classes ('the history of cancer in patient #347's family').20 In the best case, again, each of the representational artifacts listed above (ontologies, taxonomies, inventories) will be such that its representational units stand in a one-to-one correspondence with the salient entities in its domain. 
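The best case of one-to-one correspondence, and the principal ways of falling short of it, can be made concrete with a small Python sketch. It assumes, purely for illustration, that we are handed a mapping from each representational unit to the portion of reality it designates (or `None` where it designates nothing) together with the set of salient portions of reality; in actual science no such oracle is available, and the function name `audit`, the report keys, and the sample data (including the synonym 'cellule') are hypothetical.

```python
def audit(designations, salient):
    # designations: dict mapping each representational unit to the portion of
    #   reality it designates, or None if the unit is non-designating.
    # salient: set of portions of reality salient to the artifact's purpose.
    designated = [target for target in designations.values() if target is not None]

    # Salient portions of reality with no corresponding unit (incompleteness).
    missing = set(salient) - set(designated)

    # Units that designate nothing at all (e.g. 'phlogiston').
    non_designating = {unit for unit, target in designations.items() if target is None}

    # Portions of reality designated by more than one unit (redundancy).
    redundant = {target for target in set(designated) if designated.count(target) > 1}

    return {
        "missing": missing,
        "non_designating": non_designating,
        "redundant": redundant,
        "one_to_one": not (missing or non_designating or redundant),
    }


# Hypothetical toy artifact: two units for the same universal, one empty term.
DESIGNATIONS = {"cell": "cell", "cellule": "cell", "phlogiston": None}
SALIENT = {"cell", "nucleus"}

report = audit(DESIGNATIONS, SALIENT)
```

Comparing such reports across successive versions of an artifact is one simple way to picture what measuring improvement might look like; the sketch does not attempt to capture structural fit.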
In practice, however, such artifacts can be classified on the basis of the various ways in which they fall short of this best case, in terms of properties such as correctness, degree of structural fit, degree of completeness and degree of redundancy.16,18 By exploiting such classifications we can measure the quality improvements made in successive versions, and also use such measures as a basis for further improvement.20

To make a representation interpretable by a computer, it must be published in a language with a formal semantics and so converted into a FORMALIZED REPRESENTATION. The choice of language will depend on the complexity of what one needs to express and on the sorts of reasoning one needs to perform. While OWL, for example, can cope well with defined classes, it may not have sufficient expressive power to meet the needs of ontologies in the life sciences domain. Thus it seems to be incapable, for example, of capturing the relations involved even in simple interactions among pluralities of continuants, or of capturing the changes which take place in such continuants (for example growth of a tumor) over time.21,22

Most inventories in the biomedical field (including most EHRs) have so far hardly exploited the powers of formal reasoning at all. The paradigm of Referent Tracking represents an exception to this rule,20 since it involves precisely the embedding of a highly structured representation of particulars in a formalized representation of the corresponding universals.

THE CONCEPT ORIENTATION

We believe that ontologies, inventories and similar artifacts should consist exclusively of representational units which are intended to designate entities in Level 1 reality.
Defenders of the concept orientation in medical terminology development have offered a series of arguments against this view, to the effect that such terminologies should include also (or exclusively) representational units referring to what are called 'concepts'.23

First is what we can call the argument from intellectual modesty, which asserts that it is up to domain experts, and not to terminology developers, to answer for the truth of whatever theories the terminology is intended to mirror. Since domain experts themselves disagree, a terminology should embrace no claims as to what the world is like, but reflect, rather, the coagulate formed out of the concepts used by different experts.

Against this, it can be pointed out that communities working on common domains in the medical as in other scientific fields in fact accept a massive and ever-growing body of consensus truths about the entities in these domains. Many of these truths are, admittedly, of a trivial sort (that mammals have hearts, that organisms are made of cells), but it is precisely such truths which form the core of science-based ontologies. Where conflicts do arise in the course of scientific development, these are highly localized, and pertain to specific mechanisms, for example of drug action or disease development, which can serve as the targets of conflicting beliefs only because researchers share a huge body of presuppositions.

We can think of no scenario under which it would make sense to postulate special entities called 'concepts' as the entities to which terms subject to scientific dispute would refer. For either, for any such term, the dispute is resolved in its favor, and then it is the corresponding Level 1 entity that has served as its referent all along; or it is established that the term in question is non-designating, and then this term is no longer a candidate for inclusion in a terminology.
We cannot solve the problem that we do not know, at some given stage of scientific inquiry, to which of these groups a given term belongs, by providing such terms instead with guaranteed referents called 'concepts'. It may, finally, be the case that it is not the disputed term itself which is at issue, but rather some more complex expression, as when we talk about 'G. E. Stahl's concept of phlogiston'. That this latter expression refers to some entity – a concept – in (psychological) reality is, however, precisely not subject to scientific dispute.

Sometimes the argument from intellectual modesty takes an extreme form, for example on the part of those for whom reality itself is seen as being somehow unknowable ('we can only ever know our own concepts'). Arguments along these lines are of course familiar from the history of philosophy. Stove provides the definitive refutation.24 Here we need note only that they run counter not just to the successes, but to the very existence, of science and technology as collaborative endeavors.

Second is the argument from creativity. Designer drugs are conceived, modeled, and described long before they are successfully synthesized, and the plans of pharmaceutical companies may contain putative references to the corresponding chemical universals long before there are instances in reality. But again: such descriptions and plans can be perfectly well apprehended even within terminologies and ontologies conceived as relating exclusively to what is real. Descriptions and plans do, after all, exist. On the other hand it would be an error to include in a scientific ontology of drugs terms referring to pharmaceutical products which do not yet (and may never) exist, solely on the basis of plans and descriptions. Rather, such terms should be included precisely at the point where the corresponding instances do indeed exist in reality, exactly in accordance with our proposals above.

Third is what we might call the argument from unicorns.
Some of the terms needed in medical terminologies refer, it is held, to what does not exist. Some patients do, after all, believe that they are James Bond, or that they see unicorns. The realist approach is however perfectly well able to comprehend also phenomena such as these, even though it is restricted to the representation of what is real. For the beliefs and hallucinatory episodes in question are of course as real as are the persons who suffer (or enjoy) them. And certainly such beliefs and episodes may involve concepts (in the properly psychological sense of this term). But they are not about concepts, they do not have concepts as their targets – for they are intended by their subjects to be about entities in flesh-and-blood external reality.

Fourth is the argument from medical history. The history of medicine is a scientific pursuit; yet it involves use of terms such as 'diabolic possession' which, according to the best current science, do not refer to universals in reality. But again: the history of medicine has as its subject-domain precisely the beliefs, both true and false, of former generations (together with the practices, institutions, etc. associated therewith). Thus a term like 'diabolic possession' should be included in the ontology of this discipline in the first place as a component part of terms designating corresponding classes of beliefs. In addition it may appear also as part of a term designating some fiat collection of those diseases from which the patients diagnosed as being possessed were in fact suffering. The evolution of our thinking about disease can then be understood in the same way that we deal with theory change in other parts of science, as a reordering of our beliefs about the ontological validity and salience of specific families of terms – and once again: concepts themselves play no role as referents.20,26

Fifth is the argument from syndromes.
The subject-matters of biology and medicine are, it is held, replete with entities which do not exist in reality but are rather convenient abstractions. A syndrome such as congestive heart failure, for example, is nothing more than a convenient abstraction, used by physicians to collect together many disparate and unrelated diseases which have common final manifestations. Such abstractions are, it is held, mere concepts. According to the considerations on fiat demarcations advanced above, however, syndromes, pathways, genetic networks and similar phenomena are indeed fully real – though their reality is that of defined (fiat) classes rather than of universals. A similar response can be given also in regard to the many human-dependent delineations used in expressions like 'obesity' or 'hypertension' or 'abnormal curvature of spine'. These terms, too, refer to entities in reality, namely to defined classes which rest on fiat thresholds established by consensus among physicians.

Sixth is the argument from error. When erroneous entries are entered into a clinical record and interpreted as being about Level 1 entities, then logical conflicts can arise. For Rector et al., this implies that the use of a meta-language should be made compulsory for all statements in the EHR, which should be, not about entities in reality, but rather about what are called 'findings'.25 Instead of p and not p, the record would contain entries like: McX observed p and O'W observed not p, so that logical contradiction is avoided. The terms in terminologies devised to serve such EHRs would then one and all refer not to diseases themselves, but rather to mere 'concepts' of diseases. This, however, blurs the distinction between entities in reality and associated findings, and opens the door to the inclusion in a terminology of problematic findings-related expressions such as SNOMED's 'absent nipple', 'absent leg', etc. Certainly clinicians need to record such findings.
But then their findings are precisely that a leg is absent; not that a special kind of ('absent') leg is present. In the domain of scientific research we do not embargo entirely the making of object-language assertions simply because there might be, among the totality of such assertions, some which are erroneous. Rather, we rely on the normal workings of science as a collective, empirical endeavor to weed out error over time, providing facilities to quarantine erroneous entries and resolve logical conflicts as they are identified. We have argued elsewhere that these same devices can be applied also in the medical context.26

The argument for the move to the meta-level is sometimes buttressed by appeal to medico-legal considerations seen as requiring that the EHR be a record not of what exists but of clinicians' beliefs and actions. Yet the forensic purposes of an audit trail can equally well be served by an object-language record if we ensure that meta-data are associated with each entry identifying by whom the pertinent data were entered, at what time, and so forth. On the other side, moreover, even the move to meta-level assertions would not in fact solve the problems of error, logical contradiction and legal liability. For the very same problems arise not only when human beings are describing, on the object-level, fractures, or pulse rates, or symptoms of coughing or swelling, but also on the meta-level when they are describing what clinicians have heard, seen, thought and done. The latter descriptions, too, are subject to error, fraud, and disagreement in interpretation.

Seventh is the argument from borderline cases. As we have already noted above, there is at any given stage no bright line between those general terms properly to be conceived as designating universals and those designating merely 'concepts' (or defined classes).
Certainly there are, at any given stage in the development of science, clear cases on either side: 'electron' or 'cell', on the one hand, and 'fall on stairs or ladders in water transport NOS, occupant of small unpowered boat injured' (Read Codes) on the other. But there are also borderline cases such as 'alcoholic non-smoker with diabetes', or 'age-dependent yeast cell size increase', which call into question the very basis of the distinction.

In response, we note first the general point that arguments from the existence of borderline cases have, in general, very little force. For otherwise they would allow us to prove from the existence of people with borderline complements of hair that there is no such thing as baldness or hairiness. As to the specific problem of how to classify borderline expressions, this is a problem not for terminology, but rather for empirical science. For borderline terms of the sorts mentioned will, as an inevitable concomitant of scientific advance, be in any case subjected to a filtering process based on whether they are needed for purposes of (for example therapeutically) fruitful classifications, and thus for the expression of scientific laws. Science itself is thereby subject to constant update. A term taken to refer to a universal by one generation of scientists may be demoted to the level of non-designating term ('phlogiston') by the next. This means also that representational artifacts of the sorts considered in the above, because they form an integral part of the practice of science, should themselves be subject to continual update in light of such advance. But again: we can think of no circumstance in which updating of the sort in question would signify that phlogiston is itself a concept, or that some expression was at one or other stage being used by scientists with the intention of referring to 'concepts' rather than to entities in reality.
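The point about continual update can be made concrete with a small sketch. What follows is an illustration only, not a method proposed in this paper: the names `Status`, `TermEntry`, `Terminology` and `reclassify` are hypothetical, and the reason string is a placeholder. It assumes a terminology in which each general term is annotated with what current science takes it to designate, and in which a change of belief is recorded as a new version.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    """What a general term is currently taken to designate (illustrative labels)."""
    UNIVERSAL = "universal"
    DEFINED_CLASS = "defined class"
    NON_DESIGNATING = "non-designating"


@dataclass
class TermEntry:
    term: str
    status: Status
    # Each item: (version, old status, new status, reason for the change)
    history: list = field(default_factory=list)


class Terminology:
    """A hypothetical versioned term list; not an existing library or standard."""

    def __init__(self):
        self.version = 1
        self.entries = {}

    def add(self, term, status):
        self.entries[term] = TermEntry(term, status)

    def reclassify(self, term, new_status, reason):
        """Record a change of belief about what the term designates."""
        entry = self.entries[term]
        self.version += 1
        entry.history.append((self.version, entry.status, new_status, reason))
        entry.status = new_status


terminology = Terminology()
terminology.add("electron", Status.UNIVERSAL)
terminology.add("hypertension", Status.DEFINED_CLASS)
terminology.add("phlogiston", Status.UNIVERSAL)

# A later generation of scientists demotes the term.
terminology.reclassify(
    "phlogiston",
    Status.NON_DESIGNATING,
    "placeholder reason: no corresponding universal according to later science",
)
```

The enumeration deliberately has no member for 'concept': on the view defended here, a demoted term ends up designating nothing, and at no stage does it designate a concept. The history list preserves the earlier belief, so the change remains visible to later users of the artifact.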
THE SEMIOTIC TRIANGLE

Finally, there is what we might call the argument from multiple perspectives. Different patients, clinicians and biologists have their own perspectives on one and the same reality. To do justice to these differences, it is argued, we must hold that their respective representations point, not to this common reality, but rather to their different 'concepts' thereof. This argument has its roots in the work of Ogden and Richards, and specifically in their discussion of the so-called 'semiotic triangle', which is of importance not least because it embodies a view of meaning and reference that still plays a fateful role in the terminology standardization work of ISO.26

As Figure 1 makes clear, the triangle in fact refers not to 'concepts', but rather to what its authors call 'thought or reference',27 reflecting the fact that Ogden and Richards' account is rooted in a theory of psychological causality. When we experience a certain object in association with a certain sign, then memory traces are laid down in our brains in virtue of which the mere appearance of the same sign in the future will, they hold, 'evoke' a 'thought or reference' directed towards this object through the reactivation of impressions stored in memory. The two solid edges of the triangle are intended to represent what are held to be causal relations of 'symbolization' (roughly: evocation), and 'reference' (roughly: perception or memory) on the part of a symbol-using subject. The dashed edge, in contrast, signifies that the relation between term and referent – the relation that is most important for the discussion of terminology – is merely 'imputed'.

The background assumption here is that multiple perspectives are both ubiquitous and (at best) only locally and transiently resolvable. The meanings words have for you or me depend on our past experiences of uses of these words in different kinds of contexts.
Ambiguity must be resolved anew (and a new 'imputed' relation of reference spawned) on each successive occasion of use. From this, Ogden and Richards infer that a symbolic representation can never refer directly to an object, but rather only indirectly, via a 'thought or reference' within the mind.

It is a depsychologized version of this latter thesis which forms the basis of the concept orientation in contemporary terminology research. The terms in terminologies refer not to entities in reality, it is held, but rather to 'concepts' in a special 'realm'. The latter are not transparent mediators of reference; rather they are its targets, and the job of the terminologist is to calibrate his list of terms in relation not to reality but to this special 'realm of concepts'.26 The relation between terms in a terminology and the reality beyond is thereby obscured. Reality exists, if at all, only behind a conceptual veil – and hence familiar confusions according to which for example the concept of bacteria would cause an experimental model of disease, or the concept of vitamin would be 'essential in the diet of man'.28

'CONCEPTS' AND 'MODELS'

How, then, should 'concept' be properly treated in the terminology literature henceforth? There are of course sensible uses of this term, for example in the literature of psychology. In the terminology literature, however, 'concept' has been used in such a bewildering variety of confused and confusing ways that we recommend that it be avoided altogether.

It is tempting to suppose that, when considered extensionally, all of the mentioned alternative readings come down to one and the same thing, namely to an identification of 'concept' with what we have earlier called 'defined class'. If 'concept' could be used systematically in this way in terminological circles, then this would, indeed, constitute progress of sorts, though the question would then arise why 'defined class' itself should not be used instead.
Unfortunately, however, the proposal in question stands in conflict with the fact that 'concept' is used by its adherents to comprehend also putative referents even for terms – such as 'surgical procedure not carried out because of patient's decision' – which do not designate defined classes because they designate nothing at all. Here again, we believe, a proper treatment would involve appeal to appropriate fiat classes, defined in terms of utterances, interrupted plans, expectations, etc. on the part of the subjects involved.

What, now, is to be said of terms such as 'concept model', 'knowledge representation', 'information model', and so forth referred to in our preamble above? To the extent that concept-based terminological artifacts consist in representations not of the reality on the side of the patient but rather of the entities in some putative 'realm of concepts', the term 'concept model' may be justified. This term is indeed used by SNOMED CT in its own self-descriptions, though given SNOMED's scientific goals, we believe that, on the basis of the arguments given above, it should be abandoned.

Still more problematic is the term 'knowledge model' or 'knowledge representation' (GALEN). For in the absence of a reference to reality to serve as benchmark, what could motivate a distinction between knowledge and mere belief?19 And what, in the absence of a reference to reality, could motivate adding or deleting terms in successive versions of a terminology, if every term is in any case guaranteed a reference to its own specially tailored 'concept'?

As to 'information model', here one standard uncertainty concerns the relation between an entity in reality and the body of information used to 'represent' this entity in some information system. Is it information which is being 'modeled' in an information model, or the reality which this information is about?
The documentation of the HL7 Reference Information Model (RIM)29 adds extra layers of uncertainty by conceiving its principal formulas as referring to the acts in which entities are observed, for example in a clinical context. Simultaneously, however, it conceives these formulas as referring also to the documentation of such acts, for example in an information system. The apparent contradiction is to some degree resolved by the RIM on the basis of its assertion that there is in any case 'no distinction between an activity and its documentation'.30

Figure 1 – Ogden and Richards' Semiotic Triangle

CONCLUSION

Drawing on our distinction of the three levels of reality, cognition and representational artifact we have sought to formulate an unambiguous terminology for describing ontologies and related artifacts. The proposed terminology allows us to characterize more precisely the sorts of things which go wrong when the distinction between these levels is ignored, or when one or other level is denied, so that the approach may also help in improving such artifacts in the future.

Acknowledgements

This work was supported by the Wolfgang Paul Program of the Humboldt Foundation, the Volkswagen Foundation, the European Union Semantic Mining Network, by BBSRC Grant BB/D524283/1, and by the NIH Roadmap Grant U54 HG004028. Thanks are due also to Jim Cimino, Chris Chute, Gunnar Klein, Alan Rector, Stefan Schulz, and Kent Spackman for fruitful discussions.

References (URLs last accessed July 1, 2006)

1. Smith B. Beyond concepts, or: Ontology as reality representation, Formal Ontology in Information Systems (FOIS 2004), p. 73-84.
2. http://www.w3.org/2003/glossary.
3. Patel-Schneider PF, Hayes P, Horrocks I. OWL Web Ontology Language. 2004. http://www.w3.org/TR/owl-semantics.
4. Spackman KA, Reynoso G. Examining SNOMED from the perspective of formal ontological principles. Workshop on Formal Biomedical Knowledge Representation (KRMED 2004), p. 72-80.
5. Johansson I.
Bioinformatics and biological reality. J Biomed Inform. 2006;39(3):274-87.
6. Klein GO, Smith B. Concept systems and ontologies. http://ontology.buffalo.edu/concepts/ConceptsandOntologies.pdf.
7. http://ncbo.us/.
8. http://obo.sourceforge.net/.
9. http://obofoundry.org/.
10. Rosse C, Mejino JL, Jr. A reference ontology for biomedical informatics: the Foundational Model of Anatomy. J Biomed Inform 2003;36:478-500.
11. http://geneontology.org/.
12. Wittgenstein L. 1921 Tractatus Logico-Philosophicus, London: Routledge, 1961.
13. Smith B, Ceusters W, Klagges B et al. Relations in biomedical ontologies. Genome Biol, 2005;6(5):R46.
14. Bittner T, Donnelly M, Smith B. Individuals, universals, collections. Formal Ontology in Information Systems (FOIS 2004), p. 37-48.
15. Drummond N. Introduction to ontologies. http://www.cs.man.ac.uk/~drummond/presentations/IntroductionToOWL50mins.ppt.
16. Ceusters W, Smith B. A realism-based approach to the versioning and evolution of biomedical ontologies. Proc AMIA Symp 2006, in press.
17. Smith B. Fiat objects. Topoi, 2001;20(2):131-48.
18. Bittner T, Smith B. A theory of granular partitions. Foundations of Geographic Information Science, London, 2003, p. 117-51.
19. Smith B. From concepts to clinical reality, J Biomed Inform. 2006 Jun;39(3):288-98.
20. Ceusters W, Smith B. Strategies for referent tracking in Electronic Health Records. J Biomed Inform. 2006 Jun;39(3):362-78.
21. Bera P, Wand Y. Analyzing OWL using a philosophy-based ontology. Formal Ontology in Information Systems (FOIS 2004), p. 353-62.
22. Kazic T. Putting semantics into the semantic web: How well can it capture biology? Pac Symp Biocomputing 2006;11:140-51.
23. Cimino JJ. In defense of the desiderata. J Biomed Inform. 2006;39:299-306.
24. Franklin J. Stove's discovery of the worst argument in the world. Philosophy 2002;77:615-24. www.maths.unsw.edu.au/~jim/worst.pdf.
25. Rector A, Nolan W, Kay S. Foundations for an electronic medical record.
Methods Inf Med, 1991;30:179-86.
26. Smith B, Ceusters W, Temmerman R. Wüsteria, Stud Health Technol Inform. 2005;116:647-652.
27. Ogden CK, Richards IA. The Meaning of Meaning. 3rd ed. New York, 1930.
28. The UMLS Semantic Network. http://semanticnetwork.nlm.nih.gov/.
29. HL7 V3 Reference Information Model: Version V 01-20. Normative Ballot 11/22/2005.
30. Smith B, Ceusters W. HL7 RIM: An incoherent standard, Proc MIE, 2006, p. 133-138.