Chapter 4: New Desiderata for Biomedical Terminologies
Barry Smith

Part I: Introducing Concepts

I.1. Introduction

The terminologies used in biomedical research, clinical practice, and health information management today grew out of the medical dictionaries of an earlier era. Such dictionaries, of course, were created to be used by human beings, and the early steps towards standardization of terminologies in the 1930s were designed, above all, to enable clear understanding of terms in different languages; for example, on the part of those engaged in gathering data on an international scale. With the increasing importance of computers, however, came the recognition that standardization of terminology must go beyond the needs of humans, and it is especially in the biomedical domain – with terminologies such as SNOMED (see SNOMED 2007) and controlled vocabularies such as the Gene Ontology (see Gene Ontology 2007) – that the power of formal representation of terminological knowledge has been explored most systematically. The need for such formal, computer-processable representations becomes all the more urgent with the enormous increase in the amounts and varieties of data with which biomedical researchers are confronted, data which can no longer be surveyed without the aid of powerful informatics tools.

I.2. The Concept Orientation

Unfortunately, the new formalized biomedical terminologies were developed against the background of what are now coming to be recognized as a series of major and minor philosophical errors. Very roughly, the developers of terminologies made the assumption that we cannot have knowledge of the real world, but only of our thoughts. Therefore, they inferred, it is thoughts to which our terms (and our terminologies) necessarily refer – thoughts which, as we shall see, were understood as being crystallized in the form of what were called concepts.
What the term 'concept' might precisely mean, however, was never clearly expressed, and it takes considerable pains to extract a coherent reading of this term from the standard terminological literature. In fact, four loose families of readings can be distinguished, which we can refer to as the linguistic, the psychological, the epistemological, and the ontological. On the linguistic view, concepts are general terms whose meanings have been somehow regimented (or, on some variants of the view, they are these meanings themselves). On the psychological view, concepts are mental entities analogous to ideas or beliefs. On the epistemological view, concepts are units of knowledge, such as your child's concept of a cat or of a square. And on the ontological view, concepts are abstractions of kinds or of properties (i.e., of general invariant patterns) belonging to entities in the world. As we will see in what follows, elements of all these views can be found, in various combinations, in the literature (Smith, 2004). The most influential biomedical terminologies, including almost all of the terminologies collected together in the Metathesaurus of the Unified Medical Language System (see National Library of Medicine), have been developed in the spirit of the concept orientation (Smith, 2005a). These terminologies have proved to be of great practical importance in the development of biomedical informatics. However, the ambiguities surrounding their use of the term 'concept' engender problems which have been neglected in the informatics literature. As will become clear in what follows, the concept orientation exacerbates many of the problems which it was intended to solve, and introduces new problems of its own.

I.2.1. The Birth of the Concept Orientation (I): Eugen Wüster and the International Organization for Standardization

The concept orientation in terminology work goes back at least as far as the 1930s, when Eugen Wüster began to develop a theory of terms and concepts which later became entrenched as the terminology standard promulgated by the International Organization for Standardization (ISO) (ISO, N.D.; Smith, 2005b). Through the powerful influence of the ISO, the effects of Wüster's standard continue to be felt today wherever standardized terminologies are needed, not least in biomedicine and biomedical informatics. However, Wüster's standard was developed for terminologies used by humans; it does not meet the requirements placed on standardized terminologies in the era of the computer. In spite of this, the quasi-legal, precedent-based policies of ISO – under which newer standards are required to conform as far as possible to those already established – have prevented adequate adaptation of its standards. Even the most recent ISO standards in the terminology domain betray a sloppiness and lack of clarity in their formulations which falls far short of meeting contemporary requirements. Human language is in constant flux. Focusing terminology development on the study of concepts was, for Wüster, a way of shouting 'Stop!' in an attempt to sidestep the tide of variation in human language use, which he saw as an impediment to communication across languages; for example (and uppermost in Wüster's own mind) in the context of international trade. (Wüster himself was a businessman and a manufacturer of woodworking machinery.)
Since the actual human thoughts associated with language use are an unreliable foundation upon which to base any system for standardizing the use of words, Wüster's solution was, in effect, to invent a new realm – the realm of concepts – in which the normal ebb and flow of human thought associated with the hitherto predominant term orientation would somehow be neutralized. Consider, for example, the way in which a term like 'cell' is used in different contexts to mean a unit of life, a small enclosed space, a small militant group, a unit in a grid or pigeonhole system, and so forth. From Wüster's point of view, there was a different concept associated with each of these contexts. Concepts are somehow crystallized out of the amorphous variety of usages among the different groups of human beings involved. At the same time, Wüster defended a psychological view of these concepts (which means that he saw concepts as mental entities), sometimes writing as if, in order to apprehend concepts, we would need to gain access to the interiors of each other's brains (Wüster, 2003):

If a speaker wishes to draw the attention of an interlocutor to a particular individual object, which is visible to both parties or which he carries with him, he only has to point to it, or, respectively, show it. If the object, however, is in another place, it is normally impossible to produce it for the purpose of showing it. In this case the only thing available is the individual concept of the object, provided that it is readily accessible in the heads of both persons.

Thus, for Wüster a concept is an element of thought, existing entirely in the minds of human subjects. On this view, an individual concept (such as blood) is a mental surrogate of an individual object (such as the blood running through your veins); a general concept (such as rabbit or fruit) is a mental surrogate of a plurality of objects (Smith, 2005b).
Individual concepts stand for objects which human beings are able to apprehend through perceptual experience. General concepts stand for similarities between these objects. Both individual and general concepts are human creations, and the hierarchy of general concepts (from, say, Granny Smith to apple to fruit) arises as the cumulative reflection of the choices made by humans in grouping objects together. Since these choices will vary from one community to another, standardization is needed in order to determine a common set of general concepts to which terminologies would be related; for instance, in order to remove obstacles to international trade. The perceived similarities which serve as starting points for such groupings are reified by Wüster under the heading of what he calls 'characteristics', a term which, like the term 'concept', has been embraced by the terminology community (and has thereby also fallen prey to a variety of conflicting views). In some passages, Wüster himself seems happy to identify characteristics with properties on the side of the objects themselves. In others, however, he identifies them as further concepts, so that they too would (incoherently) exist in the heads of human beings (Smith, 2005b). Thus, Wüster's thought results in an uncomfortable straddling of the realms of mind (ideas and meanings) and world (objects and their properties). This fissure appears in Wüster's treatment of the extension of a concept as well, which he sometimes conceives in the standard way as the 'totality of all individual objects which fall under a given concept' (Smith, 2006; Wüster, 1979). Unfortunately, Wüster also allows a second reading of 'extension' as meaning 'the totality of all subordinated concepts'.
So, on the one hand, the extension of the concept pneumonia would be the totality of cases or instances of pneumonia; but, on the other hand, it would be a collection of more specific concepts (bacterial pneumonia, viral pneumonia, mycoplasma pneumonia, interstitial pneumonia, horse pneumonia, and so on). Another characteristic unclarity of Wüster's thinking is reflected in his definition of 'object' as 'anything to which human thought is or can be directed'. This definition has been given normative standing through its adoption in the relevant ISO standards, which similarly define 'object' as 'anything perceived or conceived' (ISO, 'Text for FDIS 704. Terminology work: Principles and methods'). This ISO definition implies that 'object' can embrace, in Wüsterian spirit, not only the material but also the immaterial, not only the real but also the 'purely imagined, for example, a unicorn, a philosopher's stone, or a literary character' (ISO, Information Technology for Learning, Education, and Training; ISO, Vocabulary of Terminology). Given this characterization of 'object', we believe, ISO undercuts any view of the relation between concepts and corresponding objects in reality that might be compatible with the needs of empirical science (including the needs of contemporary evidence-based medicine). For its definition of 'object' would imply that the extension of the concept pneumonia should be allowed to include not only your pneumonia and my pneumonia, but also, for example, cases of unicorn pneumonia or of pneumonia in Russian fiction. Of course, there is nothing wrong with employing the term 'object' to mean, roughly, 'anything to which human thought can be directed'. The problem is that ISO provides no other term which could be used to distinguish terms intended to refer to real things from terms which merely refer to objects in this very loose sense.
Matters are made even worse by ISO's edict that:

[i]n the course of producing a terminology, philosophical discussions on whether an object actually exists in reality... are to be avoided. Objects are assumed to exist and attention is to be focused on how one deals with objects for the purposes of communication. (ISO, 'Text for FDIS 704')

It is precisely such philosophical discussions which are required if we are to undo the damaging effects of Wüster's influence. More recent ISO documents reveal efforts to increase clarity by embracing elements of a more properly ontological reading of the term 'concept', the view that concepts are abstractions of kinds which exist in the world. Unfortunately, however, in keeping with ISO's quasi-legal view of standards as enjoying some of the attributes of stare decisis, this is done in such a way that remnants of the older views are still allowed to remain. Thus, in ISO 1087-1:2000, 'concept' is defined variously as a 'unit of thought constituted through abstraction on the basis of properties common to a set of objects', or as a 'unit of knowledge created by a unique combination of characteristics', where 'characteristic' is defined as an 'abstraction of a property of an object or of a set of objects'. Since 'object' is still defined as 'anything perceivable or conceivable' (a unicorn still being listed by ISO as a specific example of the latter), the clarificatory effects of this move are, once again, rendered nugatory by the surrounding accumulation of inconsistencies. As Temmerman argues, Wüster's version of the concept orientation stands in conflict with many of the insights gained through research in cognitive science in recent years (Temmerman, 2000). His account of concept learning and his insistence on the arbitrariness of concept formation rest on ideas that have long since been called into question by cognitive scientists.
Even very small children manifest, in surprisingly uniform ways, an ability to apprehend objects in their surroundings as instances of natural kinds in ways which go far beyond what they apprehend in perceptual experience. Thus, there is now much evidence (documented, for example, in Gelman, 1991) that our ability to cognize objects and processes in a domain like biology rests on a shared innate capacity to apprehend our surrounding world in terms of (invisible) underlying structures or powers (whose workings we may subsequently learn to comprehend; for example, through inquiries in genetics).

I.2.2. The Birth of the Concept Orientation (II): James Cimino's Desiderata

By the time of James Cimino's important paper (Cimino, 1998), biomedical terminologies faced two major problems. The first concerned the legacy of the influential concept orientation as conceived by Wüster, which we will explore in greater depth in what follows. The upshot of this legacy was an endemic lack of precision, not only with regard to what concepts might be, but also with regard to their role in terminology work. The second problem arose from the introduction of computers into the terminological domain. Computer-based applications rely on precision, in both syntax and semantics, in a way that human cognition does not. In an attempt to address these problems, Cimino introduced a set of desiderata which must be satisfied by medical terminologies if they are to support modern computer applications. In what follows, we shall argue that many of Cimino's desiderata ought to be accepted by those involved in terminology work, but only when they have been subjected to radical reinterpretation. Cimino's principal thesis is that those involved in terminology work should focus their attention not on terms or words or their meanings, but rather on concepts. Unlike Wüster, Cimino comes close to embracing a linguistic rather than a psychological view of concepts.
A concept, he says, is 'an embodiment of a particular meaning' (Cimino, 1998, p. 395), which means that it is something like a term that has been extricated from the flow of language so as not to change when the language does. One of his desiderata for a well-constructed medical terminology is accordingly that of concept permanence: the meaning of a concept, once created, is inviolate. Three further desiderata are: (1) non-vagueness: concepts, which form the nodes of the terminology, must correspond to at least one meaning; (2) non-ambiguity: concepts must correspond to no more than one meaning; and (3) non-redundancy: meanings must themselves correspond to no more than one concept. If these requirements are met, the preferred terms of a well-constructed terminology will be mapped in one-to-one fashion to corresponding meanings. (A preferred term is that term, out of a set of synonyms, which the terminology chooses to link directly to a definition.) On Cimino's view, a concept corresponds to a plurality of words and expressions that are synonymous with one another. However, Cimino recognizes that synonymy is not an equivalence relation dividing up the domain of terms neatly into disjoint sets of synonyms. Often, words which are synonyms relative to some types of context are not synonyms relative to others (e.g., a bat in a cave is not the same as a bat in a baseball game). To resolve this problem, he invokes the further desideratum of context representation, which requires a terminology to specify, formally and explicitly, the way in which a concept is used within different types of contexts. (We will leave open the question of whether, if concepts can be used differently in different contexts, this violates the non-ambiguity desideratum.) If, however, we are right in our view that concepts, for Cimino, are themselves (or correspond in one-to-one fashion to) sets of synonyms, then concepts should thereby be relativized to contexts already.
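Read formally, non-vagueness and non-ambiguity together require each concept to carry exactly one meaning, and non-redundancy requires that no meaning be shared by two concepts. Jointly, they yield a one-to-one correspondence between concepts and meanings. The Python sketch below is a hypothetical illustration, not Cimino's own formalism. It assumes that a terminology can be represented as a dictionary from (invented) concept identifiers to the set of meanings each concept is taken to carry, and it lists the violations of each of the three desiderata. Concept permanence and context representation are not modeled.

```python
from collections import defaultdict


def check_desiderata(concept_meanings: dict[str, set[str]]) -> dict[str, list[str]]:
    """Report violations of non-vagueness, non-ambiguity, and non-redundancy.

    `concept_meanings` maps a hypothetical concept identifier to the set of
    meanings that concept is taken to correspond to.
    """
    violations: dict[str, list[str]] = {
        "non-vagueness": [],   # concepts corresponding to no meaning
        "non-ambiguity": [],   # concepts corresponding to more than one meaning
        "non-redundancy": [],  # meanings corresponding to more than one concept
    }
    concepts_by_meaning: dict[str, list[str]] = defaultdict(list)

    for concept, meanings in concept_meanings.items():
        if not meanings:
            violations["non-vagueness"].append(concept)
        elif len(meanings) > 1:
            violations["non-ambiguity"].append(concept)
        for meaning in meanings:
            concepts_by_meaning[meaning].append(concept)

    for meaning, concepts in concepts_by_meaning.items():
        if len(concepts) > 1:
            violations["non-redundancy"].append(meaning)

    return violations


if __name__ == "__main__":
    terminology = {
        "C1": {"pneumonia"},
        "C2": {"pneumonia"},                               # second concept, same meaning
        "C3": set(),                                       # no meaning at all
        "C4": {"bat (animal)", "bat (sports equipment)"},  # ambiguous 'bat'
    }
    print(check_desiderata(terminology))
    # {'non-vagueness': ['C3'], 'non-ambiguity': ['C4'], 'non-redundancy': ['pneumonia']}
```

The three checks are kept separate so that the report shows which desideratum a given node violates. An empty list for all three corresponds to the one-to-one mapping described in the text.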
In formulating the desideratum of context representation, therefore, Cimino ought more properly to speak not of concepts, but rather of terms themselves, as these are used in different types of contexts. If this is so, however, then his strategy for realizing the concept orientation requires that he take seriously the term orientation which predominated in early phases of terminology work. Those phases were dominated by the concern with (printed) dictionaries, a concern which (if we understand Cimino's views correctly) the concept orientation was designed to do away with. Concepts understood as sets of synonyms, presumably, ought to be seen as standing in different kinds of meaning-relations: is narrower in meaning than, is wider in meaning than, and so forth. Cimino, however, follows the usage now common in much work on biomedical terminologies in speaking of concepts as being linked together also by ontological relations, such as caused by, site of, or treated with (Cimino, 1998). As he would surely be the first to accept, sets of synonymous terms do not stand to each other in causal, locational, or therapeutic relations. In fact, by allowing the latter, Cimino seems to be embracing elements of an ontological view of concepts, according to which concepts would be abstractions from entities in reality.

I.2.3. The Ontological View and the Realist Orientation

On the ontological view, concepts are seen as abstractions of kinds or properties in the real world. This view has advantages over the linguistic and psychological views of concepts when it comes to understanding many of the ways in which the terms in medical terminologies are, in fact, used by clinicians in making diagnoses. Clinicians refer to objects, such as blood clots and kidneys; to the properties which these objects have; and to the kinds which they instantiate.
Cimino himself occasionally tends toward the ontological view as, for example, when he refers to the concept diabetes mellitus becoming 'associated with a diabetic patient' (p. 399). Presumably, this association does not come about because the physician has the patient on his left and the concept on his right, and decides that the two are fitted together to stand in some unspecified association relation. Rather, there is something about the patient, something in reality, which the clinician apprehends and which makes it true that this concept can be applied to this case. Fatefully, however, like other proponents of the concept orientation, Cimino does not address the ontological question of what it is on the side of the patient which would warrant the assertion that an association of the given sort obtains. In other words, he does not address the issue of what it is in the world to which concepts such as diabetes, type II diabetes, or endothelial dysfunction would correspond. The ontological view provides us with a means to understand how the corresponding terms can be associated directly with corresponding entities in the biomedical domain. It thereby raises the question of why we should fabricate concepts to stand in as proxies for those entities at all. Why should terms in terminologies refer indirectly to the world, when doctors and biologists are able to talk about the world directly? Of course, the original motivation for fabricating the conceptual realm on the part of those such as Wüster was the belief that it was impossible to refer to the world directly. But this belief was based on a philosophical presupposition (still accepted today by an influential constituency among philosophers) to the effect that we have direct cognitive access only to our thoughts, not to entities in external reality.
By contrast, scientists have never stopped referring to entities in the world directly and, on this basis, have succeeded in constructing theories with remarkable explanatory and predictive power, which have undergirded remarkable technological and therapeutic advances. This is one major motivation for our promotion of the realist orientation. We advance it as a substitute for the concept orientation, not only because it eliminates the unclarities associated with the latter, but also because of its greater affinity with the methods of empirical science. On the realist orientation, when scientists make successful claims about the types of entities that exist in reality, they are referring to objectively existing entities which realist philosophers call universals or natural kinds. A universal can be multiply instantiated by, and is known through, the particular objects, processes, and so forth, which instantiate it. For example, the universal heart is instantiated by your heart and by the heart of every other vertebrate. Universals reflect the similarities, at different levels of generality, between the different entities in the reality which surrounds us: every heart is characterized by certain qualities exemplified by the universal heart, every heartbeat is characterized by certain qualities exemplified by the universal heartbeat, and so on. There is another motivation which we take as supporting a realist orientation. The concept orientation assumes that every term used in a terminology corresponds to some concept, and that such correspondence is guaranteed; it applies as much to concepts such as unicorn or pneumonia in Russian fiction as to concepts such as heartbeat or glucose. However, many terms in medical terminologies are not associated with any universal.
There are no universals corresponding, for example, to terms from ICD-9-CM such as 'probable suicide', 'possible tubo-ovarian abscess', 'gallbladder calculus without mention of cholecystitis', or 'atypical squamous cells of uncertain significance, probably benign'. Such terms do not represent entities in reality as they exist independently of our testing, measuring, and inquiring activities. Rather, as Bodenreider et al. (2004) point out, they have the status of disguised sentences representing our ways of gaining knowledge of such entities. This distinction, invisible on the concept orientation, is brought to light by realism. And it is a distinction which will become increasingly important as automatic systems are called upon to process data in the clinical domain. It is the existence of universals which allows us to describe multiple particulars using one and the same general term and, thus, makes science possible. Science is concerned precisely with what is general in reality; it is interested not in this or that macrophage, but in macrophages in general. It is the existence of such universals which makes diagnosis and treatment possible, by enabling uniform diagnostic and treatment methods (and associated clinical guidelines) to be applied to pluralities of patients encountered at different times and places. In what follows, we will show the advantages that a realist orientation has over the concept orientation in the creation and maintenance of terminologies, as well as in other areas of knowledge representation.

I.3. Concepts are Insufficient for All Areas of Knowledge Representation

I.3.1. Some Arguments for the Concept Orientation and Realist Responses

One argument in favor of conceptualism in knowledge representation is what we can call the argument from intellectual modesty, which asserts that it is not up to terminology developers to ascertain the truth of whatever theories the terminology is intended to mirror. This is the job of domain experts.
Since domain experts themselves often disagree, a terminology should make no claims as to what the world is like; instead, it should reflect a conglomeration formed out of the concepts used by different experts. In fact, however, scientists in medical fields (and other fields) accept a large and increasing body of consensus truths about the entities in these domains. Admittedly, many of these truths are of a trivial sort (that mammals have hearts, that organisms are made of cells), but it is precisely such truths which form the core of science-based ontologies. When there are conflicts between one theory or research community and another, these tend to be highly localized, pertaining to specific mechanisms; for example, of drug action or disease development. Furthermore, such areas of research can serve as loci of conflicting beliefs only because the researchers involved share a huge body of common presuppositions. We can think of no scenario under which it would make sense to postulate special entities called concepts as the entities to which terms subject to scientific dispute would refer. For any such term, either the dispute is resolved in its favor, in which case it is the corresponding entity in reality that has served as its referent all along; or it is established that the term in question does not designate anything at all, and the term will then, in the course of time, be dropped from the terminology altogether. The problem that arises from the fact that we do not know, at a given stage of scientific inquiry, whether or not a given term has a referent in reality cannot be solved by providing such terms with guaranteed referents called concepts. Sometimes the argument from intellectual modesty takes an extreme form, as in the case of those who consider reality itself to be somehow unknowable (as in, 'we can only ever know our own concepts').
Arguments along these lines are, of course, familiar not only from the Wüsterian tradition, but also from the history of Western philosophy. Stove provides the definitive refutation (Franklin, 2002). Here we need note only that such arguments run counter not just to the successes, but to the very existence, of science and technology as collaborative endeavors. The second argument in favor of the concept orientation is what we might call the argument from creativity. Designer drugs, for example, are conceived, modeled, and described long before they are successfully synthesized, and the plans of pharmaceutical companies may contain putative references to the corresponding chemical universals long before there are instances in reality. But again, such descriptions and plans can be expressed perfectly well within terminologies and ontologies conceived as representing only what is real. For descriptions and plans do, after all, exist. On the other hand, it would be an error to include in a scientific ontology of drugs terms referring to pharmaceutical products which do not yet (and may never) exist, solely on the basis of plans and descriptions. Rather, such terms should be included only at the point where the corresponding instances do, indeed, exist in reality. Third is what we might call the argument from unicorns. According to this argument, some of the terms needed in medical terminologies refer to what does not exist. After all, some patients do believe that they have three arms, or that they are being pursued by aliens. But the realist conception is also equipped to handle phenomena such as these. False beliefs and hallucinations are, of course, every bit as real as the patients who experience them. And certainly such beliefs and episodes may involve concepts (in the proper, psychological sense of this term).
But they are not about concepts, and they do not have concepts as their objects; for their subjects take them to be about entities in external reality instead. Believing in the concept of aliens in pursuit is not nearly as frightening as believing that there are actual aliens in pursuit. These patients are making an error, and the proper explanation of this error in our patient records does not consist in asserting that the patients in question in fact believed merely in the concept of aliens all along. Such an explanation cannot account for the anxious behavior associated with believing in aliens. Fourth is the argument from medical history. The history of medicine is a scientific pursuit; yet it has often used terms such as 'phlogiston' which do not refer to universals in reality. But the domain of the history of medicine is constituted precisely by the beliefs, both true and false, of former generations. Thus, it is to be expected that a term like 'phlogiston' should be included in the ontology of this discipline; not, however, as a freestanding term with a concept as its referent. Rather, 'phlogiston' should occur as a constituent part of terms denoting the corresponding kinds of beliefs (Smith, 2005b). Fifth is the argument from syndromes. The biological and medical domains, it is said, contain multitudes of entities which do not exist in reality, but which nonetheless serve as convenient abstractions. For example, a syndrome such as congestive heart failure is an abstraction used for the convenience of physicians for the purpose of collecting under one umbrella term certain disparate and unrelated diseases which have common manifestations or symptoms. Such abstractions are, it is held, mere concepts. From a realist perspective, however, syndromes, pathways, genetic networks, and similar phenomena are fully real, though their reality is that of defined (fiat) classes rather than of universals.
That is, they are real in the sense that they belong to real classes which have been defined by human beings for the very purpose of talking about things which we do not yet fully understand. We may say something similar about the many human-dependent expressions like 'obesity', 'hypertension', or 'abnormal curvature of spine'. These terms, too, refer to entities in reality, namely to defined classes which rest on what may be changing fiat thresholds established by consensus among physicians. Sixth is the argument from error. Logical conflicts can arise when falsehoods are entered into a clinical record and interpreted as being about real entities. Rector et al. take this to imply that the use of a meta-language should be made compulsory for all statements in the electronic health record (EHR). The terms in terminologies devised to link up with such EHRs would refer not to diseases themselves, but merely to the concepts of diseases on the part of clinicians. Thus, what is recorded should not be seen as pertaining to real entities at all, but rather to what are called findings (Rector, 1991). Instead of recording both p and not p, the record would contain entries like: McX observed p while O'W observed not p. Since these entries are about observations, logical contradictions are avoided. We do not, of course, dispute the fact that clinicians have a perfectly legitimate need to record findings such as an absent finger or an absent nipple. What is disputed, however, is Rector's inference from the fact that there might be falsehoods among the totality of assertions about a given clinical case (or scientific domain) to the conclusion that clinicians (or scientists) should cease to make assertions about the world and, rather, confine themselves to assertions about beliefs. This proposal contributes to a blurring of the distinction between entities in reality and associated findings.
Information about beliefs is fundamentally different in nature from information about objects. Failing to make this explicit allows terminologies to include findings-related expressions in the same category as expressions which designate entities in reality as, for example, in the following assertions from SNOMED CT: 'Genus Mycoplasma (organism) is_a Prokaryote-cell wall absent (organism) is_a bacteria (organism)' and 'Human leukocyte antigen (HLA) antigen absent (substance) is_a Human leukocyte antigen (HLA) antigen (substance)'. This running together of two fundamentally different types of assertions introduces obstacles to the working of automatic reasoning systems that take such terminologies as their basis. Of course, we do not deny that clinicians need to record not only the entities on the side of the patient, but also their own beliefs and observations about these entities. Indeed, Rector's argument for conceiving the record as a record of facts about beliefs rather than of facts about the world is importantly buttressed by appeal to legal considerations. These considerations require that the EHR provide an audit trail relating precisely to the beliefs and actions of medical practitioners. The EHR must serve forensic purposes. From the realist point of view, however, these forensic purposes can be served equally well by a record of facts about the world, as long as we ensure two things: (a) that such facts include facts about the beliefs and actions of practitioners (conceived as full-fledged denizens of reality), and (b) that the record also preserves data about who recorded those facts, at what time they were recorded, and so forth, according to the strategy we outlined in Ceusters and Smith (2006). On behalf of the realist orientation, it can be argued further that even the move to assertions about beliefs would not, in fact, solve the problems of error, logical contradiction, and legal liability mentioned above.
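Conditions (a) and (b) can be illustrated with a minimal append-only record structure. The Python below is a hypothetical sketch. It does not reproduce the strategy of Ceusters and Smith (2006), and all class names, field names, and the 'about' labels are invented for illustration. It shows provenance (who recorded an entry, and when) stored alongside assertions about the world, rather than replacing those assertions with assertions about beliefs. A practitioner's belief appears as one more fact about reality.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Assertion:
    """A statement about some entity in reality.

    `about` is an illustrative label (an assumption of this sketch) that
    separates facts about the patient from facts about a practitioner's
    beliefs or actions; both kinds are treated as facts about the world.
    """
    subject: str    # hypothetical identifier of the entity described
    predicate: str
    value: str
    about: str      # e.g. "patient" or "practitioner-belief"


@dataclass(frozen=True)
class RecordEntry:
    """An assertion plus provenance: who recorded it, and when."""
    assertion: Assertion
    recorded_by: str
    recorded_at: datetime


def record(log: list[RecordEntry], assertion: Assertion, author: str) -> RecordEntry:
    """Append a new entry; existing entries are never modified or removed,
    so the log also serves as an audit trail."""
    entry = RecordEntry(assertion, author, datetime.now(timezone.utc))
    log.append(entry)
    return entry


if __name__ == "__main__":
    log: list[RecordEntry] = []
    # A fact about the patient, recorded by one clinician...
    record(log, Assertion("patient-1", "has_fracture_of", "left radius", "patient"), "McX")
    # ...and a fact about another clinician's belief, recorded by that clinician.
    record(log, Assertion("O'W", "believes_absent", "fracture of left radius",
                          "practitioner-belief"), "O'W")
    for entry in log:
        print(entry.recorded_by, entry.assertion.about, entry.assertion.value)
```

Because entries are frozen and only ever appended, a later correction would be added as a new entry rather than overwriting the old one. This is one simple way, under the assumptions above, to keep the forensic trail intact.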
The very same problems of inadequacy can arise, not only when human beings are describing fractures, pulse rates, coughing, or swellings, but also when they are describing what clinicians have heard, seen, thought, and done. In this respect, these two sets of descriptions are in the same boat, as each is a case of humans describing something. Hence, both are subject to error, fraud, and disagreement in interpretation. The alternative to the Rector approach, we believe, is to provide EHR systems with facilities to quarantine erroneous entries – and to resolve the concomitant logical conflicts – as they are identified, for example by appealing to the resources provided by formal theories of belief revision as outlined in Gärdenfors (2003). The seventh, and final, argument for the concept orientation as a basis for biomedical terminology development is the argument from borderline cases. There is often, it is said, no clear border between those general terms which designate universals in reality and those which merely designate classes defined by human beings to serve some purpose. Certainly there are clear cases on either side; for example, 'electron' or 'cell', on the one hand, and 'fall on stairs or ladders in water transport NOS, occupant of small unpowered boat injured' (Read Codes), on the other. But there are also borderline cases such as 'alcoholic non-smoker with diabetes', or 'age dependent yeast cell size increase', which might seem to call into question the very basis of the distinction. We will respond, first, with the general point that arguments from the existence of borderline cases usually have very little force. Borderline cases do not undermine the distinction between the entities on either side. The grey area of twilight does not prevent us from distinguishing day from night. Likewise, we can distinguish the bald from the hairy even though we do not know exactly how many hairs one must lose to traverse the border.
As to the specific problem of how to deal with borderline expressions of the sorts mentioned – expressions which seem to lie midway between designating universals and designating mere arbitrary classes – this is, in our view, a problem for empirical science, not for terminology. That is, we believe that the normal processes of scientific advance will bring it about that such borderline terms undergo a filtering process. This process is based on whether they are needed for purposes of fruitful classifications (for example, for the expression of scientific laws), or for purposes of arbitrary classification (for example, when describing eligible populations for trials). One generation of scientists may take a given term to refer to a universal, whereas the next generation may discover a reason to believe that the term does not designate anything at all (for example, 'caloric'), or recognize that it, in fact, refers ambiguously to several universals which must be carefully distinguished ('hepatitis'). Thus, representational artifacts such as information systems and textbooks, which form an integral part of the practice of science, must be continually updated in light of such advances. But again, we can think of no circumstance in which updating of the sort in question would signify that caloric is a concept, or that some expression, at one or other stage, was being used by scientists with the intention of referring to concepts rather than to entities in reality. I.3.2. Concepts are Ethereal The problematic features of common uses of the term 'concept' are not peculiar to the world of biomedical terminology; indeed, they arise generally in the knowledge-representation literature on semantic networks (for example, see Sowa, 1992) and conceptual models (Smith, 2006).
Here again, concepts (variously called 'classes', 'entity types', 'object types', though information scientists will disagree as to whether the same thing is being expressed by all of these terms) are called upon to perform at least two conflicting roles. On the one hand, inside the computer they are delegated to represent concrete entities and the classes of such entities that exist in reality outside of the computer. For example, some abstract proxy – some ghostly diabetes counterpart – is required for this purpose, it is held, because one cannot get diabetes itself inside the computer. And the computer could reason about diabetes only by creating such a proxy (so the programmer supposes). On the other hand, concepts are delegated to play the role of representing, in the computer, the knowledge in the minds of human experts. This knowledge is, then, itself characteristically (and again erroneously, as Putnam (1975) argues) assumed to be identifiable with the meanings of the terms such experts use and, in this way, the painful polysemy of 'concept' is inherited by the word 'knowledge' and its cognates. Because concepts are pressed into service to perform these various roles, they acquire certain ethereal qualities. Concepts, then, are triply ethereal, existing in a different sort of denatured guise in the machine, the human mind, and among the meanings stored in language. Their ethereal nature implies that concepts are not the sort of thing that can be examined or inspected. We know what it means to raise and answer questions about, say, a case of diabetes, or about the disease diabetes itself. We can turn towards both of these things by directing our attention to corresponding entities in the world; we can make what it is on the side of the patient the target of our mental acts (that to which these acts are directed).
We can concern ourselves with traits of the disease or properties of the patient, and we can weigh the separate views advanced by different observers in light of the degree to which they do justice to these traits. But it seems that we can do none of these things in relation to entities in the realm of concepts. The pertinent literature in philosophy and psychology (Margolis, 1999) suggests that concepts are most properly understood, not as targets of our cognitive acts, but rather as their contents, as that which determines what the target should be and how, in a given act, it should be represented. If this is so, then our puzzlement in the face of questions as to the nature of concepts is understandable. The concept orientation rests precisely on the tacit assumption that concepts serve as targets – indeed, as the primary targets of concern in work on terminologies – when, in fact, they serve as contents. I.3.3. The Realm of Concepts Does Not Exist A further illustration of the problems associated with the concept orientation is provided by Campbell (1998), in which Keith Campbell, Diane Oliver, Kent Spackman, and Edward Shortliffe – four distinguished figures in contemporary medical informatics – discuss the relevance of the Unified Medical Language System (UMLS, see National Library of Medicine) to current terminology work. The UMLS Metathesaurus is a well-known resource which gathers terms from different source terminologies into a single compendium, with the goal of creating what it calls unified meaning across terminologies. By this its authors mean, roughly, that it creates a framework of common meanings which can be used to provide access to the plurality of meanings carried by terms in the Metathesaurus which derive from a plurality of source terminologies and, consequently, are associated with a plurality of definitions.
The purpose is to ensure that everybody who encounters a medical term in a document can use the UMLS to find out the term's possible meanings. Here, 'unifying' is understood as bringing under one framework. The problem is that the Metathesaurus attempts to do this by creating unified meanings even for those terms which, as they occur in the respective separate source terminologies, clearly have different extensions in the actual world. For example, it assigns the same concept unique identifier (CUI) to both 'aspirin' and 'Aspergum'. In other words, it treats these two terms as if they referred to (or expressed) one and the same concept. Campbell's (1998) thesis is that this is allowed because there is a Possible World (the authors cite in this connection the work of Leibniz) in which 'aspirin' and 'Aspergum' do in fact refer to one and the same thing (p. 426). That is, the authors seem to be pointing out that there are situations in which aspirin and Aspergum can be ingested interchangeably. Of course, as the authors admit: Many clinicians would not regard different formulations of aspirin... as interchangeable concepts in the prescriptions they write. Although aspirin may be an abstract concept, Ecotrin and Aspergum have specific formulations (extensions) in our corporeal world, and use of those particular formulations is subject to different indications, mechanisms of therapy, and risks to the patient. Clearly then, in at least a pharmacy order-entry system, any extensional relationship that was used to determine allowable substitution of pharmacologic formulations would need to have different relationships (representing a different Possible World), than the one currently embodied within the UMLS. However, for a system primarily concerned with the active ingredients of a drug, such as an allergy or drug interaction application, the Possible World embodied in the UMLS may be optimal. (Campbell, 1998, p. 429) At this point, two questions arise.
First, in what sense does the UMLS actually unify the meanings of terms? If it only unifies them for certain specific purposes – say, for example, the purposes of those concerned only with a drug's active ingredients – then it seems to be restricting terms' meanings, rather than unifying them. Second, in what sense is the world, thus defined, possible, given that it would have to be governed by laws of nature different from those in operation here on earth? The answer is that it is possible, at best, as an artifact, inhabiting the same high-plasticity conceptual realm that is postulated by Wüster and his colleagues, a realm in which aspirin may be an abstract concept. In Campbell (1998), the UMLS is itself correspondingly referred to as an artificial world, as contrasted with our corporeal world of flesh-and-blood entities. And the job of this artificial world is asserted to be that of providing 'a link between the realm in which we live and the symbolic world in which computer programs operate' (p. 426). Three worlds have thereby been distinguished: (1) the possible ('artificial') world which is the UMLS, (2) the 'symbolic world' in which computer programs operate, and (3) the 'corporeal world' in which we live. How can world (1) link worlds (2) and (3) together? The answer, surely, must involve some appeal to the extensions of the concepts in the UMLS, understood as collections of the individual objects (actual patients, actual pains in actual heads, actual pieces of Aspergum chewed) in the corporeal world. The authors themselves suggest a reading along these lines when they point out (p. 424), in regard to the terms existing in the UMLS source terminologies, that: [o]n the one hand there are the physical objects to which [an expression like 'aspirin'] refers (the expression's extensional component) and on the other there are the characteristic features of the physical object used to identify it (the expression's intensional component). When it comes to the UMLS itself, however, they abandon this traditional philosophical view in favor of a view according to which (if we have understood their formulations correctly) the extensions of the concepts in the UMLS would be sets of concepts drawn from source terminologies: the developers [of the UMLS] collected the language that others had codified into terminological systems, provided a framework where the intension (connotation) of terms of those systems could be preserved, and unified those systems [into one unified system] by providing a representation of extensional meaning by collecting abstract concepts into sets that can be interpreted to represent their extension (p. 425). They then assert that: [t]hese extensional sets are codified by the Concept Unique Identifier (CUI) in the UMLS. We argue that the 'meaning' of this identifier is only understandable extensionally, by examining the characteristics shared by all abstract concepts linked by a CUI (p. 426). By interpreting 'extension' in Wüsterian fashion (which means conceiving extensions in abstraction from the corresponding instances in reality), our authors deny the possibility that the UMLS provides the desired link between the symbolic dimension of computer programs and the domain of real-world entities. In hindsight, we can see that, with their talk of the UMLS as building a bridge between computers and corporeal reality, Campbell, Oliver, Spackman, and Shortliffe have projected onto the UMLS a goal more ambitious than that which it was really intended to serve. Its actual goal was that of finding unified meaning across terminologies.
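The structural point of the aspirin and Aspergum example can be sketched in Python. Everything in the sketch is hypothetical: `CUI_X` is not a real UMLS identifier, and the extensions are modelled, purely for illustration, as sets of product kinds that each term picks out in the corporeal world. Grouping terms by identifier treats them as synonyms even though their extensions differ; only under a view restricted to the active ingredient do the differences disappear.

```python
from collections import defaultdict

# Hypothetical term-to-identifier assignment (not actual UMLS content).
term_to_cui = {"aspirin": "CUI_X", "Aspergum": "CUI_X", "Ecotrin": "CUI_X"}

# Hypothetical extensions: the kinds of product each term picks out.
extension = {
    "aspirin": {"aspirin tablet", "aspirin chewing gum",
                "enteric-coated aspirin tablet"},
    "Aspergum": {"aspirin chewing gum"},
    "Ecotrin": {"enteric-coated aspirin tablet"},
}

# A purpose-specific view: the active ingredient is the same for all three.
active_ingredient = {term: "acetylsalicylic acid" for term in term_to_cui}


def group_by_cui(mapping):
    """Collect the terms that share an identifier, i.e. are treated as synonyms."""
    groups = defaultdict(set)
    for term, cui in mapping.items():
        groups[cui].add(term)
    return dict(groups)


def distinct_pairs(terms, view):
    """Pairs of 'synonyms' that nonetheless differ under the given view."""
    ordered = sorted(terms)
    return [(a, b) for i, a in enumerate(ordered) for b in ordered[i + 1:]
            if view[a] != view[b]]


synonyms = group_by_cui(term_to_cui)["CUI_X"]
print(distinct_pairs(synonyms, extension))          # three pairs: every extension differs
print(distinct_pairs(synonyms, active_ingredient))  # []
```

The empty second result corresponds to the allergy or drug-interaction application for which Campbell (1998) finds the grouping adequate; the first corresponds to the pharmacy order-entry system for which the authors themselves concede that it fails.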
The weaker goal of unified meaning across terminologies has itself proved unrealizable, for the same reason that the concept orientation in general is unrealizable (though there may be some practical value in its imperfect realization; for example, in expanding the number of synonyms that can be used to find a target term in a specific terminology). We are still free, however, to readdress the more ambitious goal of building a bridge between computers and corporeal reality, a goal which, with the ineluctable expansion in the use of computers in clinical care (and especially in evidence-based medicine), becomes ever more urgent. Part II: Bridging Computers and the World We have claimed that the concept orientation places severe limitations on the ability of terminologies to fulfill their potential to support computer applications, a task for which we have claimed that the realist orientation is better suited. In what follows, the reasons for this should become clear. II.1. How Terms are Introduced into the Language of Medicine Consider what happens when a new disorder first begins to make itself manifest. Slowly, through the official and unofficial cooperation of physicians, patients, public health authorities, and other involved parties, a view becomes established that a certain family of cases, manifesting a newly apparent constellation of symptoms, represents instances of a hitherto unrecognized kind. This kind is a part of reality and, as we have seen above, it corresponds to what realist philosophers call a universal. The problem is that, in many cases, it is difficult to grasp of which universal given particulars are instances. When a disease universal first begins to make itself manifest in a family of clinical cases, it will at first be barely understood. Something similar applies when a new kind of virus or gene, or a new kind of biochemical reaction in the cell, is first detected. In such cases, a new term is needed to refer to the newly apparent kind.
Eventually, those involved come to an agreement to use, from then on, (1) this term for (2) these instances of (3) this kind. The concept orientation, however, postulates in addition (4) a new concept, together with (5) a definition. II.2. Definitions On the original ISO-Wüster paradigm, a concept is given what Wüster calls an intensional definition, which is an attempt to describe a type of object by referring to characteristic features that its instances have in common. This account works well enough in the relatively straightforward area of woodworking equipment, where Wüster came up with his ideas on concepts and definitions. It works well, too, in a domain like chemistry, for many molecular structures can indeed be precisely and unproblematically defined in terms of exactly repeatable patterns. In the domain of medicine, however, it confronts two problems. One arises in cases where a new universal, such as SARS, has only begun to make itself manifest, and it is not yet certain how it is instantiated. The other is that, even if a universal is fairly well understood, we may encounter many instances of it which do not have very many characteristics in common. Consider, for example, a particular butterfly known to several people, each of whom has encountered it only at a different phase of its development, so that their descriptions of it share few characteristic features. A similar problem is faced when drawing together knowledge concerning successive phases in the development of what is not yet recognized as one single disease. Although, in regard to an individual case, users of the term may know precisely what they are referring to (they can point to it in the lab or clinic), it may nevertheless be difficult to convey this information to others. In such cases, the user has a clear understanding of what the term designates in reality, but only at the level of instances and not yet at the level of universals.
As in the case of SARS, or of Legionnaires' disease, a term may be introduced as a provisional aid to communication even though the phenomenon has not yet been identified or clearly understood at the level of universals; on the concept orientation, this means that a new concept is thereby introduced in tandem with the term. There are three strategies which terminologies often employ with respect to providing definitions for new, or problematic, concepts. One is to leave them undefined, as in the terminology found in SNOMED CT (Bodenreider, 2004). This strategy is itself problematic, for the fewer defined terms a terminology contains, the less value it provides to its users. The second strategy is, in effect, to fabricate definitions by permuting the constituent words of the term in question. This occurs, for example, in the National Cancer Institute Thesaurus's definition of 'cancer death rates' as 'mortality due to cancer'. This practice does not define a term; it merely offers a rewritten version of the term itself. It is akin to defining 'SARS' as meaning severe acute respiratory syndrome, which is unhelpful, because not every case of severe acute respiratory syndrome is in fact a case of SARS. The latter covers only those cases of the severe acute respiratory syndrome first identified in Guangdong, China, in February 2002 and caused by instances of a particular coronavirus whose genome was first sequenced in Canada in April 2003 (www.cdc.gov/mmwr/preview/mmwrhtml/mm5217a5.htm). On the realist orientation, it is recognized that, when more is learned about the new kind that has been discovered, the meaning of the term used to designate that kind will change accordingly. The realist's goal is for a definition to track the development of our scientific knowledge about the world and, ultimately, to capture reality as it is in itself.
In our present case, this means capturing that which all instances of a given disease share in common. A real definition provides necessary and sufficient conditions under which it is appropriate to use the term in question as, for example, in this definition taken from the Foundational Model of Anatomy Ontology: x is a cell =def. x is an anatomical structure which has as its boundary the external surface of a maximally connected plasma membrane. Such a definition describes the real-world conditions under which it is appropriate to use the corresponding term. For many medical terms, only a small number of necessary conditions has been identified thus far. In such cases, the job of the definition is to describe a partial view of what the term refers to according to current usage, to be amended as knowledge increases. II.3. Putting Realism to Work Realism sees each terminology as a work in progress reflecting the secure, yet fallible, beliefs held at the pertinent stage in the development of biomedicine about how particular entities in reality are to be classified as instances of universals. Ideally, the result of these works in progress is an increase in the total sum of true beliefs about universals as well as about particulars so that, for example, in biomedicine there is a broad accumulation of knowledge. It is this ideal which the Open Biomedical Ontologies (OBO) Foundry is currently attempting to realize in practice. Mixed in with such knowledge, however, there will be at every stage a small and ever-changing admixture of false beliefs and confusions. Here, the part of this admixture which concerns us takes the form of terms in a terminology that are believed to refer to some corresponding universal, but which actually do not do so.
This can be either because there is no universal at all which can serve as the referent of the term in question, or because the term refers ambiguously to what is, in fact, a plurality of universals. With this in mind, we have developed realist counterparts of the three central Cimino desiderata: Each preferred term in a terminology must correspond to at least one universal (non-vagueness). Each term must correspond to no more than one universal (non-ambiguity). Each universal must itself correspond to no more than one term (non-redundancy). These desiderata are not realizable by any terminological adjustments that are motivated merely by considerations of meaning and language. Rather, they need to be accepted as long-term goals, which terminologists may approach ever more closely but will never completely realize. In moving towards their realization, terminologists must always follow on the coat-tails of those engaged in empirical research in the attempt to expand our body of knowledge of biomedical universals and their instantiations. II.3.1. Knowledge of Universals vs. Knowledge of Instances The realist proposal, here, amounts to turning the concept approach on its head. Whereas the concept approach starts from the top down, letting our thoughts frame our beliefs about reality, the realist approach starts from the bottom up, with the goal of allowing reality itself to form our beliefs about its denizens in a direct way. Whereas the concept approach admits of only one type of knowledge (knowledge, precisely, of concepts), the realist approach allows us to distinguish two types: knowledge of universals and knowledge of instances. Knowledge of universals is the sort of general knowledge that is recorded, for example, in the textbooks of biomedical science; it is knowledge about the types of entities (such as tuberculosis) that there are in the world.
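Knowledge of universals is also what the realist counterparts of Cimino's desiderata, set out in Section II.3, are about: they constrain how the preferred terms of a terminology line up with universals. Given a mapping from each term to the universals it is currently believed to designate, the three desiderata can be checked mechanically. A minimal Python sketch follows; the mapping (which reuses the chapter's examples 'caloric' and 'hepatitis') and the labels standing in for universals are hypothetical.

```python
from collections import defaultdict


def check_desiderata(term_to_universals):
    """Report violations of the three realist desiderata.

    `term_to_universals` maps each preferred term to the set of universals
    it is currently believed to designate.
    """
    # Non-vagueness: every term designates at least one universal.
    vague = sorted(t for t, us in term_to_universals.items() if not us)
    # Non-ambiguity: no term designates more than one universal.
    ambiguous = sorted(t for t, us in term_to_universals.items() if len(us) > 1)
    # Non-redundancy: no universal is designated by more than one term.
    terms_for = defaultdict(set)
    for term, universals in term_to_universals.items():
        for universal in universals:
            terms_for[universal].add(term)
    redundant = {u: sorted(ts) for u, ts in terms_for.items() if len(ts) > 1}
    return {"vague": vague, "ambiguous": ambiguous, "redundant": redundant}


# Hypothetical snapshot of a terminology and of current beliefs about it.
terminology = {
    "cell": {"cell"},
    "caloric": set(),  # believed to designate nothing at all
    "hepatitis": {"viral hepatitis", "alcoholic hepatitis"},
    "myocardial infarction": {"myocardial infarction"},
    "heart attack": {"myocardial infarction"},
}

print(check_desiderata(terminology))
```

Here 'caloric' is reported as vague, 'hepatitis' as ambiguous, and 'heart attack' and 'myocardial infarction' as redundant. The check is only as good as the mapping it is given: whether 'caloric' designates anything is, as argued in Part I, a question for empirical science, so the mapping itself must be revised as research advances.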
Knowledge of instances is the particular knowledge of specific, concrete things (such as this or that particular case of tuberculosis). We have already seen that it is general knowledge that terminologies are intended to capture, if they are to achieve their practical effect. The domain covered by each terminology comprehends a wide variety of different kinds or categories of universals. In the realm of disorders, these include symptoms, pathological and non-pathological anatomical structures, acts of human beings (for example, anesthetizings, observings), biological processes (disease pathways, processes of development and growth), and more. In contrast to what is the case in many areas of science, in the domain of clinical medicine, knowledge of instances of such universals is of considerable value as well. It is such knowledge that is recorded in clinical records; for example, records of patient visits, of emergency call centers, of laboratory results, and so forth. This sort of knowledge is also recorded in automated EHR systems, whose goal is to facilitate clinical data entry in such a way as to enable it to be both used by a human being and interpreted by a computer application. The knowledge represented in EHRs is intimately related to the knowledge represented in terminologies. It is through discoveries about the sorts of particulars described by EHRs that we gain knowledge about the universals catalogued in clinical terminologies. Obtaining knowledge of a universal, in turn, puts us in a position to recognize particular instances when we come across them. In fact, both kinds of knowledge are indispensable, not only to clinical diagnosis, but to all forms of scientific research. The better our systems are for keeping track of particulars in the clinical domain, the more efficiently our knowledge of the universals in this domain will be able to advance.
However, current EHR regimes embody certain impediments to this advance which, we believe, can be overcome with the help of the realist approach. II.3.2. Realism and EHR Systems Most existing EHR systems allow direct reference only to two sorts of particulars in reality, namely, (i) human beings (patients, care-providers, family members), via proper names or via alphanumeric patient IDs, and (ii) times at which actions are performed or observations are made (Ceusters, 2005). This impoverished repertoire of types of direct reference to particulars means that no adequate means is available to keep track of instances of other universals (for example, a specific wound, or fracture, or tumor) over an extended period of time. When interpreting health record data, it is correspondingly difficult to distinguish clearly between multiple references to the same particular, such as this tumor, and multiple particulars of the same general kind, such as any tumor existing in patient Brown (Ceusters, 2006). When a clinician needs to record information about some particular within different contexts – for example, as it exists at different points in time – he must create an entirely new record for each such reference. This is done via some combination of general terms (or associated codes) with designators for particular patients and times; for example, in expressions like 'the fever of patient #1001 observed by physician #4001 at time #9001'. Unfortunately, such composites, even where they are formulated by the same physician using the same general terms deriving from the same coding system, constitute barriers to reasoning about the corresponding entities in software systems, above all because it cannot be unproblematically inferred when such an expression refers to the same entity as does some other, similarly constituted expression.
(Imagine a regime for reasoning about human beings as they change and develop over time in which people could be referred to only by means of expressions like 'patient in third bed from left', 'person discharged after appendectomy', or 'relative of probable smoker'.) These sorts of limitations on the knowledge-gathering potential of current EHRs place obstacles in the way of our drawing inferences – for example, for scientific research or public health purposes – from our knowledge of different instances of the same clinical universal in different patients (Ceusters, 2006). Hence, a way to make the corresponding instances directly visible to reasoning systems (that is, visible without need for prior processing) is needed. We need to create a regime in which every real-world entity that becomes relevant to the treatment of a patient is explicitly recorded in the course of data entry. The first step is to expand the repertoire of universals recognized by EHR systems in such a way as to include, in addition to patient and time, a wide array of other diagnostically salient categories such as disorder, symptom, pharmaceutical substance, event (for example, an accident in which the patient was involved), image, observation, drug interaction, and so on. When this is done, each entity that is relevant to the diagnostic process in a given case should be assigned an explicit alphanumeric ID – what we have elsewhere called an instance unique identifier (IUI) (Ceusters, 2006) – that is analogous to a proper name. This would allow EHR systems to do justice to what it is on the side of the patient, in all its richness and complexity. It would also provide an easy means of doing justice to the different views of one and the same instance of a given disorder that may become incorporated into the record; for example, when physician A writes 'tumor' and physician B writes 'CAAA12'.
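The contrast between composite descriptions and IUIs can be sketched in Python. The data model is a simplification of our own, not the scheme described in Ceusters (2006): the IUI format, the `Mention` class, and the time stamps are hypothetical, and 'CAAA12' is treated simply as physician B's label for the same tumor that physician A calls 'tumor'. Two composite strings give no indication whether they describe one tumor or two, whereas mentions that share an IUI can be grouped directly. This also separates multiple references to one particular from multiple particulars of the same kind.

```python
from dataclasses import dataclass
from itertools import count

_serial = count(1)


def new_iui():
    """Mint a fresh instance unique identifier (the format is hypothetical)."""
    return f"IUI-{next(_serial):06d}"


@dataclass(frozen=True)
class Mention:
    """One statement in the record about a particular."""
    iui: str        # which particular is meant
    universal: str  # what kind of thing it is taken to be an instance of
    label: str      # the clinician's own wording or code
    author: str
    time: str


# Composite descriptions: string comparison cannot tell whether these co-refer.
composite = [
    "tumor of patient #1001 observed by physician A at time #9001",
    "CAAA12 of patient #1001 observed by physician B at time #9002",
]

# With IUIs, each particular is named once and every mention points at it.
tumor_1, tumor_2 = new_iui(), new_iui()
mentions = [
    Mention(tumor_1, "tumor", "tumor", "physician A", "#9001"),
    Mention(tumor_1, "tumor", "CAAA12", "physician B", "#9002"),
    Mention(tumor_2, "tumor", "tumor", "physician A", "#9003"),
]


def mentions_of(iui):
    """All references to one and the same particular."""
    return [m for m in mentions if m.iui == iui]


def particulars_of_kind(universal):
    """Distinct particulars of a given kind that the record talks about."""
    return {m.iui for m in mentions if m.universal == universal}


print(len(set(composite)))                      # 2 strings, referents unknown
print([m.label for m in mentions_of(tumor_1)])  # ['tumor', 'CAAA12']
print(len(particulars_of_kind("tumor")))        # 2 tumors, across 3 mentions
```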
The use of IUIs allows us to map the corresponding particulars in our computer representations with one another in a way which serves to make it clear when different physicians are referring to one and the same particular. Indeed, the cumulative result of such use can be understood as a map of the domain in question, showing the multifarious ways in which the universals in the domain relate to one another. In the next chapter, we will see how such maps can be put to use for the purpose of increasing our scientific knowledge of universals and instances alike. Conclusion The original motivation for the concept orientation was that it provides a means of representing information which is immune to the vagaries of thought as expressed in natural language. We have shown that the concept orientation, even when Cimino's desiderata are realized, is beset with flaws which hamper our ability to use terminologies and electronic health records to their full potential. We have advocated a realist orientation, which enables us to bypass the postulation of a conceptual realm and, instead, to engage in the creation of ever-more detailed maps of that reality which, in science and in clinical care, should always be our primary focus.