What do Identifiers in HL7 Identify? An Essay in the Ontology of Identity
Werner Ceusters and Barry Smith
From Proceedings of InterOntology 2009 (Keio University Press, Tokyo, Japan, February 27-March 1, 2009), 77-86.
ABSTRACT. Health Level 7 (HL7) is an organization seeking to provide universal standards for the exchange of healthcare information. In a document entitled 'HL7 Version 3 Standard: Data Types', the HL7 organization advances descriptions of data types recommended for use as identifiers. We will argue that the descriptions supplied provide insufficient guidance as to what exactly the entities are that these data types uniquely identify. Are they real things, such as persons or pieces of equipment? Or are they representations of such real things in information artifacts? We here outline the problems faced by HL7 in providing answers to such questions, problems which arise because of the lack of anything like a coherent ontology in the HL7 standard, and we make some recommendations for future improvements.
1. Introduction
The mission of the HL7 organization is 'to provide standards for the exchange, management and integration of data that supports clinical patient care and the management, delivery and evaluation of healthcare services'.[1] The key idea underlying its work is that it is possible to create a standard for the formulation of messages transmitted between distinct health information systems that would render these systems semantically interoperable, in the sense that messages would not merely be transmitted from one system to another but also understood in the same way by human senders and receivers. To realize this goal of semantic interoperability, it is thus essential that communicating systems convey information in such a way that human beings, both in formulating and in interpreting messages, can understand what these messages are about.
We believe further that the software engineers charged with developing such communicating systems, as well as the HL7 consultants and domain experts who assist them in this task, need to understand not only what HL7 messages are about, but also the HL7 specifications themselves, and more specifically the Reference Information Model (RIM), which forms the backbone of HL7's current Version 3. For HL7 to give a coherent account of identifiers in accordance with its goals, therefore, the RIM must embody a coherent account of the types of entities identified, in other words a coherent ontology; and this HL7 does not have. In 2006, we argued that the RIM and its associated documentation, in the versions then available, were dramatically inconsistent, and we provided some recommendations for improvement.[2] The rebuttal that followed did something to explain how the problems of inconsistency had arisen (basically because HL7 is a large volunteer organization with no capacity to coordinate contributions coming in from different sources). It did not, however, provide any arguments against our specific claims.[3] In August 2007, the HL7 Board of Directors was reported to have 'identified the need to review the existing V3 portfolio of standards for clarity, consistency, and ease of use'.[4] Yet in the normative version of the HL7 RIM (V02-18 of 9/10/2007, January 2008 Ballot Package Preview) we still find a repetition of the same oft-quoted statement, a statement which in our opinion undermines the entire RIM endeavor, to the effect that 'there is no distinction between an activity and its documentation. Every Act includes both to varying degrees.' (Emphasis added.)
Keio University Press Inc. 2009 2 Werner Ceusters and Barry Smith.
There is of course an obvious internal inconsistency between the last two sentences: if there is no distinction between X and Y, that is, if X = Y, then what does it mean to say that something else includes them both? More importantly, it surely cannot be, or so we hope, that the author of these statements (statements which have been re-approved in successive RIM releases) sincerely believes that there is no distinction at all between, say, the removal of my appendix and a report of that removal. Perhaps what is meant is that if an HL7 expert were to apply all the rules and principles offered by the HL7 machinery correctly in order to represent the removal of my appendix, the result would be exactly the same as if he had carried out the same exercise with the goal of producing an HL7-conformant representation of some documentation describing that removal. But if that is the case, how could we know what such a representation is really about, and how could we uniquely identify the process of removal? How could we know that different reports (data elements, records, messages, and so forth) are about the same event in reality if there is nothing in the HL7 machinery that distinguishes real-world items from their documentation? The HL7 expert might respond: "an HL7 message is always about a report, or more broadly about information in general". The RIM is, after all, a reference information model. Why care about entities in reality at all, he may ask? Why would we ever need, for healthcare messaging purposes, anything other than a simple representation of documents alone? To see how this distinction matters in patient care, consider the problems we face under this approach when we need our information system to keep track of a patient, and of the records of the care this patient has received in a succession of different hospitals over time.
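The question of whether different reports are about the same event can be made concrete with a minimal Python sketch. It assumes, hypothetically, that every report carries two identifiers: a `record_id` for the report itself (the information artifact) and a `referent_id` for the real-world process it is about, roughly in the spirit of referent tracking (see reference 5). The class, field names and identifier values are invented for illustration; this is not HL7's model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """Hypothetical report: one information artifact about one real-world entity."""
    record_id: str    # identifies the report (the information artifact)
    referent_id: str  # identifies what the report is about (e.g. a surgical procedure)
    hospital: str


def same_event(a: Report, b: Report) -> bool:
    """Two reports are about the same event iff they share a referent identifier."""
    return a.referent_id == b.referent_id


def count_reports(reports) -> int:
    """What can be counted if identifiers denote records only."""
    return len({r.record_id for r in reports})


def count_events(reports) -> int:
    """What can be counted once identifiers denote the events themselves."""
    return len({r.referent_id for r in reports})


# Two hospitals document the same appendectomy; a third report concerns a
# different procedure on the same patient. All identifiers are invented.
reports = [
    Report("A-msg-001", "appendectomy-1", "Hospital A"),
    Report("B-msg-917", "appendectomy-1", "Hospital B"),
    Report("B-msg-918", "biopsy-1", "Hospital B"),
]

print(same_event(reports[0], reports[1]))  # True
print(count_reports(reports))              # 3
print(count_events(reports))               # 2
```

If the only identifier available denotes the record, `same_event` cannot be written at all, and a count of three reports is easily mistaken for a count of three procedures.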
We may wish our system to have the capability, for example, to detect inconsistencies between the different assertions expressed in these different documents, and to represent not only the patient and her disorders, diagnoses, drug interactions, and so on, but also how these, and the interrelationships between them, change over time.[5] To support such capabilities, HL7 has introduced into its coding scheme a special attribute called '.id'. The objective of the work reported here is to test the hypothesis that the distinction between record and entity can be treated appropriately through an adequate use of this attribute.
Interdisciplinary Ontology Conference 2009 3
2. Material and methods
We studied several versions of the HL7 Data Type specifications and other relevant documentation, up to and including the version in the January 2008 Ballot 'Preview' distribution as available on March 7, 2008. We searched various public HL7 forums for discussions of identification issues, and we participated in (and in some cases initiated) such discussions in order to be corrected on possible misinterpretations of relevant HL7 texts. We also tried to locate examples of V3 messages in order to study how instance identifiers are interpreted and used.
3. Results
We can distinguish, on the level of general ontological categories, the following alternative candidates for entities which given instance identifiers might be used to identify:
- information artifacts such as database entries or documents;
- the contents of information artifacts, such as individual diagnoses or measurement results;
- individual processes of creating or amending such artifacts;
- individual objects, such as patients or buildings;
- individual processes, such as acts of observation or surgical operations;
- the types which such individuals instantiate.
Our conclusion is that, on the basis of our examination of the HL7 documentation (see Table 1) and of the opinions expressed by HL7 experts, it is still not possible to obtain a straightforward answer, in terms of such a list, to the question: what is it that HL7 instance identifiers identify? This is because aspects of HL7 documentation and practice can be cited in support of several, more or less coherently specified, alternatives among these choices. The general incoherence that we previously reported[2] concerning the interpretation of the RIM's 'backbone' categories of Act, Entity, and Role is thus inherited also by HL7's resources for identifying the entities about which messages are formulated. This issue matters because, in the absence of a coherent system for unambiguous identification, it is impossible, for example, to obtain reliable counts of the treatment episodes in which given patients participate, or of the disorders these episodes were intended to treat. It becomes impossible to estimate how many diagnoses have been formulated for these disorders, how many investigations were carried out in order to arrive at, or to falsify, such diagnoses, and so forth. Such counts are essential for assessing the quality and cost-effectiveness of care, and they will become increasingly important for the healthcare of the future, in which outcomes-based quality assurance will play a central role.
4. The .id Attribute
A first difficulty for HL7 turns on its different treatment of the '.id' attribute for the classes Entity, Act and Role. When used as an attribute of Entity or Act, it yields .id expressions which denote some corresponding Entity or Act, respectively (which one precisely, as we will see below, still needs to be determined). But this is not so for Role.
For in the latter case, the resulting identifier expression denotes the Entity playing the Role, leaving no room for the unique identification of the specific Role played by this Entity. A second difficulty is created by the distinct ways in which Act and Entity are defined: the former as a record of an act, the latter as the physical thing itself.
Table 1. HL7 definitions drawn from RIM, HL7 Glossary, or Data Types (DT) documents
1. Act (RIM): A record of something that is being done, has been done, can be done, or is intended or requested to be done.
2. Act.id (RIM): A unique identifier for the Act.
3. Procedure (RIM): An Act whose immediate and primary outcome (post-condition) is the alteration of the physical condition of the subject.
4. Observation (RIM): An act that is intended to result in new information about a subject. Observations have a value attribute. The code attribute of Observation and the value attribute of Observation must be considered in combination to determine the semantics of the observation.
5. Observation.Value (RIM): Information that is assigned or determined by the observation action.
6. Entity (RIM): A physical thing, group of physical things or an organization capable of participating in Acts, while in a role.
7. Entity.id (RIM): A unique identifier for the Entity
8. LivingSubject (RIM): A subtype of Entity representing an organism or complex animal, alive or not.
9. Person (RIM): A Living Subject representing single human being [sic] who is uniquely identifiable through one or more legal documents
10. Role (RIM): A competency of the Entity playing the Role as identified, defined, guaranteed, or acknowledged by the Entity that scopes the Role.
11. Role.id (RIM): A unique identifier for the player Entity in this Role.
12. Instance Identifier (DT): An identifier that uniquely identifies a thing or object. Examples are object identifier for HL7 RIM objects, medical record number, order id, service catalog item id, Vehicle Identification Number (VIN), etc. Instance identifiers are defined based on ISO object identifiers.
From 6 and 7, we can conclude that the Entity.id attribute associated with a specific instance of Entity may denote a real, physical person. From 8 and 9, on the other hand, we would have to conclude that the Person that is a subtype of Entity is not an 'organism or [sic] complex animal' at all, but rather merely something (thus presumably some information artifact) that represents an organism. From 1 and 2, we may infer that Act instance identifiers denote not physical activities but records of such activities. This, however, leaves us with an absurdity when combined with 4, the RIM definition of Observation. For on the one hand Observations are 'intended to result in new information about a subject', but on the other hand Observations are Acts, and thus records. Records, however, cannot of themselves result in new information. A third difficulty pertains to a conflict between the RIM specification and the HL7 data type specifications (DT) with which the RIM is asserted to be compliant. For the examples of instance identifiers given in DT, such as VINs, identify not records or representations but physical objects themselves (or, in the case of catalog IDs, kinds of physical objects). A fourth difficulty lies in the multitude of ways in which specific values of HL7 class attributes affect the interpretation to be given to the correspondingly modified parts of HL7 messages. Where an instance of Person with determinerCode 'INSTANCE' is said to denote one specific person, matters are fairly clear.
If this attribute is set to 'KIND', however, we are left with conflicting interpretations:
- in the RIM Vocabulary Domain we are told that use of the KIND value signifies that 'the given Entity is taken as a general description of a kind of thing';
- in the RIM documentation we are provided with an example, the population of Indianapolis, which is not a kind of thing but rather a particular collection of things. (Which collection precisely is still left to our imagination: given the definition of LivingSubject, which generalizes Person to include dead persons, it would seem to include all persons who ever lived in Indianapolis.)
A final difficulty can be summarized as follows: can HL7 experts who aim to apply HL7 'correctly' achieve this goal on the basis of the conflicting instructions that HL7 itself provides? To address this question we note the uncertainties that arise for the coder who needs to capture the distinction between (1) the 'act of observing' and (2) the 'finding' to which this observation leads: what referent is denoted by Act.id in such a case, and how do Observation.value and Observation.code relate to this referent? As we demonstrated above, HL7's definitions of the salient terms cannot help us here, because they confuse (or deny) the very distinctions they would need to clarify. The HL7 V3 example message[6] used in the Ballot package mentioned above serves only to entrench the confusion further, since it seeks to have it both ways in one and the same message. Thus it codes one ObservationEvent by means of the SNOMED CT code for 'protein measurement' (classified by SNOMED CT as a procedure, thus as a physical process), while for another it uses the SNOMED CT code for 'finding of organism growth' (classified by SNOMED CT as a finding, thus as the result of a procedure).
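The coder's dilemma can be illustrated with a minimal Python sketch. The `ObservationEvent` class below is a hypothetical stand-in with a single id and a code label, not HL7's actual class. We rely on the SNOMED CT convention that fully specified names end in a semantic tag such as '(procedure)' or '(finding)'; the labels themselves are illustrative and not quoted from SNOMED CT. The second half of the block shows one possible way of keeping the act of observing and its finding apart, each with its own identifier; the classes `ObservingProcess` and `Finding` are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservationEvent:
    """Hypothetical stand-in for an HL7-style act: one id, one coded label."""
    id: str
    code_label: str


def referent_kind(event: ObservationEvent) -> str:
    """Guess what the event's id denotes from the label's semantic tag."""
    if event.code_label.endswith("(procedure)"):
        return "process of observing"
    if event.code_label.endswith("(finding)"):
        return "result of observing"
    return "unknown"


# One message, two events, two incompatible readings of what an id denotes.
message = [
    ObservationEvent("obs-1", "Protein measurement (procedure)"),
    ObservationEvent("obs-2", "Finding of organism growth (finding)"),
]
print([referent_kind(e) for e in message])
# ['process of observing', 'result of observing']


@dataclass(frozen=True)
class ObservingProcess:
    """The act of observing: something that happens to a patient at a time."""
    id: str
    patient_id: str


@dataclass(frozen=True)
class Finding:
    """What the observing yields: a distinct entity with its own identifier."""
    id: str
    produced_by: str  # id of the ObservingProcess that led to this finding
    label: str


culture = ObservingProcess("proc-7", patient_id="patient-1")
growth = Finding("finding-3", produced_by=culture.id, label="organism growth")
```

With separate classes and identifiers, the question of what an id denotes has one answer per class rather than one answer per code.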
It is stated by HL7 that 'the RIM is not intended to represent a particular set of HL7 messages, but rather the collective universe of data and relationships from which any relevant HL7 message could be constructed'.[1] We believe that, to achieve this end, it must be possible to distinguish explicitly between at least the following:
(1) the physical process of measuring (for instance, measuring my weight, here and now);
(2) the entity that is measured (my weight, something that changes over time);
(3) the magnitude that is measured (my objective weight at some given instant);
(4) the obtained value for this magnitude, which may embody some error;
(5) the act of registering this value in a record;
(6) the resultant data element in the record.[7]
Sadly, the HL7 RIM does not allow these distinctions to be made coherently.
5. Discussion
On one side is the realm of information; on the other is the world that this information is about. We believe that many of the shortfalls we have identified in HL7-related work arise from its reliance on the so-called semantic or semiotic triangle,[8] with its distinction between words, things, and what are called 'concepts'. By awarding concepts a place of their own in this way, the paradigm leads those who adopt it to the erroneous expectation that there is some well-demarcated realm of concepts, in addition to the two more familiar realms of words and things. In fact, however, there is no realm of concepts, and thus no way in which distinct uses of the term 'concept' could be compared for correctness. Confusions accordingly arise from the multiple inconsistent ways in which this term is used.[9, 10] The uncertainties as to the meaning of 'concept' generate further problems when HL7 is brought into relationship with third-party terminologies such as SNOMED CT.
Such terminologies are themselves largely concept-based, and they, too, typically remain unclear about how precisely the term 'concept', as they use it, is to be defined. The task of combining them with HL7 is then analogous to that of joining the plumbing in the front and rear sections of a large hospital, where not only were different specifications used in constructing the two sections, but the language in which these specifications are written is understood in multiple different ways by those involved. Matters are made still worse by the fact that the UML-based modeling language in which the HL7 RIM is expressed imposes a view according to which everything has to be seen through the lens of an information system, so that there is no realm for the objects that the information system is about: information and objects are effectively identified. To overcome these limitations, we suggest that discussions of the necessary radical revisions of the HL7 RIM should employ a terminological framework capable of explicitly capturing the views of all those involved, whether they be realists (such as ourselves), who hold that there is an objective (though of course never fully known and understood) external reality, in which persons really exist, and sicken, and die; or semioticians, for whom the concept 'person' corresponds to some 'specific mass (atoms, molecules, etc.) in the world';[8] or conceptual modelers, who see everything through the lens of a UML diagram.[8] To enable coherent communication between such parties, we believe it is indispensable to distinguish[11] between the following four levels:
L1: objects in reality;
L2: the beliefs and thoughts in people's minds about such objects;
L3: the expressions of such thoughts in public language;
L4: the information artifacts used to identify entities on different levels.
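The four levels can be encoded as a small Python enumeration. This is a minimal sketch for illustration only: the example entities are our own paraphrases of cases discussed in sections 5.1 to 5.4, and the hypothetical `levels_of` helper encodes the authors' point that entities on every level are L1-entities as well.

```python
from enum import Enum


class Level(Enum):
    L1 = "objects in reality"
    L2 = "beliefs and thoughts in people's minds about such objects"
    L3 = "expressions of such thoughts in public language"
    L4 = "information artifacts used to identify entities on different levels"


# Illustrative examples, each placed on the level it primarily belongs to.
# The example strings are ours; the classification follows the four levels above.
PRIMARY_LEVEL = {
    "a palpable intra-abdominal mass in a patient": Level.L1,
    "a physician's belief that the mass is benign": Level.L2,
    "the string 'mass, probably benign' in an EHR": Level.L3,
    "the patient's social security number": Level.L4,
}


def levels_of(entity: str) -> set:
    """Every entity is an L1-entity; entities on L2-L4 belong to that level too."""
    return {Level.L1, PRIMARY_LEVEL[entity]}


for name in PRIMARY_LEVEL:
    print(name, "->", sorted(level.name for level in levels_of(name)))
```

Returning a set rather than a single level is the design choice that matters here: a belief or an identifier is not outside reality, it is a real entity with an additional, distinguishing status.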
5.1 L1-Entities: The Targets of Referential Acts
It seems to us obvious that Electronic Healthcare Records (EHRs), like paper-based healthcare records, contain descriptions which are about (in other words: denote, or are used to refer to) particular entities in reality. Some of these entities are persons: the patient about whom a specific EHR is maintained, his relatives, the doctors and nurses with whom he came into contact, and so forth. Others are instruments: blood pressure cuffs, reflex hammers, Coulter counters, weight scales, and so forth, used in investigations conducted by care providers to obtain a better grasp of what might be wrong with given patients. Others are disorders, for example a palpable intra-abdominal mass inside a patient. Such entities are known to us in various ways, whether by direct observation or by more complex processes of investigation, for example processes of measuring a baby's weight, of testing knee reflexes, or of observing the gait disturbances of a Parkinson patient. All of the above, whether they be patients, disorders, or processes of investigation, are entities in their own right. Such entities belong to what, from an ontological perspective based on realism, we shall refer to as 'level 1', the level of real-world entities. We shall refer to them henceforth as 'L1-entities', in order to distinguish them from other types of entities that we will introduce below, and to avoid confusion with the putative referents of HL7 terms such as 'Entity', 'Act', and 'Observation'. L1-entities are the potential targets of referential acts. The reader should accordingly bear in mind that, since all entities fall under this heading, entities on all of the distinguished levels are L1-entities also.
5.2 L2-Entities: Cognitive Artifacts
Whereas the entities discussed so far are fairly concrete (they can be touched, seen, filmed with a video camera, and so forth), other EHR descriptions are about more abstract entities, such as beliefs on the part of the patient or of the physician. A hypothesis, for example to the effect that a given palpable mass inside a patient is malignant or benign, belongs to level 2, the level of beliefs. We will refer to entities such as beliefs, opinions, and in general anything generated in a cognitive being's mind, including what are commonly called 'cognitive representations', as 'L2-entities'. A specific L2-entity can stand in an aboutness relationship to an L1-entity, as when my belief in the benign character of the pimple that appeared this morning on my nose stands in an aboutness relation to that specific pimple. For this L2-entity to exist, and to stand in this aboutness relationship to that L1-entity, something like an observation of the L1-entity must have occurred, whether directly or through other phenomena such as its sequelae, including reports such as this communication. Bear in mind that, at any point in time, L1-entities such as pimples and noses are what they are (and relate to other L1-entities in precise ways) independently of what any cognitive being believes or knows about them. Bear in mind also that L2-entities have certain physical counterparts, perhaps in the form of electrophysiological excitations or biochemical states in the brain of the relevant cognitive being, and that the latter are L1-entities in their own right. In the eyes of some (proponents of the mind-brain identity theory), L2-entities are indeed identical with these physical L1 counterparts. But L2-entities should never be confused with the L1-entities that they are about, nor with the L1-entities that precede them and participate in their etiology.
Our beliefs belong to level 2; the objects which such beliefs are about belong to level 1. Level 1 also includes immense quantities of further objects about which no beliefs will ever be formulated. The (correct) belief that human beings can know, at best, only small fragments of level 1 reality should not, however, lead to the conclusion that there are no L1-entities, or that L2-entities are not about L1-entities. It is a fallacy to infer, from the proposition that knowledge is partial and that any given belief may be mistaken, the conclusion that every belief may be mistaken (indeed fundamentally, radically, metaphysically mistaken). This remains a fallacy when confined to the specific domain of the EHR.
5.3 L3-Entities: Data and Information
We reserve the term 'L3-entity' for those components of linguistic or graphical structures (characters, strings, schemas) that the authors of these structures have selected in order to convey different kinds of mental contents to others. L3-entities are used for communication among cognitive beings. We find examples of each of these kinds of structure in several forms, in both paper-based and electronic health records, as well as in other systems such as terminologies and ontologies. More closely related to the main topic of this paper, we find them also in HL7 specifications and messages. L3-entities as defined here are a special sort of L1-entity with which specific communicative and descriptive functions are associated. An individual character or character string, whether on a sheet of paper or rendered on a computer screen, is always an L1-entity, but it is an L3-entity if and only if it is intended to be about some other entity. This intended aboutness must have been assigned by the cognitive being who endowed the entity with these functions, and in such a way that it is understandable to other cognitive beings on the basis of the conventions in use in the relevant community.
A specific form of aboutness is denotation: some linguistic (or graphical) construct is used to refer to some L1-entity. An example is the 'c' in 'e = mc²', the 'c' which was just (say, 2 milliseconds ago) in front of the reader's eyes and which denotes the speed of light, in contrast to the 'C' in 'Charles N. Mead' (as it appears here in front of your eyes). The latter is not an L3-entity, since it denotes nothing at all; and even the string 'Charles N. Mead' is an L3-entity only when it is used with the intention to refer, for example to that particular entity known in the HL7 community as Charles N. Mead.
5.4 L4-Entities: Identifiers in Reality
It is to avoid ambiguities of the sort just sketched that identifiers, and more precisely unique identifiers, are introduced with the intention of denoting specific entities (typically L1-entities) unambiguously. Examples are the Employer Identification Numbers (EINs) introduced by the IRS to uniquely identify business entities in the United States, and the Vehicle Identification Numbers (VINs) that uniquely identify cars throughout the world. Other identifiers are the serial numbers of pieces of medical equipment, social security numbers, driving license numbers, bank account numbers, and so forth. Under the realism-based view to which we adhere, such identifiers are entities in their own right; we shall refer to them henceforth as 'L4-entities'. By virtue of their dependence on human acts, they are in some ways comparable to L2-entities; but they are nevertheless entities of a distinct type. L2-entities are in a sense solipsistic: any particular L2-entity depends on a specific cognitive being (our respective beliefs about the beauty of Barbra Streisand's nose cannot be identical to any belief about her nose on the part of any other person). L4-entities, in contrast, can be shared by multiple cognitive beings.
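The claim that an identifier can be shared, and written down in many different ways, can be sketched in Python. The model is a simplification made for illustration: an inscription (an L3-entity) is a pair of text and typographic style, and the identifier it expresses is represented by a canonical digit string obtained by discarding style and separators. The `Inscription` class, the canonicalization rule, and the sample number are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inscription:
    """An L3-entity: particular characters rendered in a particular style."""
    text: str
    style: str  # e.g. "italic" or "bold"; hypothetical attribute


def expressed_identifier(inscription: Inscription) -> str:
    """Return a canonical stand-in for the identifier an inscription expresses.

    Style and separators are ignored; only the digits matter. This rule is
    an assumption made for illustration, not a standard.
    """
    return "".join(ch for ch in inscription.text if ch.isdigit())


italic = Inscription("057-879-6096", "italic")
bold = Inscription("057-879-6096", "bold")
spaced = Inscription("057 879 6096", "plain")

inscriptions = {italic, bold, spaced}
identifiers = {expressed_identifier(i) for i in inscriptions}
print(len(inscriptions), len(identifiers))  # 3 1
```

Three distinct inscriptions, possibly produced by different people, express one and the same identifier, which is what allows that identifier to be shared.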
Like laws and constitutions, L4-entities belong to the realm of social entities agreed upon by a community, and are in this respect similar to names. Note that neither identifiers nor names are L3-entities, but they can be expressed by means of L3-entities. Some particular person's social security number (SSN) might be expressed as '057-879-6096' printed in italics, and also as '057-879-6096' printed in bold face. There are then two L3-entities that express one SSN. The SSN itself is the L4-entity, and as such it is not 'present in' this paper, or in that tax form, or in this other legal document, but rather 'expressed', 'concretized', or 'realized' through distinct L3-entities in all of these different ways.
6. Conclusion
Clearly, one cannot hold HL7 responsible for mistakes committed because of mistaken interpretations of its own documentation, except where these mistakes arise because that documentation is, in so many fundamental respects, open to multiple conflicting interpretations. Even HL7 experts struggle with questions such as "Aren't two acts with the same id one and the same?"[12] Unfortunately for HL7, the problems here do not seem to be just a matter of documentation. Indeed, we find in the RIM the class EntityHeir, which is defined as 'a subtype of Entity defined solely as a work-around for the lack of support of the reflexive closure of generalization relationships (i.e. "Entity is-an Entity") by the current set of tools'. In the rationale for the introduction of this class we are told that its use 'is entirely dictated by the (excusable) shortcomings of certain tools and data-structures used in the HL7 methodology, and have [sic] no conceptual meaning'.
If the tools and data-structures used by HL7 are, as is here admitted, problematic; and if, as the reader can easily establish by inspection, the result is documentation which is unintelligible except to a few members of the very small core of individuals responsible for creating and maintaining its various fixes from day to day, then why continue to use them? Why not take advantage of more realistic paradigms, which would be not only simpler and more reliable, but also easier to document, to understand, and to apply in practice?
References
[1] Health Level Seven®, Inc. HL7® Version 3 Standard. http://www.hl7.org/v3ballot2008jan/html/welcome/downloads/downloads.htm#ballot_download. Last accessed: 11 December 2008.
[2] Smith B, Ceusters W. HL7 RIM: An Incoherent Standard. Stud Health Technol Inform. 2006;124:133-138.
[3] Schadow G, Mead CN, Walker DM. The HL7 Reference Information Model Under Scrutiny. Stud Health Technol Inform. 2006;124:151-156.
[4] Van Hentenryck K. Communication from HL7 Board Chair regarding V3 Technical Editing Project. http://lists.hl7.org/read/messages?id=113265. Last accessed: 28 December 2008.
[5] Ceusters W, Smith B. Strategies for Referent Tracking in Electronic Health Records. J Biomed Inform. 2006 Jun;39(3):362-378.
[6] http://www.nprogram.co.uk/examples/csrender/lab/renderedXMLSource.htm. Last accessed: 14 December 2008.
[7] Ceusters W, Russler D. The weight of the baby. http://hl7-watch.blogspot.com. Last accessed: 12 March 2008.
[8] Russler D. Questions on HL7/RIM. http://esw.w3.org/topic/HCLS/ClinicalObservationsInteroperability/RIMRDFOWL/RIMQuestions. Last accessed: 11 December 2008.
[9] Smith B, Ceusters W, Temmerman R. Wüsteria. In: Engelbrecht R, et al. (eds.) Medical Informatics Europe. Amsterdam: IOS Press; 2005:647-652.
[10] Smith B. Beyond concepts, or: Ontology as reality representation. Formal Ontology and Information Systems (FOIS) 2004:73-84.
[11] Smith B, Kusnierczyk W, Schober D, Ceusters W. Towards a Reference Terminology for Ontology Research and Development in the Biomedical Domain. Proceedings of KRMED 2006, November 8, 2006, Baltimore MD, USA.
[12] http://lists.hl7.org/read/attachment/95561/1/htmlversion.html. Last accessed: 12 December 2008.