The Significance of SNODENT

Louis J Goldberg (a), Werner Ceusters (b), John Eisner (c), Barry Smith (d)

(a) Department of Oral Diagnostic Sciences, School of Dental Medicine, University at Buffalo
(b) European Center for Ontological Research, Saarland University, Germany
(c) Department of Pediatric and Community Dentistry, School of Dental Medicine, University at Buffalo
(d) Department of Philosophy, University at Buffalo, and Institute for Formal Ontology, Saarland University, Germany

Abstract

SNODENT is a dental diagnostic vocabulary incompletely integrated in SNOMED-CT. Nevertheless, SNODENT could become the de facto standard for dental diagnostic coding. SNODENT's manageable size, its administrative self-containment, and its relation to a well-understood domain provide valuable opportunities to formulate and test, in controlled experiments, a series of hypotheses concerning diagnostic systems. Of particular interest are questions related to establishing appropriate quality assurance methods for its optimal level of detail in content, its ontological structure, and its construction and maintenance. This paper builds on previous software-based methodologies designed to assess the quality of SNOMED-CT. When these methodologies were applied to SNODENT, several deficiencies were uncovered. 9.52% of SNODENT terms point to concepts in SNOMED-CT that have a problematic status (retired, duplicate, or ambiguous). 18.53% of SNODENT terms point to SNOMED-CT concepts that do not have, in SNOMED, the term used by SNODENT. Other findings include the absence of a clear specification of the exact relationship between a term and a termcode in SNODENT and the improper assignment of the same termcode to terms with significantly different meanings. An analysis of the way in which SNODENT is structurally integrated into SNOMED resulted in the generation of 1081 new termcodes reflecting entities not present in the SNOMED tables but required by SNOMED's own description-logic-based classification principles.
Our results show that SNODENT requires considerable enhancements in content, quality of coding, quality of ontological structure and the manner in which it is integrated and aligned with SNOMED. We believe that methods for the analysis of the quality of diagnostic coding systems must be developed and employed if such systems are to be used effectively in both clinical practice and clinical research.

Keywords: Diagnostic coding systems; SNODENT; SNOMED; Quality Assurance; Ontology

Connecting Medical Informatics and Bio-Informatics, R. Engelbrecht et al. (Eds.), IOS Press, 2005. © 2005 EFMI – European Federation for Medical Informatics. All rights reserved.

1. Background

As a means of providing free access to a reputable clinical coding system, the US Department of Health and Human Services purchased rights to SNOMED Clinical Terms (hereafter: SNOMED-CT) from the College of American Pathologists (CAP) in the summer of 2003. The first release of SNOMED-CT included 375,000+ concepts, 957,000+ descriptions or synonyms and 1,370,000+ relationships. Embedded in this January 2004 release was a 6,000+ term dental diagnostic vocabulary, known within the dental community as SNODENT. It was designed as a diagnostic companion to the Current Dental Terminology (CDT) treatment codes of the American Dental Association (ADA). Of these 6,000+ terms, approximately 1,600 were contributed by the American Dental Association, while the remainder were dental terms already contained within SNOMED. This is an example in which two separate groups (CAP and the ADA) designed two distinct diagnostic coding systems for one specific domain (dentistry). As far as we know, there was no significant communication between the groups and no attempt to evaluate the quality of either diagnostic coding system. It should, therefore, be no surprise that SNODENT is imperfectly integrated into the SNOMED-CT environment.
Nonetheless, in the absence of established alternatives, SNODENT could well become the de facto standard for dental diagnostic coding. Its imminent release thus provides a unique dual research opportunity of potentially high significance. SNODENT's manageable size, its administrative self-containment, and its relation to a well-understood domain suggest that it might well provide valuable opportunities to formulate and test, in controlled experiments, a series of hypotheses concerning clinical coding systems.

Medical coding systems need to have a maximal degree of permanence in order to facilitate the cost-efficient training of coders and to avoid over-frequent updates. Yet at the same time they need the flexibility to take account of changes in medical knowledge while dealing with the nuances of actual cases encountered at the point of care. While almost all medical coding systems have been designed and subsequently maintained with content as the primary focus, it has more recently been shown that adherence to good coding principles [1-3] can bring significant advantages, even over and above the use of description logic (DL) style formal definitions. It is further generally recognized that errors in medical coding can have important consequences for the quality of health care. Moreover, the increasing alignment of terminologies and ontologies from different parts of biomedicine, and the increasing use of coding systems in the annotation of gene product and other data, mean that the correction of code-based errors has an increasingly high cost, since errors in one coding system propagate through all the many other data resources into which information expressed using the relevant codes has been absorbed or annotated.
These factors point in competing directions [4]. They make it clear that a methodology for quality assurance of clinical coding systems cannot focus narrowly on the elimination of errors in the formulation of content and on the creation of structural architectures which enjoy a high degree of elegance but bring no further benefits in efficiency or reliability of coding. Rather, as [5] points out, theoretically inspired moves towards fine-tuned representations of reality should be accompanied by empirical demonstrations of usefulness. In our view, the assessment and enhancement of SNODENT should focus on repeated statistical measures comparing the effectiveness, in real-world coding environments, of the original (baseline) SNODENT codes with a series of controlled enhancements. No comprehensive methodologies for the quality assurance of medical coding systems have thus far been proposed, and we believe that such an analysis can break new ground not merely in testing methodologies for improving code sets in controlled experiments but also in measuring the costs and benefits brought about by such improvements.

The enhancements we are working on relate to three levels: (i) broadening dental content, (ii) adding Description Logic architecture, and (iii) improving structural/ontological organization. We concentrate here on level (iii), the alignment of SNODENT within SNOMED.

2. Methods and Results

We have been working for two years on testing a variety of software-based methodologies for the management and quality assurance of medical terminologies. The methodology described in [6], which was first applied to the January 2003 and July 2003 versions of SNOMED-CT, was also used to analyze SNODENT, using SNOMED as the gold standard.
The basic idea is that it is possible to search for errors in a terminology by comparing different ways in which information can be extracted from its terms, concepts, descriptions, and definitions. To that end, the inter-concept relationships that are expressed in a terminology are first organised in a graph. This graph is then expanded in two ways. First, it is overlaid by a graph that represents the lexical relationships between all the terms in the system. Second, concepts that are formally described, but not defined (and hence are declared to be primitive in the original version of the terminology), are given a defined version, which then subsumes the undefined version. After reclassification, more concepts might be found to be subsumed by the defined version. Error discovery is then reduced to two tasks: seeking unexplainable differences between the semantic distances among concepts computed over the lexical graph and those computed over the original concept graph, and searching for idiosyncratic patterns in the expanded concept graph.

2.1. Problems in the Calibration of SNODENT and SNOMED

Our initial findings related to calibration include:

- 618 (9.52%) SNODENT terms, involving 208 (5.38%) termcodes, point to concepts in SNOMED-CT that have a "watch out" status, distributed as follows: retired 86, duplicate 15, ambiguous 517.
- 1203 (18.53%) SNODENT terms point to SNOMED-CT concepts that are labelled as active, but that do not have (in SNOMED) the term used by SNODENT. This means that if SNOMED-CT were taken as the gold standard against which to compare term usage within SNODENT, 18.53% of the SNODENT terms would have to be considered inappropriate. If SNODENT were taken as the gold standard, SNOMED-CT would lack 18.53% of the accepted terms in SNODENT.
- 368 (5.67%) SNODENT terms are not found in SNOMED, although the corresponding concept does exist in both systems (and with the same code).
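The percentages quoted above can be re-derived from the reported counts. The following is a minimal sketch, not part of the original analysis; it assumes that the denominators are the 6491 unique terms and 3863 unique termcodes of the SNODENT file described in section 2.2, and the function and label names are purely illustrative.

```python
# Counts as reported in the text; the denominators are an assumption
# taken from the SNODENT file description in section 2.2.
TOTAL_TERMS = 6491       # unique SNODENT terms (enomens)
TOTAL_TERMCODES = 3863   # unique SNODENT termcodes


def percent(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Breakdown of the "watch out" statuses reported for SNODENT terms.
watch_out = {"retired": 86, "duplicate": 15, "ambiguous": 517}
watch_out_terms = sum(watch_out.values())

# Illustrative labels; each entry is (count, denominator).
findings = {
    "terms pointing to watch-out concepts": (watch_out_terms, TOTAL_TERMS),
    "termcodes involved in watch-out concepts": (208, TOTAL_TERMCODES),
    "terms absent from their active SNOMED-CT concept": (1203, TOTAL_TERMS),
    "terms not found in SNOMED (concept present in both)": (368, TOTAL_TERMS),
}

for label, (count, total) in findings.items():
    print(f"{label}: {count} of {total} = {percent(count, total)}%")
```

Under these assumptions the three status counts sum to the 618 terms reported, and each ratio rounds to the percentage given in the text.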
The extra SNODENT terms can be divided into a number of categories, such as: adjectival form, use of a determiner, use of "NOS", eponyms, and spelling variants.

While SNODENT enforces just a single meaning for terms that are in and of themselves polysemous, SNOMED allows terms to be used in a variety of meanings. 437 (6.73%) SNODENT terms are used in SNOMED with different meanings, the majority reflecting a (systematic) oddity of SNOMED rather than of SNODENT. Some examples are given in Table 1.

Table 1: Examples of single terms in SNODENT used with plural meanings in SNOMED

SNODENT term (enomen)      SNODENT termcode  SNOMED concept name
Bruise                     M-14200           Bruise (finding)
Bruise                     M-14200           Contusion – lesion (morphologic abnormality)
Alveolar arch of mandible  T-11182           Structure of alveolar arch of mandible (body structure)
Alveolar arch of mandible  T-11182           Entire alveolar arch of mandible (body structure)
Bloody discharge           M-36860           Bloody discharge (morphologic abnormality)
Bloody discharge           M-36860           Bloody discharge (substance)
Chemotherapy               P2-67010          Chemotherapy (procedure)
Chemotherapy               P2-67010          Drug therapy (procedure)
Chemotherapy               P2-67010          Antineoplastic chemotherapy regimen (regime/therapy)
Chemotherapy               P2-67010          Administration of antineoplastic agent (procedure)

It is clear from the above that, at the time of our initial analyses (May 2004), not all of the SNODENT terms had yet been incorporated into SNOMED-CT.

2.2. Terms and Concept Codes in SNODENT

For the work described here we used the SNODENT version received from the American Dental Association in March 2004, and the July 2003 version of SNOMED-CT. The SNODENT file contained 6491 unique records, relating to 6491 unique terms (called enomens in SNODENT) and 3863 unique concept codes (called termcodes in SNODENT).

A first problem concerns the absence of a clear specification of the exact relationship between a term and a termcode in SNODENT.
Most examples suggest a relation of synonymy, as in:

  D5-10000  Dental disease, NOS
  D5-10000  Disease of teeth, NOS
  D5-10000  Tooth disorder, NOS
  F-51540   Expectoration of bloody sputum
  F-51540   Expectoration of hemorrhagic sputum

Other examples, however, suggest that terms also receive the same termcode where detailed differences are irrelevant for the purposes to which SNODENT is to be put. Thus all of the following refer to the same SNODENT termcode:

  F-A3692   Adverse taste perception
  F-A3692   Chorda tympani disorder
  F-A3692   Dysgeusia
  F-A3692   Neurologic unpleasant taste
  F-A3692   Parageusia
  F-A3692   Perversion of sense of taste
  F-A3692   Primary taste disorder

A chorda tympani disorder does not need to involve a taste disorder. Dysgeusia and parageusia are related but different disorders, while primary taste disorder subsumes absence of taste, which neurologic unpleasant taste does not. Another example of the same phenomenon is:

  T-53120   Dorsal surface of anterior two-thirds of tongue
  T-53120   Dorsal surface of tongue
  T-53120   Dorsum of anterior tongue

There are cases in which the exact meaning of a term cannot be captured without inspecting the other terms in its immediate neighborhood. As an example, facial nerve function could refer to the function of any nerve in the face or to the function of the nerve that is called the facial nerve; only the neighboring term indicates that the latter is meant:

  F-A3610   Facial nerve function, NOS
  F-A3610   Seventh cranial nerve function, NOS

2.3. Analysis of Structural and Ontological Problems in SNODENT

There are also problems in SNODENT caused by the failure to follow ontological principles of good coding [5]. Thus there are closely related, though ontologically completely different, entities which receive the same termcode. Cheilodynia is pain in the lips, hence a pain; while painful lips are lips. Yet these entities receive the same termcode in SNODENT:

  D5-22070  Cheilodynia
  D5-22070  Painful lips
Another example of the same phenomenon is:

  D5-10578  Sensitive dentin
  D5-10578  Tooth sensitivity

Of course there is nothing wrong with assigning the same code to different entities if the differences at issue are irrelevant for a specific objective. And certainly pain in the lips and lips that are painful are closely associated (the entire sensation is cortical under either heading). Our studies have shown, however, that when a terminology is structured in accordance with what ontologists have called the principle of disjointness, so that ontologically distinct top-level categories do not overlap in the classes they subsume, this brings significant benefits in avoiding error propagation and also supports alignment with external data and information sources [7].

2.4. Analysis of SNODENT relative to SNOMED-CT

In order to arrive at a well-founded understanding of the way in which SNODENT is integrated structurally within the wider SNOMED framework, it is necessary to employ a methodology which begins by tracing paths from all SNODENT concepts to SNOMED-CT's top concept, following only the relationships available in SNOMED. The resulting sub-graph represents some 2% of the total SNOMED graph structure. By applying the algorithm described above to this sub-graph we were able to identify, via a comparison of the original subsumption hierarchy with the generated one, cases of both under- and overspecification of entities. We subjected each of these cases to detailed analysis in order to isolate the precise types of problems with which they are associated. In all, some 1081 new termcodes were generated by this method, reflecting entities not present in the SNOMED-CT tables but required by SNOMED's own classification principles.

To give an example, consider the hierarchy generated for the SNODENT concept Moon's molar teeth.
We found this concept to be subsumed by a generated concept, which in its turn subsumes the concept congenital anomaly of molar tooth and a third generated concept with the defined meaning: congenital developmental tooth disorder of size and form. Inspection of the generated concepts in context reveals that the concept taurodontism is underspecified, since it does not contain any information other than that carried by its parent concept. This contravenes a structural principle underlying DL-based terminologies (and also a central structural principle of good terminology building practice).

This example points to a number of tasks to be carried out in the enhancement of SNODENT, including:

- assessing which generated concepts should be included in an enhanced version of the SNODENT codes and which should be excluded;
- assigning a fully specified name to the included concepts;
- subjecting the differentiating criteria which led to the excluded concepts to a critical analysis and revising them accordingly;
- verifying for the included concepts that they subsume all concepts that they should subsume. (Thus an experienced dentist might expect disorders other than taurodontism and tuberculum paramolare to be subsumed by the corresponding generated concept.)

This may lead to the observation that the disclosed disorders need to be added to the system, or that they are currently underspecified or classified in the wrong place.

To give an impression of the size of the task, we note that the 1081 generated concepts themselves subsume between 1 and 37 concepts. Many of the generated concepts that subsume only one concept have a meaning identical to that of the existing SNOMED-CT concept. This feature is typical of concepts high in the subsumption hierarchy.
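The size estimate above suggests a simple triage step: count how many concepts each generated concept subsumes, and single out those that subsume exactly one, since these are the candidates for being identical in meaning to an existing concept. The following is a minimal sketch under stated assumptions, not the algorithm of [6]: the hierarchy is assumed to be available as (child, parent) is-a pairs, subsumption is counted transitively, and all identifiers (`C1`, `GEN-1`, and so on) are hypothetical placeholders rather than SNODENT or SNOMED-CT codes.

```python
from collections import defaultdict


def descendants(children, concept):
    """Return every concept subsumed, directly or indirectly, by `concept`."""
    seen = set()
    stack = list(children.get(concept, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, ()))
    return seen


def review_generated(is_a_pairs, generated):
    """Map each generated concept to the set of concepts it subsumes."""
    children = defaultdict(set)
    for child, parent in is_a_pairs:
        children[parent].add(child)
    return {g: descendants(children, g) for g in generated}


# Hypothetical toy hierarchy: GEN-* stand for generated concepts,
# C* for concepts already present in the terminology.
is_a_pairs = [("C1", "GEN-1"), ("C2", "GEN-1"), ("C3", "GEN-2")]
subsumed = review_generated(is_a_pairs, {"GEN-1", "GEN-2"})

# Generated concepts subsuming exactly one concept are flagged for
# manual comparison with that single concept.
single = sorted(g for g, d in subsumed.items() if len(d) == 1)

for g in sorted(subsumed):
    print(g, "subsumes", len(subsumed[g]), "concept(s)")
print("candidates for identity with an existing concept:", single)
```

Whether a flagged concept really duplicates an existing one still has to be decided by inspection of its definition; the sketch only narrows down where to look.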
This method allows us to look for other significant patterns as well, such as generated concepts that are subsumed by non-generated concepts that also subsume other non-generated concepts. In this way we discovered that tooth finding lacks the criteria for tooth structure and jaw region, which is clearly an underspecification in SNOMED. That is, SNOMED currently treats tooth finding as if it were associated with the locations digestive structure and oral cavity structure, but not with the locations tooth structure and jaw region structure.

3. Conclusion

Dentistry is one of the health care professions and has a number of subspecialties within it. It has its own educational institutions (dental schools), governing bodies (e.g., the American Dental Association in the US), and specialty boards (e.g., Periodontics, Orthodontics). Dentists in private practice and in health care institutions use billing systems that are specifically designed for dental procedures; diagnostic codes, however, are not in use in dentistry. SNODENT is a set of diagnostic codes that was specifically designed to serve the field of dentistry. It is hoped that SNODENT will become an effective tool in enhancing the clinical practice of dentistry. It is also hoped that SNODENT will become a resource for both basic and clinical research into issues that involve oral health and the interaction of orofacial systems with other systems in the body. SNODENT has now, appropriately, been included in SNOMED-CT. If SNODENT is to become widely used in dentistry, and to fulfill the hopes regarding the benefits it can bring to the dental profession, its patients and the research community, a great many problems will have to be overcome. Our results show that SNODENT requires considerable enhancements in content, quality of coding, quality of ontological structure and the manner in which it is integrated and aligned with SNOMED.
We believe that methods for the analysis of the quality of diagnostic coding systems must be developed and employed if such systems are to reach their full potential.

4. References

[1] Smith B, Köhler J, Kumar A. On the application of formal principles to life science data: A case study in the Gene Ontology. Proc DILS 2004 (Data Integration in the Life Sciences), (Lecture Notes in Bioinformatics 2994), Berlin: Springer, 2004, 79–94.
[2] Smith B, Rosse C. The role of foundational relations in the alignment of biomedical ontologies. Proc Medinfo, 2004.
[3] Guarino N. Some ontological principles for designing upper level lexical resources. In: Rubio A, Gallardo N, Castro R, Tejada A, editors. Proceedings of First International Conference on Language Resources and Evaluation. ELRA European Language Resources Association, Granada, Spain; 1998. 527-534.
[4] Cimino JJ. Desiderata for controlled medical vocabularies in the twenty-first century. Methods of Information in Medicine 1998;37(4-5):394-403.
[5] Spackman KA, Reynoso G. Examining SNOMED from the perspective of formal ontological principles. KR-MED 2004.
[6] Ceusters W, Smith B, Kumar A, Dhaen C. Ontology-based error detection in SNOMED-CT. Proc Medinfo 2004 (in press).
[7] Ceusters W, Buekens F, De Moor G, Waagmeester A. The distinction between linguistic and conceptual semantics in medical terminology and its implications for NLP-based knowledge acquisition. Methods of Information in Medicine 1998;37(4-5):327-33.

Address for correspondence: Louis J. Goldberg, Department of Oral Diagnostic Sciences, School of Dentistry, University at Buffalo, Buffalo, New York, 14216, USA, goldberg@buffalo.edu