From: Proceedings of the 2009 AMIA Summit on Translational Bioinformatics, San Francisco, CA, 2009. p 116-120

Toward an Ontological Treatment of Disease and Diagnosis

Richard H. Scheuermann*, PhD1, Werner Ceusters, MD2,4, and Barry Smith*, PhD3,4

1Department of Pathology and Division of Biomedical Informatics, University of Texas Southwestern Medical Center, Dallas, TX; 2Department of Psychiatry, 3Department of Philosophy and 4Center of Excellence in Bioinformatics, University at Buffalo, Buffalo, NY

Abstract

Effective knowledge representation requires the use of standardized vocabularies to ensure both shared understanding between people and interoperability between information systems. Unfortunately, many existing biomedical vocabulary standards rest on incomplete, inconsistent or confused accounts of basic terms pertaining to diseases, diagnoses, and clinical phenotypes. Here we outline what we believe to be a logically and biologically coherent framework for the representation of such entities and of the relations between them. We defend a view of disease as involving in every case some physical basis within the organism that bears a disposition toward the execution of pathological processes. We present our view in the form of a list of terms and definitions designed to provide a consistent starting point for the representation of both disease and diagnosis in information systems in the future.

Introduction

The goal of this communication is to outline a terminological framework that encompasses diseases, their causes and manifestations, and diagnostic acts and other entities pertaining to the ways diseases are recognized and interpreted in the clinic. Inspection reveals that such entities have thus far not been adequately treated in standard vocabulary resources.
The National Cancer Institute Thesaurus (NCIT), for example, identifies 'Chronic Phase of Disease' as a subtype of 'Finding', which it defines as: Objective evidence of disease perceptible to the examining physician (sign) and subjective evidence of disease perceived by the patient (symptom) [1]. This definition implies, however, that a disease does not exist except as one or other form of evidence. It thus illustrates a common conflation between processes on the side of the organism and the evidence for the existence of such processes. That this conflation is problematic is revealed when we need to link observable clinical phenomena to hypothesized unobservable biological causes.

* These authors contributed equally to this work.

A misplaced focus on observables is reflected also in the traditional practice of classifying diseases on the basis of patterns of similarities in signs and symptoms. This practice creates problems in face of the wide variations in clinical presentations of many diseases [2] and of the increasing importance for our understanding of the ways disease correlates with genetic and environmental variables [3]. The effective study of such correlations requires clinical research to be applied to ever larger pools of subjects drawn from geographically separated populations in multi-institution studies, requiring that the healthcare institutions involved embrace common standardized terminologies in capturing and sharing their data. The definitions presented here are designed to provide the resources in terminology and disease classification to support such standardization.

The approach we recommend rests on an account of diseases as dispositions rooted in physical disorders in the organism and realized in pathological processes.
This approach helps us to do justice (1) to the existence of pre-clinical manifestations of disease (disorders can exist before they are realized in overt pathological processes); (2) to the combinations of disease and predispositions to disease which can exist within a single patient (as when an instance of disease of type A in a given patient is a risk factor for a second disease of type B); and (3) to the fact that the disease course and the clinical picture may vary widely between patients who have the same disease.

Materials and methods

We reviewed the current definitions of terms pertaining to disease and diagnosis in standard terminology resources, including SNOMED CT and NCIT. We found that these definitions inadequately capture the logical relationships between the terms defined, and therefore that they will provide an inadequate foundation for information integration and reasoning in the future. We created new definitions drawing on best practices in ontology development and in the logic of definitions as promulgated within the OBO Foundry [4]. These definitions apply to the terms as used in the context of this paper. Thus we do not claim that 'disease' as here defined denotes what clinicians in every case refer to when they use the term 'disease'. Rather, our definitions are designed to make clear that such clinical use is often ambiguous.

Results

While it is generally good practice to provide precise definitions for the terms assembled in a terminology, in any logically coherent approach to definition, some terms must remain undefined in order to avoid circularity or infinite regress. The undefined terms used in what follows are of three sorts: either (i) they are non-technical terms derived from ordinary English; (ii) they are technical terms derived from basic science (for example, 'organism'); or (iii) they are primitive terms specific to our domain of interest.
Terms in group (iii) – specifically: physical component, bodily quality, bodily process, physical basis, clinically abnormal and homeostasis – require special attention. While, ex hypothesi, we cannot provide definitions for these terms, we can provide some elucidation and illustrative examples.

Informal Elucidations of Primitive Terms

Physical components, as we conceive them, are anatomical structures and other physical entities within or on the surface of the body, including organs, cells, portions of bodily substances such as blood, body flora, pathogens, toxins, and their combinations. Bodily qualities are for example the color or mass of a physical component. Bodily processes are processes unfolding in or on the body in which physical components serve as participants. We use 'bodily feature' as an abbreviation for a physical component, a bodily quality, or a bodily process. (Disjunctive terms of this sort fall short of ontological best practice; they are employed here in order to simplify our treatment of established disjunctive terms such as 'sign' and 'phenotype'.)

A disposition is an attribute of an organism in virtue of which it will initiate certain specific sorts of processes when certain conditions are satisfied. Examples are: our disposition to crave liquid following dehydration; the disposition of an epithelial cell in the G2 phase of the cell cycle to become diploid following mitosis. In any organism there is a wide variety of dispositions, some associated with health, others with disease. We use 'realization' to refer to the process through which a disposition is realized, and we shall identify diseases as dispositions realized in pathological processes. Each disposition in the organism has a physical basis. The physical basis of a disease is some combination of physical components within the organism, typically at multiple levels of granularity.
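For readers who wish to carry these elucidations into an information system, the primitives above might be sketched as a small data model. The Python below is an illustrative sketch only and is not part of the framework itself: all class and field names (PhysicalComponent, BodilyQuality, BodilyProcess, BodilyFeature, Disposition, physical_basis, realizations) are assumptions introduced for illustration, and treating the cell itself as the physical basis in the example is a simplification.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass(frozen=True)
class PhysicalComponent:
    """An anatomical structure or other physical entity in or on the body."""
    name: str


@dataclass(frozen=True)
class BodilyQuality:
    """A quality (for example color or mass) of a physical component."""
    name: str
    bearer: PhysicalComponent


@dataclass(frozen=True)
class BodilyProcess:
    """A process in or on the body with physical components as participants."""
    name: str
    participants: Tuple[PhysicalComponent, ...]


# 'Bodily feature' is a disjunctive abbreviation, so it is modeled as a union
# of the three primitive kinds rather than as a class of its own.
BodilyFeature = Union[PhysicalComponent, BodilyQuality, BodilyProcess]


@dataclass
class Disposition:
    """An attribute in virtue of which certain processes are initiated."""
    name: str
    physical_basis: Tuple[PhysicalComponent, ...]
    realizations: List[BodilyProcess] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Each disposition in the organism has a physical basis.
        if not self.physical_basis:
            raise ValueError("every disposition needs a physical basis")

    def realize(self, process: BodilyProcess) -> BodilyProcess:
        """Record a process through which this disposition is realized."""
        self.realizations.append(process)
        return process


# Example taken from the text; the choice of physical basis is an assumption.
cell = PhysicalComponent("epithelial cell in G2 phase")
to_become_diploid = Disposition(
    name="to become diploid following mitosis",
    physical_basis=(cell,),
)
division = to_become_diploid.realize(
    BodilyProcess("becoming diploid following mitosis", (cell,))
)
```

The union type mirrors the disjunctive status of 'bodily feature' noted above; a stricter ontology would avoid it in favor of the three separate kinds.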
When we say that some bodily feature of an organism is clinically abnormal, this signifies that it: i) is not part of the life plan for an organism of the relevant type (unlike aging, pregnancy or menopause), ii) is causally linked to an elevated risk either of pain or other feelings of illness, or of death or dysfunction, and iii) is such that the elevated risk exceeds a certain threshold level [5]. This treatment of 'abnormal' is distinct from those statistical treatments which do not take account of the overlap in the distribution of test results between normal and abnormal populations or of normal distribution extremes. What are standardly called 'normal variants' (for example a left lung with three lobes) do not satisfy criteria ii) and iii).

We use 'homeostasis' to designate a disposition of the whole organism (or of some causally relatively isolated part of the organism, such as a single cell) to regulate its bodily processes in such a way as (1) to maintain bodily qualities within a certain range or profile and (2) to respond successfully to departures from this range caused by internal influences or environmental influences such as poisoning. When bodily processes yield qualities outside the range of homeostasis, then the organism initiates processes designed to return the qualities to a value within this range. In some cases, homeostasis can be lost and then re-gained at a level that is clinically abnormal, for example in the case of adaptation to major injury. In other cases the organism will pass a point where it falls irreversibly outside the realm of homeostasis.

Definitions of Terms Referring to Entities on the Side of the Organism

In what follows, we pursue a view of disease as resting in every case on some (perhaps as yet unknown) physical basis [6]. When, for example, there is a persistent elevated level of glucose in the blood, this is because (1) some physical structure or substance in the organism is disordered (e.g.
loss of beta cells in pancreatic islets) as a result of which (2) there exists a disposition (diabetes) for the organism to act in a certain abnormal way. The disposition in question is realized by the initiation and execution of specific pathological processes (diabetic nephropathy) including manifestations that can be recognized as signs of the disorder (proteinuria).

Disorder =def. – A causally relatively isolated combination of physical components that is (a) clinically abnormal and (b) maximal, in the sense that it is not a part of some larger such combination. Although each single cell within a tumor is disordered in its own right, for us the disorder is the tumor as a whole; it is the maximal collection of all disordered cells. Other examples of disorders are: mutated genomic DNA, portions of endotoxin in blood, blood with reduced blood cortisol levels causing adrenal crisis. Such disorders are the physical basis of disease. A disease comes into existence because some physical component becomes malformed. In some cases the disorder is a congenital malformation. In other cases it involves a virus or toxin coming in from the outside, or it arises because the absence of a normal bodily component leads to abnormal functioning.

Pathological Process =def. – A bodily process that is a manifestation of a disorder. Some pathological processes are changes in the way a normal physiological function is realized (e.g. hyperventilation); some have no normal physiological counterpart (e.g. acute inflammation).

Disease =def. – A disposition (i) to undergo pathological processes that (ii) exists in an organism because of one or more disorders in that organism.
Examples: Epilepsy as a disease that disposes to the occurrence of seizures (pathological processes) due to an underlying abnormality in the neuronal circuitry of the brain (physical basis); AIDS as a disease that disposes to non-HIV pathogen persistence and duplication (pathological processes) following opportunistic infections that take advantage of a weakened immune system (physical basis).

Predisposition to Disease of Type X =def. – A disposition in an organism that constitutes an increased risk of the organism's subsequently developing the disease X. A predisposition is a disposition to acquire a further disposition. Some diseases, for example AIDS, are predispositions to further diseases. The case of moderate genetic risk factors tells us that not all predispositions to disease are themselves diseases.

Etiological Process =def. – A process in an organism that leads to a subsequent disorder. Example: toxic chemical exposure resulting in a mutation in the genomic DNA of a cell. The etiological process creates the physical basis of that disposition to pathological processes which is the disease. Some diseases are such that an organism can suffer from what is qualitatively the same disease on two distinct occasions – for example two successive bouts of influenza. The successive bouts are differentiated by their etiology in the sense that their respective physical bases are caused by distinct processes of infection by distinct viruses. With some diseases it may be possible to associate specific etiological determinants – processes which must take place if a disease is to exist. Some etiological processes, in contrast, will be causes of clinical phenotypes, such as inflammation, which are common to many diseases. They will be comparable to the environmental processes that modify the presentation and course of the disease. Etiological processes therefore do not form a natural kind. To be etiological is merely to be such as to have brought about an outcome of a certain sort.
Thus pathological processes realizing one disease may lead to pain and dysfunction that give rise to the further disease of depression.

Disease Course =def. – The totality of all processes through which a given disease instance is realized.

Transient Disease Course =def. – A disease course that terminates in a return to normal homeostasis. Example: a bout of flu.

Chronic Disease Course =def. – A disease course that (a) does not terminate in a return to normal homeostasis and (b) would, absent intervention, fall within an abnormal homeostatic range. Examples: acquired deafness; intermittent seizures in a person suffering from epilepsy.

Progressive Disease Course =def. – A disease course that (a) does not terminate in a return to homeostasis and (b) would, absent intervention, involve an increasing deviation from homeostasis. Example: malignant cancer.

Note that for any given patient it may at any given point in time be difficult to determine which type of disease course is involved. A single episode of transient paralysis may be insufficient to arrive at a diagnosis of multiple sclerosis until a second episode occurs. Although the disposition was present at the time of the initial episode, our ability to diagnose the underlying disorder is limited by the manifestations that have been observed up to that point in time.

Definitions of Terms Referring to Genetic Disorders

Genetic Disorder =def. – A disorder whose etiology involves an abnormality in the nucleotide sequence of an organism's genome.

Constitutional Genetic Disorder =def. – A genetic disorder inherited during conception that is borne by all cells in the organism. Example: mutation in the hexosaminidase gene leading to Tay-Sachs disease.

Acquired Genetic Disorder =def. – A genetic disorder acquired by a single cell in an organism that leads to a population of cells within the organism bearing the disorder. Example: a point mutation acquired in the H-ras gene in colorectal adenoma cells.
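The dependence of disease on disorder, and the three disease course types defined above, can be sketched in code. The following Python is a minimal sketch under stated assumptions, not a definitive encoding: the names Disorder, Disease, CourseType and classify_course are hypothetical, and the two boolean inputs compress clauses (a) and (b) of the course definitions, with the chronic course treated as the remaining case in which the course settles within an abnormal homeostatic range.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class CourseType(Enum):
    TRANSIENT = "transient"
    CHRONIC = "chronic"
    PROGRESSIVE = "progressive"


@dataclass(frozen=True)
class Disorder:
    """A clinically abnormal, maximal combination of physical components."""
    description: str


@dataclass(frozen=True)
class Disease:
    """A disposition to undergo pathological processes that exists
    because of one or more disorders in the organism."""
    name: str
    disorders: Tuple[Disorder, ...]

    def __post_init__(self) -> None:
        # On this view there is no disease without some physical basis.
        if not self.disorders:
            raise ValueError("a disease requires at least one disorder")


def classify_course(returns_to_normal_homeostasis: bool,
                    increasing_deviation_absent_intervention: bool) -> CourseType:
    """Classify a disease course from two facts assumed to be known.

    Simplification (an assumption of this sketch): a course that neither
    returns to normal homeostasis nor shows increasing deviation is taken
    to fall within an abnormal homeostatic range, i.e. to be chronic.
    """
    if returns_to_normal_homeostasis:
        return CourseType.TRANSIENT
    if increasing_deviation_absent_intervention:
        return CourseType.PROGRESSIVE
    return CourseType.CHRONIC


# Example from the text: diabetes resting on a loss of beta cells.
beta_cell_loss = Disorder("loss of beta cells in pancreatic islets")
diabetes = Disease("diabetes", (beta_cell_loss,))

flu_course = classify_course(True, False)       # a bout of flu
epilepsy_course = classify_course(False, False)  # intermittent seizures
cancer_course = classify_course(False, True)     # malignant cancer
```

The function presupposes that the relevant facts about the course are already known; as noted above, for a given patient at a given time they often are not, so in practice the inputs would themselves be hypotheses open to revision.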
Constitutional Genetic Disease =def. – A disease whose physical basis is a constitutional genetic disorder. Examples: chronic: color blindness, polydactyly; progressive: Down syndrome, Tay-Sachs disease.

Acquired Genetic Disease =def. – A disease whose physical basis is an acquired genetic disorder. Examples: chronic: benign colonic neoplasia (here the physical basis is an APC mutation); progressive: malignant colon cancer (here the physical basis is a combination of APC, ras and p53 mutations).

Genetic Predisposition to Disease of Type X =def. – A predisposition to disease of type X whose physical basis is a constitutional abnormality in an organism's genome. This abnormality is the physical basis for the increased risk of acquiring disease X. Examples: p53 mutation in Li-Fraumeni Syndrome predisposing to cancer; ApoE alleles predisposing to Alzheimer's disease.

Definitions of Terms Referring to Infections

Infectious Disorder =def. – A disorder whose etiology includes the presence of a pathogenic organism within a host organism or an abnormal imbalance in the normal resident organismal flora.

Infectious Disease =def. – A disease whose physical basis is an infectious disorder. Examples: transient: seasonal flu; chronic: genital herpes; progressive: Ebola hemorrhagic fever.

Secondary Infection =def. – A disorder consisting in the presence of a pathogenic organism within a host organism that occurs due to the disposition established by a prior infection with a pathogenic organism of a different kind (e.g. cryptosporidiosis in a patient suffering from AIDS).

Definitions of Terms Relating to Clinical Evaluations

Sign =def. – A bodily feature of a patient that is observed in a physical examination and is deemed by the clinician to be of clinical significance. We can distinguish a further use of 'sign' in the context 'sign of'. Two clinicians may observe the same clinically abnormal bodily feature, e.g.
a hand tremor, in a single patient but interpret it differently, either as a 'sign of' a distinct disorder (where the patient has two disorders) or as a sign of one disorder about whose disease type they differ in opinion (e.g. hyperthyroidism or Parkinson's).

Vital sign =def. – A physical sign in which a nonzero value is standardly considered to be an indication that the organism is alive. The relative values for vital signs are often used as measures that can indicate the presence of disease.

Symptom =def. – A bodily feature of a patient that is observed by the patient and is hypothesized by the patient to be a realization of a disease. Again we can distinguish the special usage 'symptom of': a clinician may attribute a symptom as being a symptom of some specific disease. On some readings of the term, 'symptom' refers paradigmatically to pains and other feelings and sensations which are such that they can be observed only by the patient.

Neither signs nor symptoms form a natural kind, but are rather composite classes – fiat collections of bodily features delineated by certain socially established cognitive practices on the parts of clinicians and patients.

Clinical History =def. – A series of statements representing health-relevant features of a patient. The term 'clinical history' is also sometimes used to refer to the collection of disease courses in a given patient. Even a patient who never went to the doctor may have a clinical history on this reading.

Clinical History Taking =def. – An interview in which a clinician elicits a clinical history from a patient or from a third party who is authorized to make health care decisions on behalf of the patient.

Physical Examination =def. – A sequence of acts of observing and measuring bodily features of a patient performed by a clinician; measurements may occur with and without elicitation.

Laboratory Test =def.
– A measurement assay that has as input a patient-derived specimen, and as output a result representing a quality of the specimen.

Laboratory Finding =def. – A representation of a quality of a specimen that is the output of a laboratory test and that can support an inference to an assertion about some quality of the patient.

Normal Value =def. – A value for a quality reported in a lab report and asserted by the testing lab or the kit manufacturer to be normal based on a statistical treatment of values from a reference population.

Manifestation of a Disease =def. – A bodily feature of a patient that is (a) a deviation from clinical normality that exists in virtue of the realization of a disease and (b) is observable. Observability includes observability through elicitation of a response or through the use of special instruments.

Preclinical Manifestation of a Disease =def. – A manifestation of a disease that exists prior to its becoming detectable in a clinical history taking or physical examination.

Clinical Manifestation of a Disease =def. – A manifestation of a disease that is detectable in a clinical history taking or physical examination.

Phenotype =def. – A bodily feature or combination of bodily features of an organism determined by the interaction of the genetic make-up of the organism and its environment.

Clinical Phenotype =def. – A clinically abnormal phenotype.

Disease Phenotype =def. – A clinically abnormal phenotype that is characteristic of a single disease. Note that, according to our definition, a disease phenotype can exist without being observed. Indeed, as technology advances, our ability to detect the underlying components of a disease phenotype will expand. The full disease phenotype would incorporate the abnormal phenotypes realized at each stage in the development of the disease. As with 'disorder' we can also distinguish a less and a more inclusive reading of 'disease phenotype'.
Under the former, a disease phenotype may be a single type of abnormality characteristic of a given disease; under the latter a disease phenotype is a maximal combination of such single phenotypes, ordered in a temporal sequence characteristic of one or more typical courses for the given disease.

Clinical Picture =def. – A representation of a clinical phenotype that is inferred from the combination of laboratory, image and clinical findings about a given patient.

Diagnosis =def. – A conclusion of an interpretive process that has as input a clinical picture of a given patient and as output an assertion (diagnostic statement) to the effect that the patient has a disease of such and such a type. A diagnosis is a continuant entity that, once made, will survive through time, and is often supplanted by further diagnoses. The diagnostic process is thus iterative: the clinician is forming hypotheses during history taking, testing these during physical exam, forming new hypotheses as a result, and so on.

Discussion

The following figure summarizes the view of disease and diagnosis presented here. As a result of an etiological process, a physical change occurs in the healthy individual giving rise to a disorder whose realizations are initially undetectable (preclinical manifestations) and then become detectable as symptoms and signs (clinical manifestations). The latter constitute in their totality the phenotype for the given disease as instantiated in this specific patient. They can be observed through physical examination and laboratory testing of specimens derived from the patient, the results of which can be recorded in the medical record as a clinical picture. The clinical picture is interpreted by the physician in arriving at a diagnosis, which serves in turn as the foundation for the development of a patient management plan.

Acknowledgements

For helpful discussions we thank L. Cowell, W. Hogan, A. James, J. Loscalzo, B. Peters, N.
Williams, and the attendees of the 2008 'Signs, Symptoms and Findings: First Steps Toward an Ontology of Clinical Phenotypes' and 'Infectious Disease Ontology' workshops. This work is supported by the NIH – N01AI40076, N01AI40041, U54RR023468 and U54HG004928, and by the Oishei Foundation.

References

1. http://nciterms.nci.nih.gov on 12/22/2008.
2. Loscalzo J, Kohane I, Barabasi A-L. Human disease classification in the postgenomic era: A complex systems approach to human pathobiology. Mol Syst Biol. 2007; 3: 124.
3. Butte AJ, Kohane IS. Creation and implications of a phenome-genome network. Nat Biotechnol. 2006; 24(1): 55-62.
4. Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, Goldberg LJ, Eilbeck K, Ireland A, Mungall CJ, Leontis N, Rocca-Serra P, Ruttenberg A, Sansone SA, Scheuermann RH, Shah N, Whetzel PL, Lewis S. The OBO Foundry: Coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol. 2007; 25(11): 1251-1255.
5. Schulz S, Johansson I. Continua in biological systems. The Monist. 2007; 90(4): 499-522.
6. Williams N. The factory model of disease. The Monist. 2007; 90(4): 555-584.