Formal principles governing best practices in classification and definition have for too long been neglected in the construction of biomedical ontologies, in ways which have important negative consequences for data integration and ontology alignment. We argue that the use of such principles in ontology construction can serve as a valuable tool in error-detection and also in supporting reliable manual curation. We argue also that such principles are a prerequisite for the successful application of advanced data integration techniques such as ontology-based multi-database querying, automated ontology alignment and ontology-based text-mining. These theses are illustrated by means of a case study of the Gene Ontology, a project of increasing importance within the field of biomedical data integration.
Quality assurance in large terminologies is a difficult issue. We present two algorithms that can help terminology developers and users to identify potential mistakes. We demonstrate the methodology by outlining the different types of mistakes that are found when the algorithms are applied to SNOMED-CT. On the basis of the results, we argue that both formal logical and linguistic tools should be used in the development and quality-assurance process of large terminologies.
Formalisms such as description logics (DL) are sometimes expected to help terminologies ensure compliance with sound ontological principles. The objective of this paper is to study the degree to which one DL-based biomedical terminology (SNOMED CT) complies with such principles. We defined seven ontological principles (for example: each class must have at least one parent, each class must differ from its parent) and examined the properties of SNOMED CT classes with respect to these principles. Our major results are: 31% of the classes have a single child; 27% have multiple parents; 51% do not exhibit any differentiae between the description of the parent and that of the child. The applications of this study to quality assurance for ontologies are discussed and suggestions are made for dealing with multiple inheritance.
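The kind of structural check described in the abstract above can be illustrated with a minimal sketch. It is not the authors' method and does not use SNOMED CT data: the function name `audit_hierarchy`, the toy classes and the attribute-set "descriptions" are all hypothetical, and only four of the checks hinted at in the abstract (missing parent, single child, multiple parents, no differentia between parent and child) are modelled.

```python
from collections import defaultdict


def audit_hierarchy(parents, descriptions, roots=()):
    """Flag classes that violate simple structural principles.

    parents:      dict mapping each class to the set of its is_a parents
    descriptions: dict mapping a class to a frozenset of defining attributes
                  (a hypothetical stand-in for a formal class description)
    roots:        classes that are allowed to have no parent
    """
    classes = set(parents) | {p for ps in parents.values() for p in ps} | set(roots)

    # Invert the parent relation so children can be counted per class.
    children = defaultdict(set)
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)

    return {
        # Non-root classes with no parent at all.
        "no_parent": sorted(
            c for c in classes if not parents.get(c) and c not in roots
        ),
        # Classes with exactly one child.
        "single_child": sorted(c for c in classes if len(children[c]) == 1),
        # Classes with more than one parent (multiple inheritance).
        "multiple_parents": sorted(
            c for c in classes if len(parents.get(c, ())) > 1
        ),
        # Child/parent pairs whose descriptions are identical.
        "no_differentia": sorted(
            (c, p)
            for c, ps in parents.items()
            for p in ps
            if c in descriptions and descriptions[c] == descriptions.get(p)
        ),
    }


# Toy hierarchy (hypothetical, for illustration only).
parents = {
    "Disease": set(),
    "Infection": {"Disease"},
    "Viral infection": {"Infection"},
    "Lung disorder": {"Disease"},
    "Pneumonia": {"Infection", "Lung disorder"},
    "Orphan": set(),
}
descriptions = {
    "Infection": frozenset({("caused_by", "organism")}),
    "Viral infection": frozenset({("caused_by", "organism")}),
    "Lung disorder": frozenset({("site", "lung")}),
    "Pneumonia": frozenset({("caused_by", "organism"), ("site", "lung")}),
}

report = audit_hierarchy(parents, descriptions, roots={"Disease"})
for principle, offenders in report.items():
    print(principle, offenders)
```

On a real terminology, the descriptions would have to be derived from the formal definitions of the classes, which is considerably more involved than comparing attribute sets.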
We present the details of a methodology for quality assurance in large medical terminologies and describe three algorithms that can help terminology developers and users to identify potential mistakes. The methodology is based in part on linguistic criteria and in part on logical and ontological principles governing sound classifications. We conclude by outlining the results of applying the methodology in the form of a taxonomy of the different types of errors and potential errors detected in SNOMED-CT.
The Unified Medical Language System and the Gene Ontology are among the most widely used terminology resources in the biomedical domain. However, when we evaluate them in the light of simple principles for well-constructed ontologies we find a number of characteristic inadequacies. Employing the theory of granular partitions, a new approach to the understanding of ontologies and of the relationships ontologies bear to instances in reality, we provide an application of this theory in relation to an example drawn from the context of the pathophysiology of hypertension. This exercise is designed to demonstrate how, by taking ontological principles into account, we can create more realistic biomedical ontologies which will also bring advantages in terms of efficiency and robustness of associated software applications.
The integration of biomedical terminologies is indispensable to the process of information integration. When terminologies are linked merely through the alignment of their leaf terms, however, differences in context and ontological structure are ignored. Making use of the SNAP and SPAN ontologies, we show how three reference domain ontologies can be integrated at a higher level, through what we shall call the OBR framework (for: Ontology of Biomedical Reality). OBR is designed to facilitate inference across the boundaries of domain ontologies in anatomy, physiology and pathology.
The National Cancer Institute’s Thesaurus (NCIT) has been created with the goal of providing a controlled vocabulary which can be used by specialists in the various sub-domains of oncology. It is intended to be used for purposes of annotation in ways designed to ensure the integration of data and information deriving from these various sub-domains, and thus to support more powerful cross-domain inferences. In order to evaluate its suitability for this purpose, we examined the NCIT’s treatment of the kinds of entities which are fundamental to an ontology of colon carcinoma. We here describe the problems we uncovered concerning classification, synonymy, relations and definitions, and we draw conclusions for the work needed to establish the NCIT as a reference ontology for the cancer domain in the future.
The automatic integration of information resources in the life sciences is one of the most challenging goals facing biomedical informatics today. Controlled vocabularies have played an important role in realizing this goal, by making it possible to draw together information from heterogeneous sources secure in the knowledge that the same terms will also represent the same entities on all occasions of use. One of the most impressive achievements in this regard is the Gene Ontology (GO), which is rapidly acquiring the status of a de facto standard in the field of gene and gene product annotations, and whose methodology has been much imitated in attempts to develop controlled vocabularies for shared use in different domains of biology. The GO Consortium has recognized, however, that its controlled vocabulary as currently constituted is marked by several problematic features - features which are characteristic of much recent work in bioinformatics and which are destined to raise increasingly serious obstacles to the automatic integration of biomedical information in the future. Here, we survey some of these problematic features, focusing especially on issues of compositionality and syntactic regimentation.
The integration of standardized biomedical terminologies into a single, unified knowledge representation system has formed a key area of applied informatics research in recent years. The Unified Medical Language System (UMLS) is the most advanced and most prominent effort in this direction, bringing together within its Metathesaurus a large number of distinct source-terminologies. The UMLS Semantic Network, which is designed to support the integration of these source-terminologies, has proved to be a highly successful combination of formal coherence and broad scope. We argue here, however, that its organization manifests certain structural problems, and we describe revisions which we believe are needed if the network is to be maximally successful in realizing its goals of supporting terminology integration.
An explicit formal-ontological representation of entities existing at multiple levels of granularity is an urgent requirement for biomedical information processing. We discuss some fundamental principles which can form a basis for such a representation. We also comment on some of the implicit treatments of granularity in currently available ontologies and terminologies (GO, FMA, SNOMED CT).
that can serve as a foundation for more refined ontologies in the field of proteomics. Standard data sources classify proteins in terms of just one or two specific aspects. Thus SCOP (Structural Classification of Proteins) is described as classifying proteins on the basis of structural features; SWISSPROT annotates proteins on the basis of their structure and of parameters like post-translational modifications. Such data sources are connected to each other by pairwise term-to-term mappings. However, there are obstacles which stand in the way of combining them together to form a robust meta-classification of the needed sort. We discuss some formal ontological principles which should be taken into account within the existing data sources in order to make such a meta-classification possible, taking into account also the Gene Ontology (GO) and its application to the annotation of proteins.
Formalisms based on one or other flavor of Description Logic (DL) are sometimes put forward as helping to ensure that terminologies and controlled vocabularies comply with sound ontological principles. The objective of this paper is to study the degree to which one DL-based biomedical terminology (SNOMED CT) does indeed comply with such principles. We defined seven ontological principles (for example: each class must have at least one parent, each class must differ from its parent) and examined the properties of SNOMED CT classes with respect to these principles. Our major results are: 31% of these classes have a single child; 27% have multiple parents; 51% do not exhibit any differentiae between the description of the parent and that of the child. The applications of this study to quality assurance for ontologies are discussed and suggestions are made for dealing with the phenomenon of multiple inheritance. The advantages and limitations of our approach are also discussed.
Tumors, abscesses, cysts, scars, fractures are familiar types of what we shall call pathological continuant entities. The instances of such types exist always in or on anatomical structures, which thereby become transformed into pathological anatomical structures of corresponding types: a fractured tibia, a blistered thumb, a carcinomatous colon. In previous work on biomedical ontologies we showed how the provision of formal definitions for relations such as is_a, part_of and transformation_of can facilitate the integration of such ontologies in ways which have the potential to support new kinds of automated reasoning. We here extend this approach to the treatment of pathologies, focusing especially on those pathological continuant entities which arise when organs become affected by carcinomas.
The theory of granular partitions (TGP) is a new approach to the understanding of ontologies and other classificatory systems. The paper explores the use of this new theory in the treatment of task-based clinical guidelines as a means for better understanding the relations between different clinical tasks, both within the framework of a single guideline and between related guidelines. We used as our starting point a DAML+OIL-based ontology for the WHO guideline for hypertension management, comparing this with related guidelines and attempting to show that TGP provides a flexible and highly expressive basis for the manipulation of ontologies of a sort which might be useful in providing more adequate Computer Interpretable Guideline Models (CIGMs) in the future.
The Gene Ontology is an important tool for the representation and processing of information about gene products and functions. It provides controlled vocabularies for the designations of cellular components, molecular functions, and biological processes used in the annotation of genes and gene products. These constitute three separate ontologies, of cellular components, molecular functions and biological processes, respectively. The question we address here is: how are the terms in these three separate ontologies related to each other? We use statistical methods and formal ontological principles as a first step towards finding answers to this question.
Ontological principles are needed in order to bridge the gap between medical and biological information in a robust and computable fashion. This is essential in order to draw inferences across the levels of granularity which span medicine and biology, an example of which is the understanding of the roles of tumor markers in the development and progress of carcinoma. Such information integration is also important for the integration of genomics information with the information contained in the electronic patient records in such a way that real time conclusions can be drawn. In this paper we describe a large multi-granular datasource built by using ontological principles and focusing on the case of colon carcinoma.
The Foundational Model of Anatomy (FMA) is a map of the human body. Like maps of other sorts – including the map-like representations we find in familiar anatomical atlases – it is a representation of a certain portion of spatial reality as it exists at a certain (idealized) instant of time. But unlike other maps, the FMA comes in the form of a sophisticated ontology of its object domain, comprising some 1.5 million statements of anatomical relations among some 70,000 anatomical kinds. It is further distinguished from other maps in that it represents not some specific portion of spatial reality (say: Leeds in 1996), but rather the generalized or idealized spatial reality associated with a generalized or idealized human being at some generalized or idealized instant of time. It will be our concern in what follows to outline the approach to ontology that is represented by the FMA and to argue that it can serve as the basis for a new type of anatomical information science. We also draw some implications for our understanding of spatial reasoning and spatial ontologies in general.
(Report assembled for the Workshop of the AMIA Working Group on Formal Biomedical Knowledge Representation in connection with AMIA Symposium, Washington DC, 2005.) Best practices in ontology building for biomedicine have been frequently discussed in recent years. However, there is a range of seemingly disparate views represented by experts in the field. These views reflect not only the different uses to which ontologies are put, but also the experiences and disciplinary background of these experts themselves. We asked six questions related to biomedical ontologies to what we believe is a representative sample of ontologists in the biomedical field and came to a number of conclusions which we believe can help provide an insight into the practical problems which ontology builders face today.
An important part of the Unified Medical Language System (UMLS) is its Semantic Network, consisting of 134 Semantic Types connected to each other by edges formed by one or more of 54 distinct Relation Types. This Network is however for many purposes overcomplex, and various groups have thus made attempts at simplification. Here we take this work further by simplifying the relations which involve the three Semantic Types – Diagnostic Procedure, Laboratory Procedure and Therapeutic or Preventive Procedure. We define operators which can be used to generate terms instantiating types from this selected set when applied to terms designating certain other Semantic Types, including almost all the terms specifying clinical tasks. Usage of such operators thus provides a useful and economical way of specifying clinical tasks. The operators allow us to define a mapping between those types within the UMLS which do not represent clinical tasks and those which do. This mapping then provides a basis for an ontology of clinical tasks that can be used in the formulation of computer-interpretable clinical guideline models.
Clinical guidelines are special types of plans realized by collective agents. We provide an ontological theory of such plans that is designed to support the construction of a framework in which guideline-based information systems can be employed in the management of workflow in health care organizations. The framework we propose allows us to represent in formal terms how clinical guidelines are realized through the actions of individuals organized into teams. We provide various levels of implementation representing different levels of conformity on the part of health care organizations. Implementations built in conformity with our framework are marked by two dimensions of flexibility that are designed to make them more likely to be accepted by health care professionals than standard guideline-based management systems. They do justice to the fact 1) that responsibilities within a health care organization are widely shared, and 2) that health care professionals may on different occasions be non-compliant with guidelines for a variety of well justified reasons. The advantage of the framework lies in its built-in flexibility, its sensitivity to clinical context, and its ability to use inference tools based on a robust ontology. One disadvantage lies in its complicated implementation.
Outcomes research in healthcare has been a topic much addressed in recent years. Efforts in this direction have been supplemented by work in the areas of guidelines for clinical practice and computer-interpretable workflow and careflow models. In what follows we present the outlines of a framework for understanding the relations between organizations, guidelines, individual patients and patient-related functions. The derived framework provides a means to extract the knowledge contained in the guideline text at different granularities, in ways that can help us to assign tasks within the healthcare organization and to assess clinical performance in realizing the guideline. It does this in a way that preserves the flexibility of the organization in the adoption of the guidelines.
We provide a methodology for the creation of ontological partitions in biomedicine and we test the methodology via an application to the phenomenon of blood pressure. An ontology of blood pressure must do justice to the complex networks of intersecting pathways in the organism by which blood pressure is regulated. To this end it must deal not only with the anatomical structures and physiological processes involved in such regulation but also with the relations between these at different levels of granularity. For this purpose our ontology offers a variety of distinct partitions – of substances, processes and functions – and integrates these together within a single framework via transitive networks of part-whole and dependence relations among the entities in each of these categories. The paper concludes with a comparison of this methodology with the approaches of GOTM, KEGG, DIP and BIND and provides an outline of how the methodology is currently being applied in the field of biomedical database integration.
Evidence-based medicine relies on the execution of clinical practice guidelines and protocols. A great deal of effort has been invested in the development of various tools which automate the representation and execution of the recommendations contained within such guidelines and protocols by creating Computer Interpretable Guideline Models (CIGMs). Context-based task ontologies (CTOs), based on standard terminology systems like UMLS, form one of the core components of such a model. We have created DAML+OIL-based CTOs for the tasks mentioned in the WHO guideline for hypertension management, drawing comparisons also with other related guidelines. The advantages of CTOs include: contextualization of ontologies, providing ontologies tailored to specific aspects of the phenomena of interest, dividing the complexity involved in creating ontologies into different levels, and providing a methodology by means of which the task recommendations contained within guidelines can be integrated into the clinical practices of a health care set-up.
The International Classification of Functioning, Disability and Health provides a classification of human bodily functions, which, while exhibiting non-conformance to many formal ontological principles, provides an insight into which basic functions such a classification should include. Its evaluation is an important first step towards such an adequate ontology of this domain. Presented at the 13th Annual North American WHO Collaborating Center Conference on the ICF, 2007.
There are a number of existing classifications and staging schemes for carcinomas, one of the most frequently used being the TNM classification. Such classifications represent classes of entities which exist at various anatomical levels of granularity. We argue that in order to apply such representations to the Electronic Health Records one needs sound ontologies which take into consideration the diversity of the domains which are involved in clinical bioinformatics. Here we outline a formal theory for addressing these issues in a way that the ontologies can be used to support inferences relating to entities which exist at different anatomical levels of granularity. Our case study is the colon carcinoma, one of the most common carcinomas prevalent within the European population.
Recent work on the quality assurance of the Gene Ontology (GO, Gene Ontology Consortium 2004) from the perspective of both linguistic and ontological organization has made it clear that GO lacks the kind of formalism needed to support logic-based reasoning. At the same time it is no less clear that GO has proven itself to be an excellent terminological resource that can serve to combine together a variety of biomedical database and information systems. Given the strengths of GO, it is worth investigating whether, by overcoming some of its weaknesses from the point of view of formal-ontological principles, we might not be able to enhance a version of GO which can come even closer to serving the needs of the various communities of biomedical researchers and practitioners. It is accepted that clinical and bioinformatics need to find common ground if the results of data-intensive biomedical research are to be harvested to the full. It is also widely accepted that no single method will be sufficient to create the needed common framework. We believe that the principles-based approach to life-science data integration and knowledge representation must be one of the methods applied. Indeed in dealing with the ontological representation of carcinomas, and specifically of colon carcinomas, we have established that, had GO (and related biomedical ontologies) followed some of the basic formal-ontological principles we have identified (Smith et al. 2004, Ceusters et al. 2004), then the effort required to navigate successfully between clinical and bioinformatics systems would have been reduced. We point here to the sources of ontologically-related errors in GO, and also provide arguments as to why and how such errors need to be resolved.
Two senses of ‘ontology’ can be distinguished in the current literature. First is the sense favored by information scientists, who view ontologies as software implementations designed to capture in some formal way the consensus conceptualization shared by those working on information systems or databases in a given domain. [Gruber 1993] Second is the sense favored by philosophers, who regard ontologies as theories of different types of entities (objects, processes, relations, functions) [Smith 2003]. Where information systems ontologists seek to maximize reasoning efficiency even at the price of simplifications on the side of representation, philosophical ontologists argue that representational adequacy can bring benefits for the stability and resistance to error of an ontological framework and also for its extendibility in the future. In bioinformatics, however, a third sense of ‘ontology’ has established itself, above all as a result of the successes of the Gene Ontology (hereafter: GO), which is a tool for the representation and processing of information about gene products and their biological functions [Gene Ontology Consortium 2000]. We show how Basic Formal Ontology (BFO) has established itself as an overarching ontology drawing on all three of the strands distinguished above, and describe applications of BFO especially in the treatment of biological granularity.
In previous work on biomedical ontologies we showed how the provision of formal definitions for relations such as is_a and part_of can support new types of automated reasoning about biomedical phenomena. We here extend this approach to the transformation_of relation characteristic of pathologies.
It is widely understood that protein functions can be exhaustively described in terms of no single parameter, whether this be amino acid sequence or the three-dimensional structure of the underlying protein molecule. This means that a number of different attributes must be used to create an ontology of protein functions. Certainly much of the required information is already stored in databases such as Swiss-Prot, Protein Data Bank, SCOP and MIPS. But the latter have been developed for different purposes and the separate data-structures which they employ are not conducive to the needed data integration. When we attempt to classify the entities in the domain of proteins, we find ourselves faced with a number of cross-cutting principles of classification. Our question here is: how can we bring together these separate taxonomies in order to describe protein functions? Our proposed answer is: via a careful top-level ontological analysis of the relevant principles of classification, combined with a new framework for the simultaneous manipulation of classifications constructed for different purposes.