The Infectious Disease Ontology in the Age of COVID-19

Shane Babcock1,5*, John Beverley2,5, Lindsay G. Cowell3,5, Barry Smith4,5

1Department of Philosophy, Niagara University, Lewiston, NY, USA
2Department of Philosophy, National Center for Ontological Research, Northwestern University, Evanston, IL, USA
3Cowell Lab, University of Texas Southwestern Medical Center, Dallas, TX, USA
4Department of Philosophy, University at Buffalo, Buffalo, NY, USA
5National Center for Ontological Research, University at Buffalo, Buffalo, NY, USA

* Correspondence: Shane Babcock sbabcock@niagara.edu

This article is a preprint and has not been certified by peer review.

Abstract

Background: Efforts to respond effectively to public health emergencies, such as we are now experiencing with COVID-19, require data sharing across multiple disciplines and data systems. Such data sharing is often hindered by the fact that relevant information is collected using different discipline-specific terminologies and coding systems and by consequent failures of interoperability between data systems. Ontologies offer a strategy to overcome problems of this sort. In practice, however, this strategy is often undermined by uncoordinated ontology development. Since 2004 the Open Biomedical Ontologies Foundry has been developing ontologies through a modular, coordinated approach, exemplified by the Infectious Disease Ontology (IDO).

Results: IDO is a suite of interoperable ontology modules that aims to provide coverage of all aspects of the infectious disease domain, including biomedical research, clinical care, and public health. The center of this suite is IDO Core, a disease- and pathogen-neutral ontology covering just those types of entities and relations that are relevant to infectious diseases generally. IDO Core is extended by disease- and pathogen-specific ontology modules.
In this paper we present applications of IDO Core, give an overview of several IDO extension ontologies, and illustrate the methodology on the basis of which they are built. We also survey recent developments involving IDO, including: IDO Virus (VIDO); the Coronavirus Infectious Disease Ontology (CIDO); and an extension of CIDO focusing on COVID-19 (IDO-COVID-19). We discuss how these ontologies might assist in information-driven efforts to deal with the ongoing COVID-19 pandemic, to accelerate data discovery in the early stages of future pandemics, and to promote reproducibility of infectious disease research.

Conclusions: IDO provides a simple recipe for building novel, powerful, pathogen-specific ontologies that allow data about novel diseases to be easily compared, along multiple dimensions, with already curated data from earlier diseases. The IDO strategy, moreover, supports ontology coordination, providing a powerful method of data integration and sharing that allows physicians, researchers, and public health organizations to respond rapidly and efficiently to current and future public health crises.

Keywords: coronavirus, COVID-19, infectious disease, Infectious Disease Ontology, ontology, data integration, data reproducibility

Background

Efforts by physicians, researchers, and public health organizations to respond to infectious diseases require the use of multiple, constantly changing data sources. Consider, for instance, a research team trying to build an effective, large-scale epidemiological system for modeling a given population's herd immunity to measles. This depends on the integration of data not merely from biology and medicine, but also from public health, geography, and social science.
The system will need to incorporate society-wide data on measles occurrence rates, transmission mode, birth rates, vaccination rates, family structures, age distribution, and other relevant demographic factors [1], and also patient-specific data on clinical manifestations of disease, diagnoses, and treatments received. Because the relevant information is collected using discipline- and community-specific methodologies and is stored in geographically distributed and often non-interoperable databases, the data are typically only locally accessible. The resultant silo-formation [2] hinders both translational and comparative research and preventive and prognostic public health research [3].

These problems can be solved by traditional means with the investment of sufficient time and effort. In circumstances of public health emergency, however, and where data must be shared across multiple disciplines from immunochemistry at one extreme to behavioral population modeling at the other, more powerful methods for data sharing and integration must be applied.

As the experience of biologists and bioinformaticians has shown, ontologies are a powerful data sharing tool [4]. But to be effective, it is important that ontologies are designed in a coordinated fashion; otherwise ontologies themselves will give rise to the creation of a new kind of silo [2]. One of the most successful and widely adopted approaches to coordinated ontology development is that of the Open Biomedical Ontologies (OBO) Foundry [5], a collective of developer groups dedicated to creating, testing, and maintaining a suite of ontologies based on an evolving set of ontology design principles:

• Ontologies should use a well-specified syntax and share a common space of identifiers.
• Ontologies should be openly available in the public domain for reuse.
• Ontologies in neighboring domains should be developed in a collaborative effort.
• Ontologies should be developed in a modular fashion.
• Ontologies should have a clearly specified scope.
• Ontologies should use common unambiguously defined relations between their terms.
• Ontologies should conform to a common top-level architecture.

The OBO Foundry principles were modelled initially on the practices of the Gene Ontology (GO) [4], which has served as the model for subsequent life science ontologies [6]. The GO forms the centerpiece of the OBO Foundry ecosystem. Wherever possible, OBO ontologies are created using terms, relational expressions, and definitions taken from existing OBO ontologies, including the Relation Ontology (RO) [7], which ensures cross-linkage between ontologies in neighboring domains and also helps avert redundant efforts. Ontologies aligning with OBO Foundry principles also require that each class has a unique identifier with the bipartite form of ID-space:Local-ID, for example GO:0008150. Use of such unique identifiers means that the source of each term (and specifically the version of the ontology from which it is drawn) can be immediately identified by its prefix. Their use also helps to ensure that ontologies retain backward compatibility with legacy annotations as those ontologies evolve.

Ontology construction and extension in accordance with OBO principles follows a 'hub and spokes' model, where a core or 'hub' ontology provides the basis for extension ontologies providing more specific terms than the hub. The Infectious Disease Ontology (IDO), first released in 2010 [8], was constructed in this manner, and consequently, provides a central 'hub' from which 'spoke' ontologies extend to represent specific infectious diseases.

Results

1. The Infectious Disease Ontology

IDO Core covers just those entities that are relevant to infectious diseases generally, and not to specific infectious diseases associated with specific pathogens.
The coverage of IDO Core ranges across biological scales (gene, cell, organ, organism, population), disciplinary perspectives (biological, clinical, epidemiological), and successive stages along the chain of infection (host, reservoir, vector, pathogen) [9]. IDO Core's coverage, together with its adherence to OBO Foundry principles, gives developers of IDO extensions a well-tested starting point for straightforwardly generating the disease- and pathogen-specific terms they need. Having a reliable starting point from which infectious disease ontologies can easily extend is of particular importance, as we see in the current pandemic where data concerning the novel SARS-CoV-2 and its associated disease COVID-19 has evolved quickly across disciplines. We outline this reliable foundation in what follows, by introducing and justifying important IDO Core terms.

1.1 Foundations of IDO Core: BFO and OGMS

At the heart of IDO Core is the term 'disease', imported from the Ontology for General Medical Science (OGMS) [10], which serves as a hub relative to IDO Core. OGMS covers types of entities relevant to clinical encounters between doctor and patient. Thus, it includes representations of disease, causes and manifestations of disease, diagnosis, symptom, treatment, patient examination, history taking, laboratory test, and so forth. OGMS is itself an extension of the Basic Formal Ontology (BFO), the official top-level ontology and hub for all OBO Foundry ontologies. BFO comprises highly general classes such as 'object', 'material entity', and 'process', is used by more than 300 ontology projects as their top-level architecture [2], and has been approved as international standard ISO/IEC 21838-2 [11, 12].

Developers of OGMS view the traditional practice of classifying diseases according to patterns of similarities in signs and symptoms (or, more generally, of phenotypes) as inadequate.
A single disease may manifest a variety of symptoms, making it difficult to distinguish the disease definitionally from other diseases involving the same anatomical system. For example, Celiac disease shares many symptoms with Crohn's disease, and with another disease of the gut caused by C. difficile infection [13]. Identifying diseases based on symptoms is, moreover, unhelpful when hosts are asymptomatic, as also occurs in some C. difficile infections. Lastly, the traditional practice of identifying diseases with phenotypes fails to do justice to the increasing importance played by genetic and environmental variables in disease taxonomy.

Seeking to address these issues, OGMS characterizes diseases in BFO terms as dispositions of patients to undergo pathological processes of specific kinds. Distinguishing manifestations of symptoms from dispositions to manifest symptoms provides the flexibility needed to represent asymptomatic patients, who are still disposed to manifest symptoms of a certain kind, and it helps us to deal with diseases that have multiple different sorts of presentation. In addition to distinguishing between disease, signs and symptoms, and pathological processes, OGMS further distinguishes between:

diagnosis =def. Representation of a conclusion of a diagnostic process.

diagnostic process =def. Health care process that involves the interpretation of a clinical picture from a given patient (input) and the assertion to the effect that the patient has a disease, disorder, or syndrome of a certain type, or none of these (output).

disease =def. Disposition to (i) undergo pathogenic processes that (ii) exists in an organism because of one or more disorders in that organism.

disease course =def. Totality of all processes through which a given disease instance is realized.

disorder =def. Material entity which is clinically abnormal and part of an extended organism; disorders are the physical basis of disease.
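To make these distinctions concrete, the following is a minimal Python sketch, not part of OGMS or IDO themselves. The class and field names (`Disorder`, `Disease`, `DiseaseCourse`, `Diagnosis`, `material_basis`, `has_manifestations`) and the example patient are assumptions chosen for illustration. The sketch only shows that a disease, its physical basis, its course, and its diagnosis are separate records, so that a disease can exist with an empty course and without any diagnosis.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Disorder:
    """Clinically abnormal material entity; the physical basis of a disease."""
    description: str


@dataclass
class Disease:
    """Disposition that exists in an organism because of one or more disorders."""
    name: str
    material_basis: List[Disorder]


@dataclass
class DiseaseCourse:
    """Totality of the processes through which one disease instance is realized."""
    realizes: Disease
    processes: List[str] = field(default_factory=list)

    def has_manifestations(self) -> bool:
        # An empty course models a disease that exists but is not (yet) realized.
        return bool(self.processes)


@dataclass
class Diagnosis:
    """Representation of a conclusion of a diagnostic process."""
    asserted_disease_name: str


# Hypothetical patient: the disease exists, nothing has manifested, no diagnosis yet.
lesion = Disorder("hypothetical clinically abnormal tissue")
disease = Disease("hypothetical disease", material_basis=[lesion])
silent_course = DiseaseCourse(realizes=disease)
diagnoses: List[Diagnosis] = []
```

Keeping the diagnosis in its own record, rather than as a field of the disease, is what would let two clinicians attach different diagnoses to the same disease instance without changing the disease record.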
A disorder is then further elucidated as a clinically abnormal feature of an organism, a feature that (1) is not part of the life plan for an organism of the relevant type (unlike aging or pregnancy), (2) is causally linked to an elevated risk either of pain or other feelings of illness, or of death or dysfunction, and (3) is such that the elevated risk exceeds a certain threshold level [10]. The definition of disorder also involves reference to:

extended organism =def. Object aggregate consisting of an organism and all material entities located within that organism, overlapping the organism, or occupying sites formed in part by the organism.

A primary reason for introducing extended organism is to provide a class for the material entity that comprehends not only the organism itself but also the normal microflora and invading pathogens contained within it and also the pathogens on its surface. Relatedly, OGMS has recently adopted:

symptom =def. Process experienced by the patient, which can only be experienced by the patient, that is hypothesized to be clinically relevant.

In line with this, all subclasses of symptom appearing within IDO Core and its individual extensions will henceforth be consistently classified as processes.

These distinctions provide a consistent framework for the collection of data where, for example, different clinicians diagnose the same disease differently, or where a disease exists without having yet been diagnosed [10]. The OGMS approach allows, moreover, for the existence of preclinical manifestations of disease, and for clinical risk factor combinations of disease and predispositions to disease (as when an instance of AIDS in a given patient is a risk factor for a second disease such as tuberculosis [14]). Conflating these distinctions is a known problem in many medical vocabulary resources [15].
1.2 Extending IDO Core from OGMS

In extending OGMS (see Figure 1), IDO Core distinguishes between:

• infectious disease
• sign and/or symptom of infectious disease
• infectious disease diagnosis
• infectious disease course
• infection

Figure 1. Relationships between disease, disorder, and disease courses in IDO Core, with subclasses of entities linked by BFO relations such as realizes and has_material_basis, the latter used to indicate the material basis of a disposition, in this case a disease.

The relevant IDO terms are defined as follows:

infection =def. Material entity part of an organism's extended organism that itself has some pathogen as part, which participates in the formation of the material entity by invading the tissues of the organism.

infectious disorder =def. Disorder that has some infectious agent or infectious structure as part, which participates in the formation of the disorder.

infectious disease =def. Disease whose physical basis is an infectious disorder.

infectious disease course =def. Disease course that is the realization of an infectious disease.

To elucidate the definition of infection and make clearer its relationship to the classes disorder and infectious disorder, we first introduce further needed terms used by IDO Core. IDO Core imports the class organism from the Ontology for Biomedical Investigations (OBI) [16]. It also imports from the NCBI organismal classification [17] classes for pathogenic organisms such as bacteria. IDO Core defines a new class, acellular structure, defined as "Object that is an arrangement of interrelated acellular parts forming a biological unit", to capture pathogens, such as viruses and prions, which do not classify as organisms (organisms being either composed of multiple cells, or unicellular). IDO Core defines the following terms pertaining to pathogens and hosts:

pathogenic disposition =def.
Disposition borne by a material entity to establish localization in a host or produce toxins that can be transmitted to a host, either of which may form disorder in the host or immunocompetent members of the host's species.

pathogen =def. Material entity with a pathogenic disposition.

virulence =def. Quality that inheres in a pathogen and is the degree to which realizations of the infectious disease caused by the pathogen become severe or fatal.

virulence factor disposition =def. Disposition borne by a biological macromolecule produced by a pathogen that is a disposition to undergo processes that increase the pathogen's virulence.

host role =def. Role borne by an acellular structure containing a distinct material entity, or organism whose extended organism contains a distinct material entity, realized in use of that structure or organism as a site of reproduction or replication.1

host =def. Object bearing a host role.

establishment of localization in host =def. Process in which a material entity reaches a site in an organism in which it can survive, grow, multiply, or mature and establishes itself there.

colonization of host =def. Establishment of localization in host process in which an organism establishes a colony in or on a host.

Establishment of localization in host is a subclass of the GO class establishment of localization, which often involves the tethering or adhesion of a cell, substance, or cellular entity to some other entity. For instance, virus particles (virions) enter host cells, and viral DNA is integrated into host DNA, thus establishing the virus in the host (viral localization also involves the binding of viral membrane proteins to host cell receptors). Pathogenic disposition refers to the ability of a pathogen to participate in an establishment of localization in host, reflecting the fact that pathogens often cause disorders as a result of maturing, surviving, and/or multiplying in a host.
An example is when parasitic helminths in the lumens deprive the host of essential nutrients. As the disjunctive definition of pathogenic disposition makes clear, though, this is not always necessary, as when a person develops food botulism due to the ingestion of foodborne toxins produced by Clostridium botulinum. The definition also covers cases involving both mechanisms, as in infant botulism where the intestines of an infant are colonized by C. botulinum and secreted toxins are then absorbed into the bloodstream. Relatedly, IDO Core provides terms for pathogen virulence factors, such as toxin, endotoxin, and exotoxin, as well as corresponding dispositions.

Notice that ectoparasites such as head lice are not counted as pathogens. Colonization of the head may lead to excessive itching, opening wounds through which bacterial pathogens enter the host. But while the lice colony helps facilitate conditions under which a disorder is subsequently formed, it is the bacteria, established in the host, that form the disorder (or release toxins that do so).

Building on its pathogen-related terms, IDO Core defines terms pertaining to infectious entities and their dispositions:

infectious disposition =def. Pathogenic disposition borne by an organism or acellular structure to be transmitted to a host and then become part of an infection in that host or immunocompetent members of the same species as the host.

infectious agent =def. Organism that has an infectious disposition.

infectious structure =def. Acellular structure that has an infectious disposition.

IDO Core also defines processes involved in the establishment of infections and disorders:

process of establishing an infection =def. Process by which an infectious agent or infectious structure, established in a host, becomes part of an infection in the host.

appearance of disorder =def. Process by which a disorder comes into existence.

Aspects of infectious disposition are illustrated in Figure 2.
1 Traditionally, hosts are classified as organisms. But in line with our classification of viruses as acellular structures, we have curated this definition to accommodate the case of a virus serving as host to an infecting virophage. Also, we are aware that host role, and its subclasses, do not pertain only to the infectious disease domain. Eventually they need to be moved up to an appropriate ontology then imported back to IDO Core. At the time of its creation, such an ontology did not exist and so we defined the needed terms. We are working with the OBO Foundry to resolve this issue.

Figure 2. Some aspects of IDO infectious disposition

In making infectious disposition a child of pathogenic disposition, IDO Core distinguishes pathogenicity and infectiousness. By our definitions, C. botulinum, for example, is a pathogen, but not an infectious agent. While C. botulinum may become part of infection in infants that ingest honey colonized by the bacterium, it does not bear an infectious disposition as it is not disposed to be transmitted to the infant host.

There is an implicit temporal ordering in the textual definition of infectious disposition. Thus, typically, when an infectious entity realizes its infectious disposition, first it is transmitted to the host before establishing localization in the host, after which it will become part of an infection prior to the appearance of disorder. In some cases, an infectious entity may only partially realize its infectious disposition, such as when a virus is transmitted to a host but fails to establish itself due to the host's immune response. But if the virus establishes itself in the host, becomes part of an infection, and forms a disorder in the host, then an infectious disorder is established. If an infection, but not a disorder, is formed, then the infection is not an infectious disorder.
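The typical ordering just described can be sketched as an ordered enumeration. This is only an illustration under stated assumptions: the names `Stage` and `furthest_stage`, and the choice to model each stage as a boolean observation, are hypothetical and do not come from IDO Core, and the strict ordering encodes the "typical" case rather than every possible one.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Typical order in which an infectious disposition is realized."""
    NOT_REALIZED = 0
    TRANSMITTED_TO_HOST = 1
    LOCALIZATION_ESTABLISHED = 2
    PART_OF_INFECTION = 3
    DISORDER_APPEARED = 4


def furthest_stage(transmitted, established, in_infection, disorder_formed):
    """Return the last stage reached; a stage counts only if all earlier ones occurred."""
    reached = Stage.NOT_REALIZED
    observations = (transmitted, established, in_infection, disorder_formed)
    # list(Stage)[1:] skips NOT_REALIZED and walks the stages in definition order.
    for stage, occurred in zip(list(Stage)[1:], observations):
        if not occurred:
            break
        reached = stage
    return reached


# Partial realization: the virus is transmitted but the immune response stops it.
blocked = furthest_stage(True, False, False, False)
# An infection forms, but no disorder: not an infectious disorder.
infection_only = furthest_stage(True, True, True, False)
```

Under this sketch, only a result of `Stage.DISORDER_APPEARED` corresponds to the case in which an infectious disorder is established.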
Some disorders are commonly called "infestations", as in the case of Sarcoptes scabiei, the tiny mite species whose instances burrow under the skin and cause the itchy condition known as scabies. Where 'infestation' is used to refer to invasion of a host by complex organisms (mites, ticks, helminths), there is a use of the term 'infection' to refer to invasion of a host by microorganisms (viruses, bacteria, fungi, protozoa). But this distinction, based on size of invader, does not track a distinction between invaders that are infectious and those that are not (helminths are infectious, ticks are not). Thus, while researchers do talk of S. scabiei disorders as infestations, they also speak of them as infections (just as they do with helminth infestations). Researchers recognize S. scabiei as an infectious agent and recognize scabies as an infectious disease [18]. In short, researchers talk in both ways and our definition of infection is compatible with both ways of talking. Note that, correctly, our definitions do not count the presence of commensal microorganisms in the human microbiome, many of which bear an infectious disposition, as constituting either an infection or an infectious disorder. This is because the material entity part of our extended organism that contains these pathogens was not formed by an invasive process of establishing an infection. Under normal circumstances these pathogens are unable to realize their infectious disposition and become part of an infection. Of course, they can form disorders in their hosts, if they end up in the wrong anatomical site, as in the case of bacteremia (defined in IDO Core as "Infection that has as part bacteria located in the blood"), or if the populations grow out of control, as in the case of yeast infections. And these cases are correctly counted by our definitions as infectious disorders. 
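The way these definitions sort commensal populations, infections, and infectious disorders can be sketched as a small classification function. The record type `MaterialEntity`, its boolean fields, and the function `classify` are hypothetical names introduced for illustration; the sketch compresses the textual definitions into three or four yes/no conditions and is not a substitute for the ontology's own axioms.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class MaterialEntity:
    has_pathogen_part: bool
    # True only if the entity was formed by an invasive process of establishing an infection.
    formed_by_establishing_infection: bool
    clinically_abnormal: bool
    # True if the pathogen part bears an infectious disposition (assumed by default).
    pathogen_is_infectious: bool = True


def classify(entity: MaterialEntity) -> Set[str]:
    """Return the labels that apply to the entity under the simplified conditions."""
    labels: Set[str] = set()
    if entity.has_pathogen_part and entity.formed_by_establishing_infection:
        labels.add("infection")
        if entity.clinically_abnormal and entity.pathogen_is_infectious:
            labels.add("infectious disorder")
    return labels


# Commensal microbiome: pathogens present, but not formed by an invasive process.
commensal = MaterialEntity(True, False, False)
# Bacteremia: bacteria have ended up in the blood and the result is clinically abnormal.
bacteremia = MaterialEntity(True, True, True)
# Infection without disorder.
silent_infection = MaterialEntity(True, True, False)
```

Returning a set of labels rather than a single label reflects that, on these definitions, every infectious disorder is also an infection.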
An infectious disposition is a disposition to either form disorder in a host or other immunocompetent members of the host's species. This allows IDO Core to cover infections that are not disorders, but are nevertheless contagious. An example would be an HIV-1 infected human host that is resistant to the virus due to a mutation of the CCR-5 gene that blocks the virus from attaching to host cells, and so blocks pathogenesis to AIDS [19]. Such an infection is not clinically abnormal as it is not causally linked to an elevated risk of either pain or other feelings of illness, or of death or dysfunction in the resistant host. But while the HIV-1 virus is unable to fully realize its infectious disposition in the host, it is still disposed to transmit to and bring clinical abnormality to other potential immunocompetent human hosts without the mutation.2

The IDO Core definition of infection thus allows for representation of commensal populations, infectious disorders that can be caused by organisms that are typically commensal, and the fact that asymptomatic infected individuals are contagious (as we see now with SARS-CoV-2).

Lastly, and worth noting, infection is given the following logical definition:

infection ≡ material entity AND (part of SOME (organism AND (part of SOME extended organism))) AND (has part SOME (pathogen AND (participates in SOME process of establishing an infection)))

We then give infectious disorder the following logical definition:

infectious disorder ≡ disorder AND (part of SOME (organism AND (part of SOME extended organism))) AND (has part SOME ((infectious agent OR infectious structure) AND (participates in SOME process of establishing an infection)))

While infectious disorder is classified as a disorder, these axioms ensure that in IDO Core infectious disorder is also an inferred subclass of infection.

1.3 Transmission of Pathogens

IDO Core recognizes the importance of characterizing infectious disease transmission in its various forms.
From the Pathogen Transmission Ontology [20] IDO Core imports terms such as:3

pathogen transmission process =def. Process in which a pathogen is transmitted directly or indirectly to a new host.

indirect pathogen transmission process =def. Pathogen transmission process in which a pathogen is indirectly transferred to a host by intermediary vehicles or vectors.

A variety of infectious diseases, including malaria and dengue fever, are vector borne. Thus, IDO Core contains terms such as:

pathogen transporter role =def. Role borne by a material entity in or on which a pathogen is located, from which the pathogen may be transmitted to a new host.

pathogen vector role =def. Pathogen transporter role that is borne by an organism active in the transfer of an infectious agent or infectious structure to an organism of another species in which it can realize its infectious disposition.

pathogen vector =def. Organism bearing a pathogen vector role.

While the definition of pathogen transporter role requires the bearer to actually have a pathogen located in or on it, bearers also have certain dispositions that enable them to play the role. While a mosquito bears the pathogen vector role only when a malaria parasite is located in or on it, there also inheres in its physical structure a disposition to transfer the parasites, which it has whether or not it contains any parasites. The same holds for respiratory droplets that serve as vehicles for viruses such as SARS-CoV-2.

2 Note that we are not claiming that there are no clinical abnormalities associated with these mutations. Individuals with CCR-5 mutations do exhibit clinical abnormalities, and so disorders, but importantly, this is not because of the HIV-1 virus. Rather, it is because of the genetic mutation.

3 For IDO Core we have modified the textual definitions from their originals.
Notice also that by our definition of pathogen vector role, a mosquito bears the role even in the case where it is actively transferring a malaria parasite to a non-infectable human being bearing the sickle-cell trait. What is important is that the parasite is being transferred to an organism of a species in which its infectious disposition can typically be realized. By this definition, a mosquito is not playing the vector role when transferring the parasite to an organism of a non-susceptible species.

In other cases, infectious agents, such as the Schistosoma helminth parasites that cause schistosomiasis, spend part of their life cycle within intermediate hosts, such as snails, after which the pathogen is transmitted into another medium, such as water, which then directly transmits the pathogen to definitive hosts such as humans. Thus, IDO Core contains the following terms:

symbiont host role =def. Host role borne by an organism whose extended organism provides an environment supportive for the survival, growth, maturation, or reproduction of an object contained as a proper part.

intermediate host role =def. Symbiont host role borne by an organism whose partner in symbiosis utilizes the host to undergo a development stage transition, and the host is required for continuation of the partner's life cycle.

intermediate host =def. Host bearing an intermediate host role.

definitive host role =def. Symbiont host role borne by an organism whose partner in symbiosis reaches developmental maturity or reproduces sexually in the host.

The preceding selection does not exhaust those host roles included in IDO Core but does reflect the wide range of ways in which to characterize host-symbiont relationships.

1.4 Pathogen Inhibition and Control

IDO Core provides terms relevant to the inhibition and killing of pathogens:

cidal agent disposition =def. Disposition inhering in a material entity, that is realized in a process of killing bacteria, fungi, parasites, or viruses.
static agent disposition =def. Disposition inhering in a material entity, that is realized in a process of inhibiting the reproduction of bacteria, fungi, or parasites, or a process of inhibiting the replication of viruses.

Subclasses for dispositions to kill specific pathogen types (namely bactericidal disposition, fungicidal disposition, parasiticidal disposition, and viricidal disposition) can then be defined in IDO reference ontologies pertaining to the corresponding pathogen types (see section 2.3 below). The same goes for subclasses for dispositions to inhibit the reproduction or replication of specific pathogen types (namely bacteriostatic disposition, fungistatic disposition, parasitostatic disposition, and virostatic disposition). We use the above dispositions to define the corresponding material entities:

cidal agent =def. Material entity with a cidal agent disposition.

static agent =def. Material entity with a static agent disposition.

Pathogen-type specific subclasses of cidal agent (namely bactericidal, fungicidal, parasiticidal, and viricide) can then be defined in the appropriate IDO reference ontologies. Likewise for subclasses of static agent, namely bacteriostatic, fungistatic, parasitostatic, and virostatic.

The terms cidal agent, cidal agent disposition, static agent, and static agent disposition are intended to be broad. The same goes for subclasses like bactericidal and viricide. Treatment agents that act only against certain bacterial, fungal, parasitic, or viral species are not within the purview of IDO Core. Such classes are to be defined in pathogen-specific extension ontologies.

By our definitions the immune system, and the cells and cellular entities that constitute it, bear both cidal and static agent dispositions (as do devices such as autoclaves and sterilizers). This is how it should be. Many drugs work not by directly killing or inhibiting pathogens, but rather by ramping up the immune system, which is what is often doing the killing/inhibiting.
While many associate terms like bactericidal and viricide with drugs and other chemical substances, researchers also use such terms to describe proteins in the immune system, especially interferon-gamma which is secreted by T helper cells.

Relatedly, one of the most important applications of IDO is to the phenomenon of resistance. Examples of resistance include a population's herd immunity to certain populations of infectious organisms and the resistance of certain pathogens to antimicrobial drugs. The correct identification of different types of resistance is essential to both treatment decisions and public health policies [21]. For instance, varying strains of Staphylococcus aureus can differ in terms of their degree of resistance and in the types of drug to which they are resistant.

In the examples described above, resistance is a feature of an organism, or population of organisms, that serves to protect it/them from being damaged by some other entity. To capture this aspect of resistance, IDO Core contains the term protective resistance which is defined as follows:

protective resistance =def. Disposition inhering in an acellular structure or organism, with a part having a disposition to mitigate damage to the entity from internal and invasive threats, which is realized in one or more negative biological regulation processes.4

Protective resistance includes not just drug resistance on the part of infectious agents or the resistance of hosts to infectious agents, but also things like the resistance of vectors to insecticide.

What occurs in many cases where protective resistance is manifested is that another process is prevented. Consider the immunity of an individual X to an infectious organism Y that is able to cause damage to X. Y is disposed to be transmitted to and establish itself in X, and become part of an infection in X.
X's immunity to Y is realized in certain processes that prevent certain of the aforementioned processes from occurring, thus mitigating the damage those processes might otherwise have caused to X. To capture this aspect of resistance, we characterize protective resistance in terms of what we call a "blocking disposition" [9, 21], a disposition the manifestation of which prevents, or at least mitigates, the realization of another disposition. The disposition whose realization is prevented (or mitigated) is called a "blocked disposition." Thus, since X's immunity to infectious organism Y is realized in processes that prevent certain realizations of Y's infectious disposition, the former is a blocking disposition for the latter (the latter being a blocked disposition).

This characterization of resistance is further enhanced in IDO Core by importing from RO the relation negatively_regulates, holding between processes x and y, and defined as:

x negatively_regulates y =def. The progression of x reduces the frequency, rate, or extent of y.

Thus, we can say that X's immunity to Y is a blocking disposition for Y's infectious disposition insofar as X's immunity is realized in certain processes that negatively_regulate the manifestation of Y's infectious disposition. For instance, X's immunity may be realized in processes (such as antibody secretion, which would neutralize viral particles, preventing them from entering host cells) the progression of which reduces the rate at which, or the extent to which, Y establishes itself in X.

4 The last clause refers to the GO class negative regulation of biological process, a process that stops, prevents, or reduces the frequency, rate or extent of a biological process. Thus my blocking of a knife thrust isn't the realization of a protective resistance, as a knife thrust is not a biological process. Whereas when a virus evades a host immune response (a biological process) it is realizing a protective resistance.
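The relationship between blocking dispositions and negatively_regulates can be illustrated with a minimal Python sketch. It assumes a toy representation that is not part of IDO Core or RO: a disposition is a record listing the processes that realize it, and negatively_regulates is a set of process pairs. The class name, function name, and process labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disposition:
    """Toy stand-in for a disposition and the processes that realize it."""
    label: str
    bearer: str
    realized_in: frozenset  # labels of the realizing processes


def is_blocking_disposition_for(blocker, blocked, negatively_regulates):
    """Return True if some process realizing `blocker` negatively regulates
    some process realizing `blocked`."""
    return any(
        (p, q) in negatively_regulates
        for p in blocker.realized_in
        for q in blocked.realized_in
    )


# Hypothetical instance data for host X and infectious organism Y.
immunity = Disposition("immunity to Y", "X", frozenset({"antibody secretion"}))
infectious = Disposition(
    "infectious disposition", "Y", frozenset({"establishment of Y in X"})
)

# Each pair (x, y) is read as: x negatively_regulates y.
negatively_regulates = {("antibody secretion", "establishment of Y in X")}

blocks = is_blocking_disposition_for(immunity, infectious, negatively_regulates)
reverse = is_blocking_disposition_for(infectious, immunity, negatively_regulates)
```

In this sketch the check is asymmetric: X's immunity counts as a blocking disposition for Y's infectious disposition, but not the other way around, because only the first direction is backed by a negatively_regulates pair.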
1.5 Infectious Disease Epidemiology and Surveillance

IDO Core includes terms for population-level processes, such as the epidemiological spread of disease:

infectious disease epidemic =def. Process of infectious disease realizations for which there is a statistically significant increase in the infectious disease incidence of a population.

infectious disease pandemic =def. Process in which multiple infectious disease epidemics of the same type of infectious disease unfold over overlapping periods of time and affect organism populations located in different geographic regions, including different countries and continents.

Additionally, IDO Core contains qualities of disease-affected populations, such as infectious disease incidence rate, infectious disease mortality rate, and infectious disease endemicity, and we have recently added terms for the corresponding sites at which these qualities are instantiated. For example: infectious disease endemic site, infectious disease free site, and infectious disease non-endemic site. Adding these classes allows them to serve as parent classes for the brucellosis-specific classes brucellosis endemic site, brucellosis free site, and brucellosis non-endemic site currently used in the Brucellosis Ontology (IDOBRU) [22]. We have also introduced to IDO Core the terms holoendemicity, hypoendemicity, and mesoendemicity to represent the varying degrees to which diseases can be endemic within different populations.

IDO Core's coverage of epidemiology is being developed through collaboration with the developers of the epidemiology-focused ontologies Apollo Structured Vocabulary (Apollo-SV) [23] and the Genomic Epidemiology Ontology (GenEpio) [24]. GenEpio itself reuses a number of important IDO Core terms, and IDO extensions will for their part incorporate appropriate terms from Apollo-SV and GenEpio.
As with all biomedical ontologies, each of the mentioned ontologies evolves in response to term requests from users in order to accommodate the community's needs.

Apollo-SV provides a standardized vocabulary for terms and relations required for the interoperation between epidemic simulator models and public health application software that interface with these models. Apollo-SV, which draws heavily on the Information Artifact Ontology (IAO) [25], provides a variety of pertinent terms. For instance, we have imported from Apollo-SV to IDO Core the following terms (which descend from IAO:directive information content entity):

disease surveillance objective specification =def. Objective specification whose endpoint is human awareness of the level of a disease in a particular population of a given biological taxon during some time interval.

infectious disease control objective specification =def. Objective specification that is realized by processes that are able or likely to stop the spread of a disease in a population.

infectious disease control strategy =def. Plan specification whose objective specification is an infectious disease control objective specification.

Additionally, we have imported subclasses of infectious disease control strategy, including:

contact tracing =def. Infectious disease control strategy that identifies and treats contacted organisms in a host population.

quarantine control strategy =def. Infectious disease control strategy whereby organisms who have contact with infectious organisms but are not symptomatic or otherwise known to be infectious are prevented from having contact with other susceptible organisms.

In line with the above, we have also created the following terms for IDO Core:

infectious disease surveillance objective specification =def. Objective specification whose endpoint is human awareness of the level of a particular infectious disease in a particular population of a given biological taxon during some time interval.
infectious disease surveillance =def. Planned process that is the realization of an infectious disease surveillance objective specification.

IDO Core has also been expanded with classes from the Vector Surveillance and Management Ontology (VSMO) [26], including pathogen surveillance and vector surveillance5:

pathogen surveillance =def. Surveillance process aiming to produce information about one or more pathogens, with the purpose of managing those pathogens.

vector surveillance =def. Surveillance process aiming to produce information about changes in the geographical distribution and density of one or several pathogen vectors, with the purpose of facilitating appropriate and timely decisions regarding interventions.

Several IDO extension ontologies already contain coverage of surveillance for the corresponding pathogen, and these ontologies will now be re-engineered as appropriate. The Influenza Ontology (IDOFLU), in particular, has an extensive treatment of influenza surveillance as part of the Centers for Excellence in Influenza Research and Surveillance program [27]. The ontology is meant to be applicable to any virus sequence and surveillance collection project, consolidating sequence and surveillance terms from a variety of online databases.

We have outlined and motivated a range of important terms useful for researchers studying infectious diseases from a variety of perspectives. We next examine extensions of IDO Core.

2 Extensions of IDO Core

In the ideal case, all IDO extension ontologies would be developed in the same way, and in conformance to all Foundry principles. There are, fortunately, several excellent examples of IDO extensions conforming to Foundry principles. We introduce here some of the better developed IDO extensions which are appropriately aligned. Unfortunately, not all of the Foundry principles have been followed faithfully in the IDO extension ontologies developed thus far.
An overview of the results of applying the IDO Core plus extensions method of ontology development is included in Supplementary Table 2, Additional File 1, which also documents other disease ontologies employing IDO terms (Supplementary Table 3). In addition to IDO, many other ontologies make use of the OGMS approach to disease. See Supplementary Table 1, Additional File 1 for details. The current state of each extension is summarized in Table 1. Some further issues concerning the IDO Extensions are detailed in Additional File 2. That said, we close this section by identifying a general concern with extensions importing terms from the Human Disease Ontology.

5 In VSMO these were subclasses of surveillance process. This term was not appropriate for inclusion in IDO since it is not a term that has specifically to do with infectious disease. Its children in other ontologies might include, for example, intelligence surveillance, engineering surveillance, aircraft health surveillance, and so forth. It is instead being moved to OBI.

Table 1  IDO Extension Ontologies: *=subject to re-engineering, **obsoleted/will be replaced.

Coronavirus Infectious Disease Ontology (CIDO)* [28-30] | Most recent version uploaded to BioPortal on August 16, 2020 [31]
Influenza Ontology (IDOFLU)* [27] | Most recent version uploaded to BioPortal on August 20, 2015 [32]
Brucellosis Ontology (IDOBRU)* [22, 33] | Most recent version uploaded to BioPortal on March 28, 2015 [34]
IDO Virus (VIDO) [XX] | Most recent version uploaded to BioPortal on August 3, 2020 [35]
IDO-COVID-19 Infectious Disease Ontology [XX] | Most recent version uploaded to BioPortal on August 3, 2020 [36]
Dengue Ontology (IDODEN)* [37] | Most recent version uploaded to BioPortal on February 17, 2014 [38]
Malaria Ontology (IDOMAL)** [39, 40] | Though obsoleted, IDOMAL is being hosted for legacy purposes [41].
Meningitis Ontology (IDOMEN)* [42] | Draft version uploaded on November 27, 2019 [43]
Plant Disease Ontology (IDOPlant) [44] | Draft version released in 2012 [45]
Staphylococcus aureus Infectious Disease Ontology (IDOSA) [9, 21, 46] | Released on June 22, 2012 [47]
Schistosomiasis Ontology (IDOSCHISTO)* [48] | Most recent version uploaded on October 23, 2013 [49]
HIV Ontology (IDOHIV)* [50] | Most recent version uploaded to BioPortal on April 4, 2017 [51]
IDO Tuberculosis (IDOTB); IDO Infective Endocarditis (IDOIE) | Planned, not yet in development

2.1 Examples: IDOBRU & IDOPlant

Two ontologies from the preceding table provide examples of excellent ontology design. IDOBRU, the Brucellosis Infectious Disease Ontology, is maintained by the He research team at the University of Michigan, and is used to facilitate the integration and exchange of brucellosis information stored in widely used databases, including:

• The Brucella Bioinformatics Portal [52]: a portal for the search and analysis of individual Brucella genes, linked to more than 20 other databases and programs.
• The Vaccine Investigation and Online Information Network (VIOLIN) [53]: a central repository for literature related to, and data resulting from, vaccine research.

IDOBRU exhibits a well-organized hierarchy with BFO, OGMS, and IDO Core imported in full, and is a good exemplar of the IDO Core hub and spokes model, as illustrated in Table 2.
Table 2  IDOBRU Hierarchy

IDOBRU Axis | Top Level IDOBRU Classes | Imported OBO Ontology Class from which it descends
host infection and zoonotic disease transmission | process of establishing Brucella infection in host; Brucella infectious disposition; Brucella host role | process of establishing an infection (IDO Core); zoonotic disposition (IDO Core); infectious agent host role (IDO Core)
virulence factors and pathogenesis | Brucella virulence factor; Brucella virulence factor disposition | virulence factor (IDO Core); virulence factor disposition (IDO Core)
symptoms | brucellosis symptom | symptom (OGMS)
diagnosis | brucellosis diagnosis | diagnosis (OGMS)
intentional release | Brucella intentional release | planned process (OBI)
vaccine prevention | brucellosis vaccine | vaccine (VO)
treatment | brucellosis treatment | treatment (OGMS)

The other example worth noting is IDOPlant [44], a plant infectious disease ontology being developed under the auspices of the Planteome Project, which maintains a large database of annotations from plant genomic and phenomic studies [54]. IDOPlant leverages IDO Core in axioms such as the following:

IDOPlant:process of establishing a Xanthomonas oryzae infection subclass-of IDOPlant:process of establishing a plant bacterial infection
IDOPlant:process of establishing a plant bacterial infection subclass-of IDO:process of establishing an infection
IDOPlant:rice bacterial leaf blight disease subclass-of IDOPlant:plant bacterial disease
IDOPlant:plant bacterial disease subclass-of IDOPlant:plant infectious disease
IDOPlant:plant infectious disease subclass-of IDO:infectious disease

2.2 Case Study: IDOSA and methicillin resistant Staphylococcus aureus

IDO:protective resistance is used to model the resistance of certain bacteria to antibiotic drugs. For this purpose, the following subtypes of protective resistance are asserted in IDO Core:

drug resistance =def. Protective resistance that mitigates the damaging effects of a drug.

antibiotic resistance =def.
Drug resistance that mitigates the damaging effects of an antibiotic.

Beta-lactam antibiotics such as methicillin are the most widely used antibiotics, and most work by preventing bacterial cell wall construction. They act by binding to and inhibiting the penicillin-binding proteins (PBPs) within bacteria that facilitate the synthesis of peptidoglycan molecules, thus compromising the structural integrity of the cell wall. In response to the widespread use of beta-lactam antibiotics, some bacteria have rapidly evolved novel-structured PBPs which lack an affinity for these antibiotics, thus rendering them less effective. In IDO we aim to provide an ontological representation of resistance that reveals the active mechanisms that produce resistance: in the case of resistant bacteria, the active dispositions inhering in novel-structured PBPs that inhibit antibiotics from manifesting their damaging effects [21].

Consider the case of methicillin resistant Staphylococcus aureus (MRSa). MRSa's resistance to methicillin is conferred by PBP2a, a PBP that lacks affinity for methicillin and is the product of the gene mecA. The need to provide a coherent and consistent understanding of the mechanisms underlying MRSa antibiotic resistance is one impetus for the development of the Staphylococcus aureus Infectious Disease Ontology (IDOSA), an extension of IDO covering entities specific to Staphylococcus aureus (Sa) infectious diseases [9, 21, 46]. IDOSA's main hierarchy is built on BFO, and imports IDO Core in full. IDOSA provides terms covering all entities relevant to antibacterial resistance in Sa, including terms for Sa proteins (from the Protein Ontology [55]), terms for genes and gene products (from the Sequence Ontology [56]), terms for biological processes (from the GO Biological Process Ontology), terms for anatomical sites of infection (from the UBERON anatomy ontology [57]), and terms for antibiotics (from Chemical Entities of Biological Interest (ChEBI) [58]).
IDOSA imports the term Staphylococcus aureus from NCBITaxon, while adding the following subclasses:

methicillin-resistant Staphylococcus aureus =def. Organism of type Staphylococcus aureus that has resistance to beta-lactam antibiotics.

methicillin-susceptible Staphylococcus aureus =def. Organism of type Staphylococcus aureus that lacks resistance to beta-lactam antibiotics.

Both are defined in terms of IDOSA:resistance to beta-lactam antibiotic, which itself is a subclass of IDO:antibiotic resistance and defined as follows:

resistance to beta-lactam antibiotic =def. Antibiotic resistance that mitigates the damaging effects of a beta-lactam antibiotic.

With these terms and definitions, we can characterize both methicillin-susceptible Staphylococcus aureus's (MSSa) susceptibility, and MRSa's resistance, to beta-lactam antibiotics in terms of protective resistance and blocking dispositions [9, 21]. MSSa is susceptible to the damaging effects of methicillin because it lacks protective resistance to that drug. Characterized positively, MSSa's PBPs have the disposition to undergo a methicillin PBP binding process that negatively_regulates the synthesis of peptidoglycan, thereby interfering with the formation of a stable cell wall. Affinity for methicillin thus acts as a blocking disposition for the PBPs' disposition to synthesize peptidoglycan. In the case of MRSa, in contrast, the disposition of its PBP2a parts to synthesize peptidoglycan, and thereby participate in the construction of a stable cell wall (which negatively_regulates methicillin binding), cannot be blocked. Thus, MRSa's protective antibiotic resistance to methicillin can be seen as an active response in which PBP2a manifests a disposition to mitigate the damaging effects of methicillin.
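As a rough illustration of how these definitions might be applied to instance data, the following Python sketch encodes the asserted subclass chain for resistance terms and sorts a Staphylococcus aureus isolate into MRSa or MSSa. It is a toy under stated assumptions, not the IDOSA implementation: the dictionary, the function names, and the isolate records are hypothetical, and the classifier treats the absence of a recorded resistance as lack of resistance (a closed-world simplification that an OWL reasoner would not make).

```python
# Asserted subclass links, taken from the textual definitions above.
IS_A = {
    "resistance to beta-lactam antibiotic": "antibiotic resistance",
    "antibiotic resistance": "drug resistance",
    "drug resistance": "protective resistance",
}

BETA_LACTAM_RESISTANCE = "resistance to beta-lactam antibiotic"


def ancestors(term, is_a=IS_A):
    """Return the chain of superclasses of `term`, nearest first."""
    chain = []
    while term in is_a:
        term = is_a[term]
        chain.append(term)
    return chain


def classify_isolate(taxon, dispositions):
    """Classify an isolate as MRSa or MSSa from its recorded dispositions.

    Assumption: a resistance that is not recorded is treated as absent.
    Returns None for organisms that are not Staphylococcus aureus.
    """
    if taxon != "Staphylococcus aureus":
        return None
    if BETA_LACTAM_RESISTANCE in dispositions:
        return "methicillin-resistant Staphylococcus aureus"
    return "methicillin-susceptible Staphylococcus aureus"


# Hypothetical isolate records.
resistant = classify_isolate("Staphylococcus aureus", {BETA_LACTAM_RESISTANCE})
susceptible = classify_isolate("Staphylococcus aureus", set())
resistance_chain = ancestors(BETA_LACTAM_RESISTANCE)
```

Because resistance to beta-lactam antibiotic descends from protective resistance, any isolate classified as MRSa here also counts as a bearer of protective resistance in the sense defined in IDO Core.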
[21] shows how the formal representation of these relations can be used in association with instance data to draw inferences that may facilitate automated drug discovery and guide treatment decisions in specific types of cases.

The IDO Core account of protective resistance can be applied also to other cases, such as the resistance against HIV-1 conferred by CCR5-Δ32, and the resistance against malaria conferred by the sickle cell trait [21]. CCR5-Δ32 is a deletion mutation of the CCR5 gene resulting in cells which lack a functioning CCR5 receptor on their surfaces. In this case, the disposition of individuals with the CCR5-Δ32 mutation to develop cells that lack CCR5 on their surface acts as a blocking disposition for the disposition of HIV-1 to bind to a CCR5 molecule. Plasmodium falciparum, one of the infectious agents that causes malaria, has a disposition to spread through host red blood cells, a process that is reduced in dense, dehydrated red blood cells. In individuals with the sickle cell hemoglobin gene, red blood cells have a disposition to become dehydrated and thus increase in density. This disposition acts as a blocking disposition for the disposition of Plasmodium to spread through red blood cells, a process requiring hydrated red blood cells.

2.3 Partitioning of the IDO suite and a lattice of infectious disease ontologies

While currently existing IDO extensions were designed as direct extensions of the Core, we can now take advantage of the fact that IDO extensions can be partitioned into subgroups on the basis of pathogen-type. Under this partitioning, the following reference ontologies serve as direct extensions of IDO Core: IDO Bacteria, IDO Virus, IDO Fungus and IDO Parasite. IDOSA, IDOMEN, IDOTB, IDOIE and IDOBRU are re-engineered as extensions of IDO Bacteria. IDOFLU, IDOHIV, IDODEN and CIDO extend from IDO Virus, while IDOSCHISTO and a new ontology for malaria (replacing IDOMAL) will extend from IDO Parasite.
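The partitioning just described can be written down as a simple parent map. The Python sketch below is only an illustration of the structure named in the text; the dictionary layout and function names are hypothetical, and the planned malaria ontology is omitted because the text does not name it.

```python
# Each key extends the ontology given as its value (per the partitioning above).
EXTENDS = {
    "IDO Bacteria": "IDO Core",
    "IDO Virus": "IDO Core",
    "IDO Fungus": "IDO Core",
    "IDO Parasite": "IDO Core",
    "IDOSA": "IDO Bacteria",
    "IDOMEN": "IDO Bacteria",
    "IDOTB": "IDO Bacteria",
    "IDOIE": "IDO Bacteria",
    "IDOBRU": "IDO Bacteria",
    "IDOFLU": "IDO Virus",
    "IDOHIV": "IDO Virus",
    "IDODEN": "IDO Virus",
    "CIDO": "IDO Virus",
    "IDOSCHISTO": "IDO Parasite",
}


def extension_path(ontology, extends=EXTENDS):
    """Return the chain from `ontology` up to the top of the map."""
    path = [ontology]
    while path[-1] in extends:
        path.append(extends[path[-1]])
    return path


def direct_extensions(parent, extends=EXTENDS):
    """Return the ontologies that directly extend `parent`, sorted by name."""
    return sorted(child for child, p in extends.items() if p == parent)


cido_path = extension_path("CIDO")
virus_children = direct_extensions("IDO Virus")
```

Walking the map upward from a disease-specific ontology gives the reference ontologies from which it may need to import terms, with IDO Core always at the end of the chain.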
In a similar manner, IDO extension ontologies for vector-borne diseases will extend from a new ontology, IDO VectorBorne, consisting of just those terms needed to deal with vector-borne diseases in a pathogen-neutral fashion.

[46] shows how IDOSA annotations of genetic, phenotypic, and demographic data on Sa isolates maintained by the Network on Antimicrobial Resistance in Staphylococcus aureus (NARSA) [59] can be used to infer a lattice of application ontologies for specific subfamilies of Sa-related diseases, down to the level of specific strains. The method is generalizable to isolate repositories across the infectious disease domain. Leveraging the other extension ontologies within the IDO suite (Table 1), the method allows us to generate similar lattices for specific subfamilies of coronavirus-related diseases, influenza virus-related diseases, and so on. Together these form a larger network of infectious disease ontologies under IDO Core (Figure 3). The resultant network can be used to define a strategy for constructing a taxonomy of infectious diseases incorporating both high-throughput genetic and molecular data as well as clinical data. The network can also be used for rapidly creating new ontologies for novel pathogens or novel strains in a way that provides a pathway for automatic linking of emerging data to legacy data relating to existing pathogens and diseases. The IDO suite of ontologies can thereby contribute to the advance of what is called 'personalized' or 'precision' medicine, which depends upon effective classification and association of biological disease data with known clinical phenotypes and disease types at ever finer levels of detail.

Figure 2  A possible lattice of infectious disease ontologies

Where two ontologies are connected by an arrow, the one lower in the lattice extends, and imports needed terms from, the one higher up, as well as from other ontologies higher in the lattice.
To be clear, subontologies import only what is needed, not all of the terms and axioms from all the ontologies on which they draw. Ontologies at the very top are upper-level OBO ontologies from which IDO Core, and other ontologies further down in the lattice, extend. Note that the graph presents only a representative sample, rather than an exhaustive list, of upper-level ontologies upon which the lattice depends.

It is expected that the lattice will be expanded by IDO user groups to create application ontologies for specific research purposes. For instance, Staphylococcus aureus associated infective endocarditis can be caused by different strains of S. aureus in different host organisms (humans, cows, pigs, dogs, and so on), and so we may need separate application ontologies to cover these specific host-pathogen interactions. But representation of host-specific cases also depends on representation, at a higher level in the lattice, of factors they share in common. IDOIE, for example, would provide representation of the fact that bacterial infection of the cardiac valve depends on both the formation of fibrin-platelet deposits as a result of previous endothelial damage, as well as the occurrence of transient bacteremia [60], whereas IDOSA would represent those mechanisms by which S. aureus is able to infect the bloodstream, adhere to fibrin-platelet deposits, and infect the cardiac valve. This in turn depends on the representation of more general properties shared by all opportunistic bacterial pathogens, such as their disposition, regardless of specific host, to colonize, and cause infection in, a variety of different anatomical sites, provided the host's defenses have been compromised. Such properties would be represented in the reference ontology IDO Bacteria by making use of IDO Core terms such as opportunistic infectious disposition and bacteremia. Similarities in S.
aureus infections across humans, cows, and pigs are also explained by commonalities in the biology and anatomy of mammalian hosts. These factors can be covered in the respective subontologies by importing relevant terms from higher-level ontologies such as the GO, Cell Ontology, NCBITaxon, and UBERON.

One might worry that our approach leads to an unneeded combinatorial explosion of ontologies. Indeed, the lattice of S. aureus infectious disease ontologies [46] suggests many ontologies will be needed just for S. aureus, given that there are so many strains, and that cows and other non-humans can serve as hosts. It is certainly true that there are many combinatorial possibilities that could be generated. But we only add new application ontologies to the lattice when there is an actual need from biologists who are describing real biological phenomena. That this is happening is not a matter of combinatorial explosion, but of growth dictated by the combinations that exist.

In any event, combinatorial explosion has at least one natural stopping point: do not conflate types and tokens [2]. Consider personalized medicine. To represent individual patients, we need ontologies of very fine grain, since, despite our commonalities, each of us differs in various biological ways. Broadly speaking, these ontologies will overlap at the level of determinables, but not always at the level of determinates. For example, Sally Jane and John Doe may both have a temperature (determinable) but differ in what specific temperature they each have (determinate), alongside a broad range of other determinable similarities and determinate differences (or similarities). Narrow in on creating an ontology for a patient John Doe. This ontology will presumably be composed of all and only terms from existing ontologies, since the only novelty here is John, and John is a token.
Needed terminological content may be imported directly and used, or it may be used to define new terms, but this will not require new primitives for John. Moreover, there is a clear overlap (lattices aside) between existing ontologies and the personalized John ontology with respect to logical characterizations. John Doe's personalized ontology is a definitional (and so conservative) extension of the aggregate of various existing ontologies [61].

As for ontologies to cover newly emerging viral and bacterial strains, these will require new terms, and there will likely be many needed. However, the only primitive terms we see needed in this case are terms for the specific strains. Everything else can be constructed based on existing ontologies and shown to be definitional and conservative extensions of those ontologies. For example, suppose we need a SARS-CoV-3-focused ontology. We then import from IDO Core, OBI, VIDO, CIDO and other ontologies, and define what terminological content we can from imported terms. We introduce the virus SARS-CoV-3 as a primitive subclass of coronavirus.6 The result is an ontology largely composed of existing ontologies, with a proper part composed of combinations of that new primitive with existing terms, for example SARS-CoV-3 infection, SARS-CoV-3 disorder, and so on. And, since by assumption we need to represent SARS-CoV-3 data, we are justified in adding them. So yes, we will need many ontologies, but that is a feature rather than a bug. We should be as specific as researchers need.

2.4 The importation of disease terms from the Human Disease Ontology (DOID)

We close our discussion of IDO extensions with a brief word of caution. While some IDO extensions import disease terms from DOID [62], we do not recommend this in all cases moving forward. DOID purports to follow OGMS. While many DOID disease terms are defined in terms of underlying disorders, this is not so in all cases.
In general, DOID disease terms often contain more information than is appropriate. Consider for instance DOID:influenza:

influenza =def. A viral infectious disease that results in infection, located in respiratory tract, has_material_basis_in Influenzavirus A, has_material_basis_in Influenzavirus B, or has_material_basis_in Influenzavirus C, which are transmitted by droplet spread of oronasal secretions during coughing, sneezing, or talking from an infected person. It is a highly contagious disease that affects birds and mammals and has symptom chills, has symptom fever, has symptom sore throat, has symptom runny nose, has symptom muscle pains, has symptom severe headache, has symptom cough, and has symptom weakness.

A good definition captures just the essential features of the class, including its genus, as well as the differentia that distinguish it from other subtypes of that genus [2]. Most of the features included in the above definition are not differentia. Location in the respiratory tract, transmissibility through droplet spread, and many of the mentioned symptoms are features common to a variety of viral infectious diseases. Such information would be better relegated to the term's associated OWL axioms, or an editor's comment. Notice the definition also gets things backwards by saying that influenza results in an infection. Rather, the infection is the basis of the disease. A better, OGMS-based template is provided by IDOSA:Staphylococcus aureus infectious disease:

Staphylococcus aureus infectious disease =def. Infectious disease that has a Staphylococcus aureus infectious disorder as its material basis.

Accordingly, influenza ought to be defined as follows:

influenza =def. Viral infectious disease that has as its material basis either an Influenzavirus A infectious disorder, an Influenzavirus B infectious disorder, or an Influenzavirus C infectious disorder.
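The genus-plus-differentia form recommended above is regular enough to be generated mechanically. The following Python sketch is a hypothetical helper, not part of any IDO tooling: it assembles a textual definition from a genus label and a differentia phrase, and reproduces the two definitions just given.

```python
def genus_differentia_definition(genus, differentia):
    """Build a textual definition of the form '<Genus> that <differentia>.'

    `genus` is the label of the parent class; `differentia` is the phrase
    that distinguishes the defined class from its siblings.
    """
    genus = genus.strip()
    differentia = differentia.strip().rstrip(".")
    return f"{genus[0].upper()}{genus[1:]} that {differentia}."


sa_disease = genus_differentia_definition(
    "infectious disease",
    "has a Staphylococcus aureus infectious disorder as its material basis",
)

influenza = genus_differentia_definition(
    "viral infectious disease",
    "has as its material basis either an Influenzavirus A infectious disorder, "
    "an Influenzavirus B infectious disorder, "
    "or an Influenzavirus C infectious disorder",
)
```

A template of this kind keeps the genus explicit and leaves no slot for non-differentiating features such as anatomical location or symptom lists, which is the point of the caution against the DOID definition.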
With this word of caution in place, we turn next to more recent IDO Core extensions, focusing on coronavirus research. We intend the work described below to serve as a model for the re-engineering of existing IDO Core extensions in such a way as to yield greater conformance with the ontology building principles discussed in the foregoing.

6 Strictly speaking, this term would just be added to CIDO, and imported from there, but ignore that for the moment.

3 The Coronavirus Infectious Disease Ontology, IDO Virus and IDO COVID-19

In this section we detail the most recent developments of the IDO suite, including: IDO Virus (VIDO); the Coronavirus Infectious Disease Ontology (CIDO); and an extension of CIDO focusing on SARS-CoV-2 and the associated COVID-19 disease (IDO-COVID-19). We discuss the relationships among these ontologies, illustrated in Figure 4.

Figure 4  Links between VIDO, CIDO and IDO-COVID-19

Though we focus on IDO Core extensions relevant to coronavirus research, we acknowledge there are other ontology initiatives developed to support curation of COVID-19 data:

• The WHO COVID-19 Rapid Version CRF, which provides a semantic data model for the RAPID version (23 March 2020) of the WHO's COVID-19 case record form [63].
• The COVID-19 Surveillance Ontology, which supports COVID-19 surveillance in primary care by facilitating the monitoring of COVID-19 cases and related respiratory conditions using data from multiple brands of computerized medical record systems [64].
• The Linked COVID-19 Data Ontology, which uses RDF to present COVID-19 datasets from the European Centre for Disease Prevention and Control, Johns Hopkins University and the Robert Koch-Institut [65]. (At present this is little more than a list of datasets rather than a bona fide ontology.)
• The NASA Jet Propulsion Laboratory's COVID-19 Research Knowledge Graph, which builds a knowledge graph from the COVID-19 Open Research Dataset (CORD-19) [66].
However, since each is a stand-alone initiative developed outside the scope of OBO Foundry principles, each is subject to the silo problems documented in the introduction.

3.1 The Virus Infectious Disease Ontology

VIDO [35] is a virus-neutral extension of IDO Core including terminological content used by researchers across various domains interested in the study of viral infectious diseases. VIDO is in the later stages of its development under John Beverley, Shane Babcock, Barry Smith and collaborators. Introducing VIDO terms illustrates a recipe for ontology refinement to a more specific domain, via downward population. For example:

VIDO:viral disease subclass-of IDO:infectious disease

where the former is an infectious disease caused by a virus. A similar strategy will be used to link VIDO to existing virus ontologies, for example:

IDOFLU:influenza A virus infection is a subclass of VIDO:virus infection
IDOHIV:HIV virus infection is a subclass of VIDO:virus infection

This is why we speak of a recipe for constructing more specific ontology terms. More generally, many virus terms are generated from relevant IDO Core terms by adding the term 'virus' to generate a subclass, and adjusting textual and logical definitions accordingly. To give a few examples, VIDO extends IDO Core by introducing terms such as:

virus disorder =def. Infectious disorder that exists as a result of processes of formation of disorder initiated by a virus or virus population.

viral disease =def. Infectious disease inhering in a virus disorder that is a disorder due to the presence of the virus.

viral disease course =def. Infectious disease course that realizes a viral disease.

These terms parallel the IDO Core extension of OGMS introduced earlier. Similarly, IDO Core:subclinical infection (an infection that is part of an asymptomatic host) provides resources needed to represent asymptomatic viral infections:

subclinical virus infection =def. Subclinical infection that is part of a virus host.
Recall that infections need not result in the manifestation of symptoms, yet may still satisfy all other criteria for inclusion as infections. In these cases, the pathogen host is asymptomatic, and in the case under consideration here, the relevant pathogen is a virus. Like other IDO Core extensions, VIDO imports terms from existing OBO Foundry ontologies where needed, such as OBI, UBERON, NCBITaxon, and many others. Additionally, the Protein Ontology consortium has recently created new terms dealing with SARS-CoV-2 proteins [67], which VIDO imports. An introduction to and justification for aspects of VIDO can be found in [XX]. For now, as illustrated in Figure 4, VIDO provides a foundation from which CIDO extends, bringing us closer to the star of the present pandemic.

3.2 The Coronavirus Infectious Disease Ontology

CIDO [30] deals with coronavirus infectious diseases in general, and in that respect is more specific than VIDO. Like VIDO, CIDO imports terms from a wide range of ontologies, including IDO Core, ChEBI, the National Drug File Reference Terminology (NDF-RT), UBERON, GO, the Vaccine Ontology [68], and NCBITaxon. In addition, CIDO introduces 8 terms specific to the coronavirus domain. CIDO has already been employed in several coronavirus-related research explorations. One application of CIDO is to the analysis and integration of information on anti-coronavirus drugs to facilitate drug repurposing against COVID-19. In a recent study [28], members of the CIDO team used text mining to identify chemical drugs and antibodies effective against at least one human coronavirus infection in vitro or in vivo, and then mapped these drugs to ChEBI, the Drug Ontology (DrugOn) [69], and NDF-RT, each of which provides logical axioms linking drugs to their roles and mechanisms of action. This information was then extracted for analysis.
Further relations will be built into CIDO linking drugs, coronaviruses, and the conditions under which the drugs work against the coronaviruses. CIDO is also being used in ongoing work to represent vaccines against coronaviruses. In [29], reverse vaccinology and machine learning are used to predict potential vaccine targets for safe and effective COVID-19 vaccine development. The CIDO team is systematically annotating these vaccine candidates, along with their formulations and host responses, while working with the VO team to ontologically model and analyze these vaccines. To facilitate vaccine design, CIDO will be used in a further study to investigate host-pathogen interactions in order to better understand protective immune mechanisms. Just as VIDO is extended by CIDO, CIDO is extended by the more specific IDO-COVID-19, which covers COVID-19 and its cause, SARS-CoV-2. IDO-COVID-19 thus brings together IDO Core, VIDO, and CIDO in the interest of fine-grained representation of this virus strain and associated diseases.

3.3 The COVID-19 Infectious Disease Ontology

IDO-COVID-19 [35] imports the term SARS-CoV-2 from NCBITaxon, building various qualities, dispositions, and material entities from the virus. We illustrate how these ontologies intersect by reflecting on an important example of SARS-CoV-2 infection: asymptomatic infections.
VIDO provides CIDO and IDO-COVID-19 with the needed resources:

CIDO:subclinical coronavirus infection subclass-of VIDO:subclinical virus infection
IDO-COVID-19:subclinical SARS-CoV-2 infection subclass-of CIDO:subclinical coronavirus infection

while distinguishing these instances from symptomatic cases:

CIDO:coronavirus disorder subclass-of VIDO:virus disorder
IDO-COVID-19:SARS-CoV-2 disorder subclass-of CIDO:coronavirus disorder

These disorders are, moreover, the material bases for relevant diseases and disease courses, themselves illustrating links among VIDO, CIDO, and IDO-COVID-19:

CIDO:coronavirus disease subclass-of VIDO:viral disease
IDO-COVID-19:COVID-19 subclass-of CIDO:coronavirus disease
CIDO:coronavirus disease course subclass-of VIDO:viral disease course
IDO-COVID-19:COVID-19 disease course subclass-of CIDO:coronavirus disease course

where the relevant disease courses have the expected participants:

CIDO:coronavirus disease course has participant CIDO:coronavirus
IDO-COVID-19:COVID-19 disease course has participant IDO-COVID-19:SARS-CoV-2

Viral disease courses involve process parts which are useful for characterizing the development of a given virus in a host.
With this in mind, IDO-COVID-19 imports from VIDO and GO [4]:

VIDO:viral disease course has part VIDO:virus replication
VIDO:viral disease course has part GO:virus synthesis stage

allowing the following connections, imported from CIDO:

CIDO:coronavirus disease course has part CIDO:coronavirus synthesis stage
CIDO:coronavirus synthesis stage subclass-of GO:virus synthesis stage
CIDO:coronavirus disease course has part CIDO:coronavirus replication
CIDO:coronavirus replication subclass-of VIDO:virus replication

and then, in IDO-COVID-19:

IDO-COVID-19:SARS-CoV-2 subclass-of CIDO:coronavirus
IDO-COVID-19:COVID-19 disease course has part IDO-COVID-19:SARS-CoV-2 synthesis stage
IDO-COVID-19:SARS-CoV-2 synthesis stage subclass-of CIDO:coronavirus synthesis stage
IDO-COVID-19:COVID-19 disease course has part IDO-COVID-19:SARS-CoV-2 replication
IDO-COVID-19:SARS-CoV-2 replication subclass-of CIDO:coronavirus replication

The introduction in VIDO of the many stages of viral replication will assist researchers in classifying specific mechanisms of efficacy for given antivirals. To see how, consider that IDO Core includes classes for cidal and static treatment agents, and that these classes are extended in VIDO:

viricidal disposition =def. Disposition to kill viruses.
viricide =def. Cidal agent with a viricidal disposition that is realized in the process of killing viruses.
virostatic disposition =def. Disposition to inhibit the replication of viruses.
virostatic =def. Static agent bearing a virostatic disposition.

It is worth noting that VIDO provides terms needed to distinguish bacterial, fungal, and viral infectious agents, as well as the infectious agent dispositions realized in relevant infectious processes. This makes it possible to apply the preceding strategy to the construction of IDO Bacteria, IDO Fungus, and IDO Parasite, and to the linking of these reference ontologies to IDO Core.
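The subclass-of axioms listed in this section support a simple mechanical check: because subclass-of is transitive, whatever is asserted of a VIDO class carries down to its CIDO and IDO-COVID-19 descendants. The following Python sketch is illustrative only and is not part of any IDO tooling. It copies a handful of the axioms quoted above into a dictionary, assumes for simplicity a single asserted parent per class, and walks the chain; the function names are hypothetical.

```python
# Direct subclass-of axioms copied from the text of this section.
# Assumption for this sketch: each class has at most one asserted parent.
SUBCLASS_OF = {
    "CIDO:coronavirus synthesis stage": "GO:virus synthesis stage",
    "CIDO:coronavirus replication": "VIDO:virus replication",
    "CIDO:coronavirus disease course": "VIDO:viral disease course",
    "IDO-COVID-19:SARS-CoV-2": "CIDO:coronavirus",
    "IDO-COVID-19:SARS-CoV-2 synthesis stage": "CIDO:coronavirus synthesis stage",
    "IDO-COVID-19:SARS-CoV-2 replication": "CIDO:coronavirus replication",
    "IDO-COVID-19:COVID-19 disease course": "CIDO:coronavirus disease course",
}


def ancestors(term):
    """Return the asserted superclasses of `term`, nearest first."""
    chain = []
    while term in SUBCLASS_OF:
        term = SUBCLASS_OF[term]
        chain.append(term)
    return chain


def is_subclass_of(child, parent):
    """True if `child` equals `parent` or reaches it through subclass-of links."""
    return child == parent or parent in ancestors(child)


if __name__ == "__main__":
    for cls in ancestors("IDO-COVID-19:SARS-CoV-2 replication"):
        print(cls)
```

A production setting would instead load the OWL files and use a reasoner, which also handles multiple parents and defined classes; the dictionary walk above only illustrates why data annotated with an IDO-COVID-19 term can be retrieved by a query phrased in VIDO terms.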
We can then use these reference ontologies to distinguish between viral, bacterial, fungal, and parasitic infections, between viral, bacterial, fungal, and parasitic infectious diseases, and so on, linking existing ontologies covering more specific bacteria, fungi, and parasites to IDO Core. Logical definitions can also be constructed for these various cidal and static terms, such as:

viricide ≡ cidal agent AND (has_disposition SOME viricidal disposition)
viricidal disposition ≡ cidal agent disposition AND (inheres_in SOME cidal agent) AND (realized_in ONLY (process AND results_in SOME (virus death temporal boundary)))
virostatic ≡ static agent AND (has_disposition SOME (virostatic disposition AND (realized_in ONLY (negatively_regulates SOME virus replication))))
virostatic disposition ≡ static agent disposition AND (inheres_in SOME static agent) AND (realized_in ONLY (process AND (negatively_regulates SOME virus replication)))

These definitions are made possible by introducing terms such as:

virus death temporal boundary =def. Process boundary that marks the end of the life cycle of a virus.
virus birth temporal boundary =def. Process boundary that marks the beginning of the life of a virus.
virus replication =def. Replication process in which a virus containing some portion of genetic material inherited from a parent virus is replicated.

The last of these relies on GO:reproduction, defined as the production of individuals containing some portion of genetic material inherited from one or more parent organisms.
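To make the role of such logical definitions concrete, the following Python sketch checks the first-level conditions of the viricide and virostatic definitions against a few individuals. It is a toy, closed-world approximation under stated assumptions: an OWL reasoner works under the open-world assumption and would also evaluate the nested realized_in restrictions, which are omitted here. The `Agent` class and the individuals `agent_a`, `agent_b` and `agent_c` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Toy individual: asserted class labels and the disposition types it bears."""
    name: str
    types: set = field(default_factory=set)
    dispositions: set = field(default_factory=set)


def is_viricide(agent):
    # cidal agent AND (has_disposition SOME viricidal disposition)
    return ("cidal agent" in agent.types
            and "viricidal disposition" in agent.dispositions)


def is_virostatic(agent):
    # static agent AND (has_disposition SOME virostatic disposition)
    return ("static agent" in agent.types
            and "virostatic disposition" in agent.dispositions)


def classify(agent):
    """Return the defined classes whose simplified conditions the agent meets."""
    labels = []
    if is_viricide(agent):
        labels.append("viricide")
    if is_virostatic(agent):
        labels.append("virostatic")
    return labels


# Hypothetical individuals, for illustration only.
agent_a = Agent("agent_a", {"cidal agent"}, {"viricidal disposition"})
agent_b = Agent("agent_b", {"static agent"}, {"virostatic disposition"})
agent_c = Agent("agent_c", {"cidal agent"})  # no viricidal disposition asserted

if __name__ == "__main__":
    for agent in (agent_a, agent_b, agent_c):
        print(agent.name, classify(agent))
```

The point of the sketch is that class membership follows from the asserted properties of an individual rather than from manual labeling, which is what allows antiviral data to be classified automatically once the relevant dispositions are recorded.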
These terms allow for characterization of negative regulation of viruses via drugs targeting specific parts of the virus replication process in these ontologies, for example in CIDO:

CIDO:negative regulation of coronavirus replication subclass-of VIDO:negative regulation of virus replication
CIDO:negative regulation of coronavirus replication negatively regulates SOME CIDO:coronavirus replication
CIDO:negative regulation of coronavirus synthesis negatively regulates SOME CIDO:coronavirus synthesis

and in the case of IDO-COVID-19:

IDO-COVID-19:negative regulation of SARS-CoV-2 replication subclass-of CIDO:negative regulation of coronavirus replication
IDO-COVID-19:negative regulation of SARS-CoV-2 synthesis subclass-of CIDO:negative regulation of coronavirus synthesis
IDO-COVID-19:negative regulation of SARS-CoV-2 synthesis negatively regulates SOME IDO-COVID-19:SARS-CoV-2 synthesis

Such classes provide resources needed to annotate and unify existing data concerning coronavirus antivirals in general, and COVID-19 antivirals in particular. Given the pressing need for progress in combatting the spread of these viruses in humans, consolidating and interpreting such data is of paramount importance. Equally important is having terms needed for viral disease monitoring, such as the following terms in VIDO, which extend IDO:infectious disease epidemic and IDO:infectious disease pandemic, respectively:

viral disease epidemic =def. Process of viral disease realizations for which there is a statistically significant increase in the viral infectious disease incidence of a population.
viral disease pandemic =def. Process in which multiple viral disease epidemics of the same type of viral disease unfold over overlapping periods of time and affect organism populations located in different geographic regions, including different countries and continents.

These are themselves easily extended to CIDO:

coronavirus epidemic =def. Viral disease epidemic where the relevant pathogen is a coronavirus.
coronavirus pandemic =def. Viral disease pandemic consisting of coronavirus epidemics.

And from CIDO to IDO-COVID-19:

COVID-19 epidemic =def. Coronavirus epidemic where the relevant pathogen is SARS-CoV-2 and the relevant disease is COVID-19.
COVID-19 pandemic =def. Coronavirus pandemic consisting of COVID-19 epidemics.

We have then, by bridging IDO Core, VIDO, CIDO, and IDO-COVID-19, an ontology specific to SARS-CoV-2 and COVID-19 that largely imports terms from existing ontologies and is sufficiently expressive to represent, among other relevant data, the rational drug and vaccine design applications to which CIDO has already been applied. As with VIDO, more detail and justification for IDO-COVID-19 can be found in [XX]. Having motivated important IDO Core terminological content, explored various well-designed extensions, and illustrated three increasingly specific extensions of IDO Core that provide fine-grained representation of SARS-CoV-2 and COVID-19 data, we next illustrate how IDO Core and its extensions have been applied in coordination efforts.

4 IDO Applications and Limitations

Ontology metadata can be used to combine heterogeneous bodies of research data to enable structured querying and analysis [70]. The work in [71] applies these same methods to immunology data using IDO-related ontologies such as OBI [16] and VO [68]. As has been revealed by the COVID-19 pandemic, failure to pay heed to metadata standards limits the reusability of available primary genomic data, significantly impeding efficient response measures [72]. Now more than ever there is a need for stricter adherence to data curation and data management best practices [73]. Since IDO is built in accordance with the OBO Foundry principles, the IDO ontologies also have a degree of interoperability with other OBO Foundry ontologies.
This makes IDO Core and its extensions applicable to the annotation of a variety of databases relevant to infectious disease that already make use of Foundry ontologies in their annotations [8]. An incomplete list of databases to which IDO ontology annotations are applied is provided in Table 3. In the ideal case data and information relevant to infectious disease research, independently of where they are stored, should be annotated using IDO terms. The resultant annotated data would thereby become available to computer processing as if they formed a single body of linked data in virtue of the semantically controlled properties of the IDO terms and of the logical structure of their definitions. Experience shows, however, that these benefits are difficult to achieve except in those cases where databases have been created using the ontology structure from the very beginning, an approach pursued most successfully in the case of the incorporation of Gene Ontology annotations into the UniProt database [74]. [75] provides an illustration of this approach in the field of influenza research. Matters are improving in this respect with the development of approaches to data annotation using machine learning. The results are still in many cases disappointing, but they are at least improving over time [76]. We believe that to accelerate these improvements it will be necessary to associate with each OBO Foundry ontology a terminology comprising, for example, (1) those terms in common usage in the relevant literature that denote entities which are denoted by different terms in the ontology, (2) terms denoting entities that are more specific than are covered in the salient ontology. This will then require a special set of relationships to indicate, for any given term in the terminology, the nature of the annotation with an ontology term. Where ontology precedes data, annotation then becomes automatic. 
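The proposal to pair each ontology with a terminology, together with relationships recording how a literature term relates to an ontology term, can be sketched as a small lookup table. The Python fragment below is a minimal illustration under stated assumptions: the relation labels `exact_synonym_of` and `narrower_than`, the two terminology entries, and the `annotate` function are hypothetical and are not drawn from any existing OBO Foundry resource.

```python
# Hypothetical terminology: literature term -> (relation, ontology term).
# "exact_synonym_of": the literature term denotes the same entity type.
# "narrower_than": the literature term denotes something more specific
#                  than the ontology covers.
TERMINOLOGY = {
    "asymptomatic infection": ("exact_synonym_of", "IDO:subclinical infection"),
    "asymptomatic viral infection": ("narrower_than", "IDO:subclinical infection"),
}


def annotate(literature_term):
    """Look up a literature term, ignoring case and extra whitespace.

    Returns a (relation, ontology term) pair, or None if the term is unknown.
    """
    key = " ".join(literature_term.lower().split())
    return TERMINOLOGY.get(key)


if __name__ == "__main__":
    for term in ("Asymptomatic infection", "asymptomatic viral infection", "fever"):
        print(term, "->", annotate(term))
```

Recording the relation alongside the target term keeps an annotation honest about its precision: a downstream query can treat exact synonyms as full matches while flagging narrower terms as candidates for a new subclass request.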
All too often, however, problems arise, for example because it is too difficult to associate terms from the controlled vocabulary with the terms used by those responsible for data collection.

Table 3  Some databases to which IDO annotations have been applied

The Eukaryotic Pathogen Genomics Database (EuPathDB) [77]: Provides genomic and other data for eukaryotic pathogens including Cryptosporidium, Giardia, Plasmodium, Theileria, Toxoplasma, and Trichomonas strains. Maintained by a team of researchers at the University of Pennsylvania led by Chris Stoeckert.
VectorBase [78]: Provides genomic and other data for a variety of invertebrate vectors of human pathogens. Also maintained by the Stoeckert team.
Eukaryotic Pathogen, Host & Vector Genomics Resource (VEuPathDB): With support from a recently awarded 5-year contract with the National Institute of Allergy and Infectious Diseases, worth up to $7.2 million in 2019-2020 [79], the Stoeckert team has integrated EuPathDB and VectorBase into one bioinformatics resource, VEuPathDB. IDO Core is playing a role in this project, as the VEuPathDB application ontology [80] imports several IDO Core terms such as human pathogenicity disposition, infection, infection prevalence, and primary infection, each of which is used in the annotation of VEuPathDB datasets.
Influenza Research Database [75]: Resource to elucidate host-influenza virus interactions, leading to new treatments and preventive action. Contains "surveillance data, human clinical data associated with virus extracts, phenotypic characteristics of viruses isolated from extracts, and all genomic and proteomic data available in public repositories for influenza viruses" [81].
Virus Pathogen Resource [82]: Database and analysis resource for human pathogenic viruses, including sequence, surveillance and host response data.
PHIDIAS [83]: PHIDIAS (Pathogen-Host Interaction Data Integration and Analysis System) is a web-based database system for searching, comparing, and analyzing integrated genome sequences, conserved domains, and gene expression data related to pathogen-host interactions [83].
Victors virulence factors database [83]: A database to store and analyze virulence factors of a variety of pathogens that infect both humans and animals.

Reflecting their support of knowledge re-use and automated reasoning, ontologies have been implemented in a variety of applications for the enhancement of patient diagnosis, care management and clinical decision support [84-86]. A brief overview and further references are provided in [87]. In the field of infectious disease, Decision Support Systems (DSSs) are commonly used in diagnostic assistance, guidance in the prescription of anti-infectives, biosurveillance, and vector control. Some examples are provided in Table 4.

Table 4  IDO based DSSs

Antibiotic decision support systems (ADSSs): Use of ADSSs has been shown to be effective in mitigating inappropriate antibiotic prescribing and lowering local antimicrobial resistance [88, 89]. To facilitate interoperability and widespread circulation of future ADSSs, a Bacterial Clinical Infectious Disease Ontology (BCIDO) has been developed from IDO Core [90].
IDDAP [91]: IDDAP is a recently developed ontology-driven clinical decision support system for infectious disease diagnosis and antibiotic prescription. IDDAP makes use of an infectious disease diagnosis ontology that builds upon IDO Core.
Dengue Decision Support System (DDSS) [92, 93]: The DDSS is an ontology-driven computational application developed at Colorado State University to guide the implementation of locally appropriate Dengue and Dengue Vector control programs. The DDSS is used in conjunction with Chaak, a cell phone-based system for i) the field capture of data relating to Dengue vector surveillance; and ii) the rapid transfer of the data to the central DDSS database [94].

4.1 Discussion

Confronting the COVID-19 pandemic is a joint effort involving the work of researchers in a variety of fields, including biologists, pathologists, epidemiologists, vaccinologists, physicians, pediatricians, social scientists, and many more. The IDO corpus provides a set of rigorously curated definitions of terms used in infectious disease research that can serve as a vehicle for cross-disciplinary collaboration, allowing specialists in one sub-domain to rapidly gain an understanding of the meanings of the technical terms used in neighboring domains. Successful information-driven research on COVID-19 needs to be able to integrate the already massive and exponentially growing body of research and data concerning coronavirus diseases. Ontologies such as VIDO, CIDO and IDO-COVID-19 fill this need by providing a standardized, computer-interpretable representation of heterogeneous coronavirus knowledge. Because these ontologies are built as part of an interoperable suite of IDO ontology modules, it becomes easier, for example, to compare COVID-19 to other respiratory diseases such as SARS, MERS, and influenza along multiple dimensions, including underlying disorders, pathogen features (such as strain, virulence factors, and drug resistance), host-pathogen interactions, routes of transmission, anatomical sites of infection, genetic and environmental variables, symptoms, diagnostic criteria, disease courses, prevention measures and so on. Admittedly, not all IDO extension ontologies have adequately adhered to the IDO strategy as presented in the foregoing.
Part of the goal of our current work on VIDO and IDO-COVID-19 is to provide a model according to which other IDO extension ontologies can be brought into tighter coordination with the Core, as well as an easy-to-follow recipe for building new pathogen-specific ontologies, so that infectious disease researchers are given fewer opportunities to generate non-interoperable ontologies. As we continue to face the threat of novel viruses (as well as bacteria and parasites) in the future, having such a blueprint in hand should facilitate more rapid extension of the IDO suite, thus allowing easy comparison, along multiple dimensions, of novel pathogens and diseases with pathogens and diseases about which data have already been assembled. Another potential application of this work is to ontology-based deep learning, for instance as illustrated in [95], which describes a novel method for learning features of entities such as proteins and viruses from their associations to ontology classes, and describes how this method can be employed for fast identification of virus–host interactions that can shed light on potential treatments and drug discoveries. Finally, the IDO ontologies can contribute to addressing a further urgent problem faced not merely by COVID-19 research but by contemporary biomedical research in general. This is the problem of reproducibility. This problem applies not only to scientific findings which are the results of experimental studies, but also to findings deriving from the application of different types of diagnostic tests. For an experiment, or a test, to be reproducible, it is crucial that we have a clear understanding of how the experimental or test results were obtained. For this to be possible, however, it is crucial that the constituent processes are described in a terminology that is widely used and whose terms are well defined.
We believe that, when used in combination with the Ontology for Biomedical Investigations [16], IDO offers a promising strategy for the creation of comparable, integratable and discoverable provenance metadata for the data generated in infectious disease research.

4.2 Limitations and future work

Governmental collection of anonymous, patient-specific data can often involve inaccuracies, impeding effective infectious disease surveillance. For example, when multiple health-care organizations are involved, there is a risk of data about the same patient being reported more than once, as happened in Belgium: both general practitioners and nursing/senior homes reported anonymous data to governmental agencies about test results and deaths, resulting in higher reported COVID-19 numbers than was actually the case [73]. For this reason, it is important to have some way to keep track of individual patients, and the data associated with them, in electronic health records, while also preserving anonymity. This is not something an ontology can solve on its own. A potential solution to this problem involves the methodology of Referent Tracking (RT) [96, 97]. The RT strategy is to make unambiguous reference to entities in databases, such as patients, by associating each entity with its own individual unique identifier (IUI). Ontologies do have a subsidiary role to play, as ontology classes can be used to tag an entity's IUI with designations of that entity's relevant features. In the case of a patient infected with SARS-CoV-2, for example, one can tag the patient's IUI using classes from IDO-COVID-19: {IUI5967, has_part SOME SARS-CoV-2 infection}. While IDO and its extensions have a degree of applicability to databases already annotated with OBO Foundry ontology terms, there are some limitations. Terms in databases and literature may denote instances or types by using the exact same term that is used in an ontology to denote a perhaps related, but still different type.
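Returning briefly to the Referent Tracking proposal, the double-counting problem described for Belgium can be illustrated with a few lines of Python. This is a minimal sketch under a strong assumption: every reporting organization already resolves a given patient to the same IUI, and how that is achieved while preserving anonymity is outside the scope of the sketch. The identifier IUI5967 and the tag come from the example above; IUI6001, the report records and the function name are hypothetical.

```python
# Tag built from an IDO-COVID-19 class expression, as in the example above.
INFECTION_TAG = "has_part SOME SARS-CoV-2 infection"

# Hypothetical reports: the same patient (IUI5967) is reported by two sources.
reports = [
    {"iui": "IUI5967", "source": "general practitioner", "tags": {INFECTION_TAG}},
    {"iui": "IUI5967", "source": "nursing home", "tags": {INFECTION_TAG}},
    {"iui": "IUI6001", "source": "general practitioner", "tags": {INFECTION_TAG}},
]


def count_tagged_entities(records, tag):
    """Count distinct IUIs carrying `tag`, so repeat reports are counted once."""
    return len({record["iui"] for record in records if tag in record["tags"]})


if __name__ == "__main__":
    print("reports received:", len(reports))
    print("distinct infected patients:", count_tagged_entities(reports, INFECTION_TAG))
```

Counting identifiers rather than reports is what removes the duplication; the ontology's contribution is limited to giving the tag a shared, well-defined meaning across reporting organizations.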
Furthermore, databases and literature may use terms that denote entities which in the ontology are denoted by different terms. Even more prevalent are terms in databases that denote entities more specific than those covered in an ontology. This requires a special set of relationships to indicate the nature of the annotation with the ontology term. The "ontology precedes data" methodology sketched in section 4 provides one potential solution to this problem, especially given the enormous success this sort of approach has achieved within the Gene Ontology. Future work will involve new applications of this approach. Relatedly, ontology annotations are all too often applied incorrectly. Many users of OBO Foundry ontologies do not seem to understand BFO, OGMS, or the principles upon which they are based. This is likely a sign that we in the OBO community need to do better to make sure these principles are well understood. Where the principles are more or less understood, it is likely that we still need to be more vigilant in making sure OBO Foundry users are actually complying with them. A better division of labor is probably also needed, with such applications being assigned to properly trained ontologists rather than researchers. Related future work includes the development of a 'gold standard' corpus of 400 articles from PubMed on various areas of coronavirus research, annotated with ontology terms for use in machine learning for automating data integration, hypothesis generation concerning COVID-19, and predictions aimed at limiting present, and preventing future, outbreaks.

Conclusions

As we face the continued threat of novel pathogens in the future, IDO Core provides a simple recipe for building new pathogen-specific ontologies in a way that allows data about novel diseases to be easily compared, along multiple dimensions, with data represented by existing disease ontologies.
The IDO strategy, moreover, supports ontology coordination, providing a powerful method of data integration and sharing that will allow physicians, researchers, and public health organizations to respond rapidly and efficiently to both current and future public health crises.

Methods

With respect to editing tools, IDO Core was updated using the Protégé ontology development tool [98], leveraging the enhanced expressivity of the Web Ontology Language (OWL). IDO Core, like other OBO Foundry ontologies, is not exhaustive, as development of the ontology is intended to keep pace with growing research on infectious diseases. With respect to updating IDO Core based on the existing OBO library, a study of extension ontologies was conducted in the interest of identifying terms in extensions that would be better placed in IDO Core. From another direction, a study of developments in OBO Foundry ontologies was conducted in the interest of identifying terms better suited to more general ontologies. Where terms were needed for IDO Core but were not suitable for introduction there, because they were too general for the domain of infectious diseases, term requests were made to developers of relevant OBO Foundry ontologies. For example, various transmission classes were requested for, and subsequently added to, TRANS. Lastly, with respect to updating IDO Core based on the construction of novel reference ontologies, such as VIDO, collaborative study between IDO Core, VIDO, CIDO, and IDO-COVID-19 developers resulted in careful construction of relevant terms based on up-to-date empirical literature, researcher term use, and logical coherence. For example, adjustments were needed to IDO Core's definition of infectious agent due to reflection on viruses, resulting in the introduction of the class acellular structure as parent class to virus.
In every case, terminological content for IDO Core was either imported from an existing OBO Foundry library ontology, defined based on imported terms, introduced as a primitive to IDO Core, or defined based on IDO Core primitive and/or imported terms. In accordance with OBO Foundry principles, priority was given to importing and defining terms over introducing primitive terms to IDO Core. Before new primitives were deemed necessary, IDO Core developers canvassed researchers developing nearby ontologies for insights, posed queries on issue trackers on relevant GitHub pages, and studied relevant infectious disease literature. Terms were then proposed, vetted by specialists where possible, and introduced to IDO Core after scrutiny. As with most OBO ontologies, IDO Core is an open project with its own GitHub repository [99], where the most recent published and developmental versions of the ontology are available for download. We encourage members of the ontology community, as well as infectious disease researchers, to submit term requests to our GitHub Issues tracker. The Issues tracker can also be used to report any errors or concerns related to the ontology. Before requesting a new term, please search online ontology repositories such as Ontobee and BioPortal to see if the needed term already exists. Further advice is available in the OBO tutorial for term requests [100]. Once a term request is received, it will be reviewed by the main IDO Core team to determine whether the term is most appropriate for IDO Core, one of its extensions, or another OBO ontology. If the term is within IDO Core's scope, then it will be added with a formal definition, written in conjunction with the term requestor to ensure biological accuracy as well as adherence to OBO Foundry best practices and consistency with IDO logical structure. We can assign a unique ID for the term so that it can be used for immediate annotation prior to the definition being finalized.
Abbreviations

Apollo-SV: Apollo Structured Vocabulary; BCIDO: Bacterial Clinical Infectious Disease Ontology; BFO: Basic Formal Ontology; CIDO: Coronavirus Infectious Disease Ontology; ChEBI: Chemical Entities of Biological Interest; CL: Cell Ontology; DDSS: Dengue Decision Support System; DrugOn: Drug Ontology; GenEpio: Genomic Epidemiology Ontology; GO: Gene Ontology; IAO: Information Artifact Ontology; IDOBRU: Brucellosis Ontology; IDO Core: Infectious Disease Ontology Core; IDODEN: Dengue Ontology; IDOFLU: Influenza Ontology; IDOHIV: HIV Ontology; IDOMAL: Malaria Ontology; IDOMEN: Meningitis Ontology; IDOPlant: Plant Disease Ontology; IDOSCHISTO: Schistosomiasis Ontology; IDOSA: Staphylococcus aureus Infectious Disease Ontology; MIRO: Mosquito Insecticide Resistance Ontology; NARSA: Network on Antimicrobial Resistance in Staphylococcus aureus; NDF-RT: National Drug File Reference Terminology; NCBITaxon: NCBI organismal classification; OBI: Ontology for Biomedical Investigations; OBO: Open Biomedical Ontologies; OGMS: Ontology for General Medical Science; OWL: Web Ontology Language; RO: Relations Ontology; VEuPathDB: Eukaryotic Pathogen, Host & Vector Genomics Resource; VIDO: IDO Virus; VO: Vaccine Ontology; VSMO: Vector Surveillance and Management Ontology.

Declarations

Ethics approval and consent to participate
Not applicable

Consent for publication
Not applicable

Availability of Data and Materials
The datasets generated and/or analysed during the current study are freely and publicly available in the IDO Core GitHub repository [https://github.com/infectious-disease-ontology/infectious-disease-ontology] as well as in online ontology repositories such as Ontobee [http://www.ontobee.org/ontology/IDO] and BioPortal [https://bioportal.bioontology.org/ontologies/IDO]. IDO extensions are also freely and publicly available on GitHub, Ontobee and BioPortal.

Competing interests
The authors declare that they have no competing interests.
Funding
BS's contributions were supported by the NIH under NCATS 1UL1TR001412 (Buffalo Clinical and Translational Research Center). SB's and JB's contributions were supported by NIH / NLM T15 Biomedical Informatics and Data Science Research Training Programs (5T15LM012495-03).

Authors' contributions
All authors read and extensively reviewed the manuscript. SB wrote the manuscript and conducted the research. JB vastly improved the structure of the paper, as well as the sections on CIDO and VIDO. BS and LGC were principal developers of IDO Core and IDOSA. BS is a principal developer of BFO, OGMS, and IDOPlant. JB, BS and SB are the principal developers of VIDO and IDO-COVID-19.

Acknowledgements
We would like to acknowledge the following for their contributions to the development of IDO Core: Alexander Diehl, Alan Ruttenberg, Albert Goldfain, Bjoern Peters, and Jie Zheng. This paper has benefitted from feedback from Alexander Diehl, Chris Stoeckert, Yongqun (Oliver) He, Asiyah Yu Lin, and Werner Ceusters.

References
1. Pesquita C, Ferreira JD, Couto FM, Silva MJ. The epidemiology ontology: an ontology for the semantic annotation of epidemiological resources. J Biomed Semant. 2014; doi:10.1186/2041-1480-5-4.
2. Arp R, Smith B, Spear A. Building Ontologies with Basic Formal Ontology. Cambridge, MA: MIT Press; 2015.
3. Zeng ML, Hong Y, Clunis J, He S, Coladangelo LP. Implications of Knowledge Organization Systems for Health Information Exchange and Communication during the COVID-19 Pandemic. Data Information Management. 2020; 4(3). https://doi.org/10.2478/dim-2020-0009.
4. The Gene Ontology Consortium. The Gene Ontology Resource: 20 years and still GOing. Nucleic Acids Res. 2019; 47:D330–D338. doi:10.1093/nar/gky1055.
5. Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, et al. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol. 2007; 25:1251–1255. doi:10.1038/nbt1346.
6. The Open Biomedical Ontologies Foundry. http://obofoundry.org/. Accessed 27 Apr 2020.
7. Smith B, Ceusters W, Klagges B, Köhler J, Kumar A, Lomax J, et al. Relations in biomedical ontologies. Genome Biol. 2005; 6:R46. doi:10.1186/gb-2005-6-5-r46.
8. Cowell LG, Smith B. Infectious Diseases Ontology. In: Sintchenko V, editor. Infectious Disease Informatics. New York, NY: Springer; 2010. p. 373-95.
9. Goldfain A, Smith B, Cowell LG. Dispositions and the infectious disease ontology. In: Galton A, Mizoguchi R, editors. Formal Ontology in Information Systems: Proceedings of the 6th International Conference (FOIS 2010). Amsterdam: IOS Press; 2010. p. 400-413.
10. Scheuermann RH, Ceusters W, Smith B. Toward an ontological treatment of disease and diagnosis. AMIA Summit on Translat Bioinform. 2009; p. 116-120.
11. ISO/IEC 21838-2. https://www.iso.org/standard/74572.html. Accessed 27 Apr 2020.
12. ISO Standards Maintenance Portal. https://standards.iso.org/iso-iec/21838/-2/ed-1/en/. Accessed 27 Apr 2020.
13. Rupnik M, Wilcox MH, Gerding DN. Clostridium difficile infection: New developments in epidemiology and pathogenesis. Nat Rev Microbiol. 2009; 7(7):526–36. doi:10.1038/nrmicro2164.
14. Bruchfeld J, Correia-Neves M, Källenius G. Tuberculosis and HIV Coinfection. Cold Spring Harb Perspect Med. 2015; 5(7):a017871. https://doi.org/10.1101/cshperspect.a017871.
15. Ceusters W, Smith B, Kumar A, Dhaen C. Mistakes in medical ontologies: where do they come from and how can they be detected? In: Pisanelli D, editor. Ontologies in medicine. Amsterdam: IOS Press; 2004. p. 145–164.
16. Bandrowski A, Brinkman R, Brochhausen M, Brush MH, Bug B, Chibucos MC, et al. The Ontology for Biomedical Investigations. PLOS ONE. 2016; 11(4):e0154556. doi:10.1371/journal.pone.0154556.
17. Federhen S. The NCBI Taxonomy Database. Nucleic Acids Res. 2012; 40:D136-D143. doi:10.1093/nar/gkr1178.
18. Martin AM, Fraser TA, Lesku JA, Simpson K, Roberts GL, Garvey J, et al. The cascading pathogenic consequences of Sarcoptes scabiei infection that manifest in host disease. R Soc Open Sci. 2018; 5(4):180018. doi:10.1098/rsos.180018.
19. McNicholl JM, Smith DK, Qari SH, Hodge T. Host Genes and HIV: The role of the chemokine receptor gene CCR5 and its allele (∆32 CCR5). Emerg Infect Dis. 1997; 3(3):261-271. doi:10.3201/eid0303.970302.
20. Pathogen Transmission Ontology. https://bioportal.bioontology.org/ontologies/PTRANS. Accessed 27 Apr 2020.
21. Goldfain A, Smith B, Cowell LG. Towards an ontological representation of resistance: the case of MRSA. J Biomed Inform. 2011; 44:35-41. doi:10.1016/j.jbi.2010.02.008.
22. Lin Y, Xiang Z, He Y. Brucellosis ontology (IDOBRU) as an extension of the infectious disease ontology. J Biomed Semant. 2011; doi:10.1186/2041-1480-2-9.
23. Hogan WR, Wagner MM, Brochhausen M, Levander J, Brown ST, Millet N. The Apollo Structured Vocabulary: an OWL2 ontology of phenomena in infectious disease epidemiology and population biology for use in epidemic simulation. J Biomed Semant. 2016; 7:50. doi:10.1186/s13326-016-0092-y.
24. Griffiths E, Dooley D, Graham M, Van Domselaar G, Brinkman FSL, Hsiao WWL. Context Is Everything: Harmonization of Critical Food Microbiology Descriptors and Metadata for Improved Food Safety and Surveillance. Front Microbiol. 2017; 8:1068. doi:10.3389/fmicb.2017.01068.
25. Ceusters W, Smith B. Aboutness: towards foundations for the information artifact ontology. In: Couto FM, Hastings J, editors. Proceedings of the 6th International Conference on Biomedical Ontology (ICBO 2015). CEUR-WS.org; 2015. p. 1-5.
26. Lozano-Fuentes S, Bandyopadhyay A, Cowell LG, Goldfain A, Eisen L. Ontology for vector surveillance and management. J Med Entom. 2013; 50:1-14. doi:10.1603/me12169.
27. Luciano J, Schriml L, Squires B, Scheuermann R. The Influenza Infectious Disease Ontology (IIDO). The 11th Annual Bio-Ontologies Meeting, ISMB. 2008, 20 July; Toronto, Canada.
28.
Liu Y, Chan W, Wang Z, Hur J, Xie J, Yu H, et al. Ontological and bioinformatic analysis of anti-coronavirus drugs and their implication for drug repurposing against COVID-19. Preprints. 2020; Available at: https://doi.org/10.20944/preprints202003.0413.v1. Accessed 27 Apr 2020. 29. Ong E, Wong M, Huffman A, He Y. COVID-19 coronavirus vaccine design using reverse vaccinology and machine learning. bioRxiv. 2020; Available at: https://doi.org/10.1101/2020.03.20.000141. Accessed 27 April 2020. 30. He Y, Yu H, Ong E, Wang Y, Liu Y, Huffman A, et al. CIDO, a community-based ontology for coronavirus disease knowledge and data integration, sharing, and analysis. Sci Data. 2020; 7:181. [https://doi.org/10.1038/s41597-020-0523-6] 31. Coronavirus Infectious Disease Ontology. https://bioportal.bioontology.org/ontologies/CIDO. Accessed 27 Apr 2020. 32. Influenza Ontology. https://bioportal.bioontology.org/ontologies/FLU. Accessed 27 Apr 2020. 33. Lin Y, Xiang Z, He Y. Ontology-based representation and analysis of host-Brucella interactions. J Biomed Semant. 2015; doi: 10.1186/s13326-015-0036-y. 34. Brucellosis Ontology. https:// bioportal.bioontology.org/ontologies/IDOBRU. Accessed 27 Apr 2020. 35. Virus Infectious Disease Ontology. https://bioportal.bioontology.org/ontologies/VIDO. Accessed 15 Jun 2020. 36. COVID-19 Infectious Disease Ontology. https://bioportal.bioontology.org/ontologies/IDOCOVID-19. Accessed 15 Jun 2020. 37. Mitraka E, Topalis P, Dritsou V, Dialynas E, Louis C. Describing the breakbone fever: IDODEN, an ontology for dengue fever. PLOS Negl Trop Dis. 2015; 9(2): e0003479. Doi: 10.1371/journal.pntd.0003479. 32 38. Dengue Ontology. https://bioportal.bioontology.org/ontologies/IDODEN. Accessed 27 Apr 2020. 39. Topalis P, Mitraka E, Bujila I, Deligianni E, Dialynas E, Siden-Kiamos I, et al. IDOMAL: an ontology for malaria. Malar J. 2010; 9:230. Doi: 10.1186/1475-2875-9-230. 40. Topalis P, Mitraka E, Dritsou V, Dialynas E, Louis C. 
IDOMAL: the malaria ontology revisited. J Biomed Semant. 2013; doi: 10.1186/2041-1480-4-16. 41. Malaria Ontology. https:// github.com/VeuPathDB-ontology/IDOMAL. Accessed 27 Apr 2020. 42. Béré C, Camara G, Malo S, Lo M, Ouaro S. IDOMEN: an extension of infectious disease ontology for MENingitis. In: Ohno-Machado L, Séroussi B, editors. MEDINFO 2019: Health and Wellbeing e-Networks for All. Amsterdam: IOS Press; 2019. P. 313-317. 43. Meningitis Ontology. https://github.com/cedricbere/IDOMEN. Accessed 27 Apr 2020. 44. Walls RL, Smith B, Elser J, Goldfain A, Stevenson DW, Jaiswal P. A plant disease extension of the infectious disease ontology. In: Cornet R, Stevens R, editors. Proceedings of the 3rd International Conference on Biomedical Ontology. CEURS-WS.org; 2012. P. 1-5. 45. Plant Disease Ontology. http://purl.obolibrary.org/obo/idoplant.owl. Accessed 27 Apr 2020. 46. Goldfain A, Smith B, Cowell LG. Constructing a lattice of infectious disease ontologies from a staphylococcus aureus isolate repository. In: Cornet R, Stevens R, editors. Proceedings of the 3rd International Conference on Biomedical Ontology (ICBO 2012). CEURS-WS.org; 2012. P. 1-5. 47. Staphylococcus aureus Infectious Disease Ontology. https://github.com/awqbi/ido-staph. Accessed 27 Apr 2020. 48. Camara G, Desprès S, Lo M. IDOSCHISTO: une extension de l'ontologie noyau des maladies infectieuses (IDO-Core) pour la schistosomiases. In: Faron-Zucker C, editor. IC – 25èmes Journées francophones d'Ingénierie des Connaissances, Clermont-Ferrand, France. Session 1: Construction, peuplement et exploitation d'ontologies. 2014. P. 39-50. 49. Schistosomiasis Ontology. https://github.com/gaoussoucamara/idoschisto. Accessed 27 Apr 2020. 50. Sargeant D, Deverasetty S, Strong CL, Alaniz IJ, Bartlett A, Brandon NR, et al. The HIVToolbox 2 Web System Integrates Sequence, Structure, Function and Mutation Analysis. PLOS ONE. 2014; 9:e98810. Doi: 10.1371/journal.pone.0098810. 51. HIV Ontology. 
https:// bioportal.bioontology.org/ontologies/HIV. Accessed 27 Apr 2020. 52. Xiang Z, Zheng W, He Y. BBP: Brucella genome annotation with literature mining and curation. BMC Bioinform. 2006; doi: 10.1186/1471-2105-7-347. 53. Vaccine Investigation and Online Information Network. http://www.violinet.org/. Accessed 27 Apr 2020. 54. Cooper L, Meier A, Laporte MA, Elser J, Mungall CJ, Sinn BT, et al. The Planteome database: an integrated resource for reference ontologies, plant genomics and phenomics. Nucleic Acids Res. 2018; 46:D1168–D1180. Doi: 10.1093/nar/gkx1152. 55. Natale DA, Arighi CN, Blake JA, Bona J, Chen C, Chen S, et al. Protein Ontology (PRO): enhancing and scaling up the representation of protein entities. Nucleic Acids Res. 2017; 45: D339D346. Doi: 10.1093/nar/gkw1075. 56. Eilbeck K, Lewis SE, Mungall CJ, Yandell M, Stein L, Durbin R, et al. The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol. 2005; 6(5):R44. Doi: 10.1186/gb2005-6-5-r44. 57. Haendel MA, Balhoff JP, Bastian FB, et al. Unification of multi-species vertebrate anatomy ontologies for comparative biology in Uberon. J Biomed Semant. 2014; doi:10.1186/2041-1480-521. 33 58. Degtyarenko K, Matos P, Ennis M, et al. ChEBI: a database and ontology for chemical entities of biological interest. Nucleic Acids Res. 2008; 36:D344-D350. Doi: 10.1093/nar/ gkm791. 59. Network on Antimicrobial Resistance in Staphylococcus aureus. http://www.narsa.net/. Accessed 27 Apr 2020. 60. Holland T, Baddour LM, Bayer AS, Hoen B, Miro JM, Fowler VG. Infective Endocarditis. Nat Rev Dis Primers. 2016; 2: 16059. Doi: 10.1038/nrdp.2016.59. 61. Gruninger M. Modular First-Order Ontologies via Repositories. Applied Ontology. 2012; 7(2). 62. Kibbe WA, Arze C, Felix V, Mitraka E, Bolton E, Fu G, et al. Disease ontology 2015 update: an expanded and updated database of human diseases for linking biomedical knowledge through disease data, Nucleic Acids Res. 2015; 43:D1071–D1078. 
Doi: 10.1093/nar/gku1011. 63. WHO COVID-19 Rapid Version CRF. https:// bioportal.bioontology.org/ ontologies/ COVIDCRFRAPID. Accessed 27 Apr 2020. 64. COVID-19 Surveillance Ontology. https:// bioportal.bioontology.org/ontologies/COVID19. Accessed 27 Apr 2020. 65. Linked COVID-19 Data Ontology. https://github.com/Research-Squirrel-Engineers/COVID19. Accessed 27 Apr 2020. 66. COVID-19 Research Knowledge Graph. https://github.com/nasa-jpl-cord-19/covid19knowledge-graph. Accessed 27 Apr 2020. 67. https://proconsortium.org/download/development/pro_sars2.obo; https:// proconsortium.org/download/ development/pro_sars2.gpi. Accessed 27 Apr 2020. 68. He Y, Cowell LG, Diehl AD, Mobley H, Peters B, Ruttenberg A, et al. VO: Vaccine Ontology. In: Smith B, editor. Proceedings of the 1st International Conference on Biomedical Ontology (ICBO 2009). Buffalo: NCOR; 2009. P. 172.69. Hogan WR, Hanna J, Joseph E, Brochhausen M. Towards a Consistent and Scientifically Accurate Drug Ontology. In: Dumontier M, Hoehndorf R, Baker CJO, editors. Proceedings of the 4th International Conference on Biomedical Ontology (ICBO 2013). CEURWS.org; 2013. P. 68-73. 70. Scheuermann R, Kong M, Dahlke C, Cai J, Lee J, Qian Y, et al. Ontology-based knowledge representation of experiment metadata in biological data mining. In: Chen J, Lonardi S, editors. Biological Data Mining. Boca Raton, FL: Chapman & Hall; 2009. P. 529-559. 71. Bhattacharya S, Dunn P, Thomas CG, Smith B, Schaefer H, Chen Jieming, et al. ImmPort, toward repurposing of open access immunological assay data for translational and clinical research. Sci Data. 2018; 5:180015. Doi: 10.1038/sdata.2018.15. 72. Schriml L, Chuvochina M, Davies N, Eloe-Fadrosh, E, Finn R, Hugenholtz P, et al. COVID-19 pandemic reveals the peril of ignoring metadata standards. Sci Data. 2020; 7:188. https://doi.org/10.1038/s41597-020-0524-5. 73. Sven Van P, Tvenstrup B, Vander Laenen M, Ceusters W. Ontology preludes data science: a COVID-19 use case. TowardsAI.net. 
June 9 2020; Accessed 18 Aug 2020. 74. Huntley RP, Sawford T, Mutowo-Meullenet P, Shypitsyna A, Bonilla C, Martin MJ, et al. The GOA database: Gene Ontology annotation updates for 2015. Nucleic Acids Res. 2015; 43: D1057D1063. Doi: 10.1093/nar/gku1113 75. Squires RB, Noronha J, Hunt V, García‐Sastre A, Macken C, Baumgarth N, et al. Influenza Research Database: An integrated bioinformatics resource for influenza virus research. Influenza Other Respir Viruses. 2012; 6(6), 404-416. Doi: 10.1111/j.1750-2659.2011.00331.x. 76. Kulmanov M, Smaili FZ, Gao X, Hoehndorf R, Machine learning with biomedical ontologies. bioRxiv. 2020; Available at: https://doi.org/10.1101/2020.05.07.082164. Accessed 15 Jul 2020. 77. Eukaryotic Pathogen Genomics Database. https://eupathdb.org/eupathdb/. Accessed 27 Apr 2020. 34 78. VectorBase: Bioinformatics Resource for Invertebrate Vectors of Human Pathogens. http://vectorbase.org. Accessed 27 Apr 2020. 79. https://eurekalert.org/pub_releases/2019-10/uop-nat101519.php. Accessed 27 Apr 2020. 80. Zheng J, Cade JS, Brunk B, Roos DS, Stoeckert CJ, Sullivan SA, et al. Malaria study data integration and information retrieval based on OBO Foundry ontologies. In: Jaiswal P, Hoehndorf R, editors. Proceedings of the Joint International Conference on Biological Ontology and BioCreative (ICBO-BioCreative 2016). CEUR-WS.org; 2016. p. 38. 81. Virus Pathogen Resource. http://www.viprbrc.org. 82. https://bioportal.bioontology.org/projects/IRD 83. Sayers S, Li L, Ong E, Deng S, Fu G, Lin Y, et al. Victors: a web-based knowledge base of virulence factors in human and animal pathogens. Nucleic Acid Res. 2019; 47:D693-D700. doi: 10.1093/nar/gky999. 84. Zhang YF, Gou L, Zhou TS, Lin DN, Zheng J, Li Y, et al. An ontology-based approach to patient follow-up assessment for continuous and personalized chronic disease management. J Biomed Inform. 2017; 72:45–59. doi: 10.1016/j.jbi.2017.06.021. 85. Abidi S. 
A knowledge-modeling approach to integrate multiple clinical practice guidelines to provide evidence-based clinical decision support for managing comorbid conditions. J Med Syst. 2017; 41(12):193. doi: 10.1007/s10916-017-0841-1. 86. Lin Y, Staes CJ, Shields DE, Kandula V, Welch BM, Kawamoto K. Design, development, and initial evaluation of a terminology for clinical decision support and electronic clinical quality measurement. AMIA Annu Symp Proc. 2015; p. 843–51. 87. Haendel MA, McMurry JA, Relevo R, Mungall CJ, Robinson PN, Chute CG. A Census of Disease Ontologies. Annu Rev of Biomed Data Sci. 2018; 1(1):305-331. 88. Thursky KA, Mahemoff M. User-centered design techniques for a computerized antibiotic decision support system in an intensive care unit. Int J Med Inform. 2007; 76:760-8. 89. Paterson DL. The role of antimicrobial management programs in optimizing antibiotic prescribing within hospitals. Clin Infect Dis. 2006; 42 Suppl 2: S90-5. doi: 10.1086/499407. 90. Gordon CL, Pouch S, Cowell LG, Boland MR, Platt HL, Goldfain A, et al. Design and evaluation of a bacterial clinical infectious diseases ontology. AMIA Annu Symp Proc. 2013; p. 502– 511. 91. Shen Y, Yuan K, Chen D, Colloc J, Yang M, Li Y, et al. An ontology-driven clinical decision support system (IDDAP) for infectious disease diagnosis and antibiotic prescription. Artif Intell Med. 2018; 86:20–32. 92. Eisen L, Coleman M, Lozano–Fuentes S, McEachen N, Orlans M, Coleman M. Multi-disease data management system platform for vector-borne diseases. PLOS Negl Trop Dis. 2011; 5:e1016. doi: 10.1371/journal.pntd.0001016. 93. Lozano-Fuentes S, Barker CM, Coleman M, Park BB, Reisen WK, Eisen L. Emerging information technologies to provide improved decision support for surveillance, prevention, and control of vector-borne diseases. In: Jao C, editor. Efficient Decision Support Systems: Practice and Challenges in Biomedical Related Domain. Rijeka, Croatia: InTech-Open Access Publisher; 2011. p. 89-114. 94. 
Lozano-Fuentes S, Wedyan F, Hernandez-Garcia E, Devadatta S, Ghosh S, Bieman JM, et al. Cell phone-based system (Chaak) for surveillance of immatures of dengue virus mosquito vectors. J Med Entomol. 2013; 50(4):879-89. doi: 10.1603/me13008. 95. Liu-Wei W, Kafkas Ş, Chen J, Tegnér J, Hoehndorf R. Prediction of novel virus–host interactions by integrating clinical symptoms and protein sequences. bioRxiv. 2020; Available at: https://doi.org/10.1101/2020.04.22.055095. Accessed 27 Apr 2020. 35 96. Ceusters W, Smith B. Tracking referents in electronic health records. Studies Health Technol Inform. 2005; 116: 71-76. https://doi.org/10.1016/j.jbi.2005.08.002. 97. Ceusters W, Smith B. Strategies for referent tracking in electronic health records. J Biomed Inform. 2006; 39(3): 362-378. https://doi.org/10.1016/j.jbi.2005.08.002. 98. Protégé. http://protege.stanford.edu. Accessed 27 Apr 2020. 99. https://github.com/infectious-disease-ontology/infectious-disease-ontology. Accessed 27 Apr 2020. 100. https://github.com/jamesaoverton/obo-tutorial/blob/master/docs/ontologydevelopment.md#filing-effective-term-requests-and-bug-reports. Accessed 27 Apr 2020. Additional Files Additional File 1: Supplementary Tables (.docx) Table S1  Ontologies building on the OGMS treatment of disease and diagnosis; Table S2  Overview of IDO extension ontologies that have been developed or planned; Table S3  Some other ontologies within the infectious disease domain that make use of IDO Core. Additional File 2: The Infectious Disease Ontology Extensions: Some Issues. (.docx) Several of the IDO ontologies require significant reengineering if they are to be considered bona fide extensions of IDO Core. This supplementary document provides an overview of some issues concerning specific IDO extensions, while providing some suggestions for how they can be addressed.