The Infectious Disease Ontology in 2020

The Infectious Disease Ontology (IDO) is a suite of interoperable ontology modules that aims to provide coverage of all aspects of the infectious disease domain, including biomedical research, clinical care, and public health. IDO Core is designed to be a disease- and pathogen-neutral ontology, covering just those types of entities and relations that are relevant to infectious diseases generally. IDO Core is then extended by a collection of ontology modules focusing on specific diseases and pathogens. In this paper we present applications of IDO Core within various areas of infectious disease research, together with an overview of all IDO extension ontologies and the methodology on the basis of which they are built. We also survey recent developments involving IDO, including the creation of IDO Virus, the Coronaviruses Infectious Disease Ontology (CIDO) and an extension of CIDO focused on COVID-19, IDO-CovID-19. Finally, we discuss how these ontologies might assist in information-driven efforts to deal with the ongoing COVID-19 pandemic, to accelerate data discovery in the early stages of future pandemics, and to promote reproducibility of infectious disease research.


Data Silos and the OBO Foundry
Efforts by physicians, researchers, and public health organizations to respond to infectious diseases require the use of multiple, constantly changing data sources. Consider, for instance, a research team trying to build an effective large-scale epidemiological system for modeling a given population's herd immunity to measles. This depends on the integration of data not merely from biology and medicine, but also from statistics, sociology, and geography. The system will need to incorporate society-wide data on measles occurrence rates, transmission mode, birth rates, vaccination rates, family structures, age distribution and other relevant demographic factors [1,2], and also patient-specific data on clinical manifestations of disease, diagnoses and treatments received. Because the relevant information is collected using discipline- and community-specific methodologies and is stored in heterogeneous and geographically distributed databases, the data is typically only locally accessible. The resultant silo formation hinders translational and comparative research as well as preventive and prognostic public health research.
As the experience of biologists and bioinformaticians has shown, ontologies are an effective data sharing tool [3]. But in order for ontologies to be effective in this way, it is important that they are designed in a coordinated fashion; otherwise ontologies themselves will give rise to a new kind of silo. One of the most successful and widely adopted approaches to coordinated ontology development is that of the Open Biomedical Ontologies (OBO) Foundry, 1 a collective of developer groups dedicated to creating, testing and maintaining a collection of ontologies based on an evolving set of design principles for ontology development [4], including:
• Ontologies should use a well-specified syntax and share a common space of identifiers.
• Ontologies should be openly available in the public domain for reuse.
• Ontologies in neighboring domains should be developed in a collaborative effort.
• Ontologies should be developed in a modular fashion.
• Ontologies should have a clearly specified scope.
• Ontologies should use common unambiguously defined relations between their terms.
• Ontologies should conform to a common top-level architecture.
These principles were modelled initially on the practices of the Gene Ontology (GO) [3], which is not only enormously successful in its own right but has also served as the model for a series of life science ontologies following in its wake.
Wherever possible, ontologies are created utilizing terms and relational expressions taken from existing Foundry ontologies and from the OBO Foundry Relation Ontology (RO) [5]. This practice, which applies also to the definitions in these ontologies, helps to ensure cross-linkage between ontologies in neighboring domains and also to prevent redundant efforts. It also puts ontologies in a position where they can support integration of data across a wide range while avoiding the generation of silos. OBO Foundry principles also require that each class have a unique identifier with the bipartite form ID-space:Local-ID, as in GO:0008150. Use of a common ID-space means that the source of each term can be immediately identified by its prefix. Use of a common ID-space also ensures that the ontologies retain backward compatibility with legacy annotations as those ontologies evolve. Moreover, ontology construction and extension in accordance with these principles follows a 'hub and spoke' model, where a core or 'hub' ontology provides the basis for extension ontologies or 'spokes' by specialization. Following this model, the Infectious Disease Ontology (IDO) represents one step towards overcoming data silos [6].
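The bipartite identifier form described above can be illustrated with a minimal Python sketch. The function name parse_obo_id and the regular expression are assumptions made for illustration (a letter-based ID-space, a colon, and a numeric Local-ID); they are not an official OBO Foundry validation rule.

```python
import re
from typing import NamedTuple


class OboId(NamedTuple):
    """An identifier of the bipartite form ID-space:Local-ID."""
    id_space: str
    local_id: str


# Assumed pattern for illustration: a letter-based prefix, a colon, then digits.
_OBO_ID = re.compile(r"^(?P<id_space>[A-Za-z][A-Za-z_]*):(?P<local_id>[0-9]+)$")


def parse_obo_id(identifier: str) -> OboId:
    """Split an identifier such as 'GO:0008150' into its two parts."""
    match = _OBO_ID.match(identifier)
    if match is None:
        raise ValueError(f"not a bipartite identifier: {identifier!r}")
    # The Local-ID is kept as a string so that leading zeros are preserved.
    return OboId(match.group("id_space"), match.group("local_id"))


term = parse_obo_id("GO:0008150")
print(term.id_space)  # GO
print(term.local_id)  # 0008150
```

Because the prefix is recovered directly from the identifier, the source ontology of an annotation can be determined without consulting the ontology file itself.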

The Infectious Disease Ontology
IDO Core is designed to be disease- and pathogen-neutral by covering just those entities that are relevant to infectious diseases generally, providing coverage for aspects of infectious disease across biological scale (e.g. gene, cell, organ, organism, population), disciplinary perspective (e.g. clinical, biological, epidemiological), and relevant organism type (e.g. host, pathogen, vector, reservoir) [7]. IDO Core thereby provides a 'hub' for a collection of ontology 'spokes' focusing on specific diseases and pathogens. Taken together, these form the IDO suite of ontologies, with more narrowly focused ontologies extending from the core, providing coverage of all aspects of the infectious disease domain, including biomedical research, clinical care, and public health [6].
Because IDO is built in accordance with the OBO Foundry principles described above, IDO Core and its extensions are in a position to promote interoperability with other ontologies built from Foundry ontologies. This makes IDO Core and its extensions applicable to the annotation of a variety of databases relevant to infectious disease that already make use of Foundry ontologies in their annotations [6]. 2 Relevant data and information within multiple disparate sources is annotated using the same set of IDO terms, which all use the same ID-space, namely "IDO". Specific extension ontologies are demarcated via unique ID-blocks pre-assigned by the IDO Core team. The resultant annotated data thereby becomes available to computer processing as if it were a single body of linked data in virtue of the semantically controlled properties of these terms. These strategies have the benefits of preventing duplicate terms and efforts, enforcing the use of the same ontology development best practices, and encouraging tighter coordination between the IDO Core team and the teams responsible for each specific extension ontology [8], all of which are important for overcoming data silo problems in the domain of infectious diseases.
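The demarcation of extension ontologies by pre-assigned ID-blocks within a single ID-space can be sketched as follows. The block boundaries and module names below are invented purely for illustration and do not reflect the actual allocation made by the IDO Core team.

```python
from typing import Dict, Optional

# Hypothetical ID-blocks: the ranges and the module names are assumptions.
ID_BLOCKS: Dict[str, range] = {
    "IDO Core": range(0, 100_000),
    "extension A": range(100_000, 200_000),
    "extension B": range(200_000, 300_000),
}


def owning_module(identifier: str) -> Optional[str]:
    """Return the module whose ID-block contains an 'IDO:<digits>' identifier."""
    prefix, _, local_id = identifier.partition(":")
    if prefix != "IDO" or not (local_id.isascii() and local_id.isdigit()):
        return None  # not in the shared "IDO" ID-space
    number = int(local_id)
    for module, block in ID_BLOCKS.items():
        if number in block:
            return module
    return None  # in the ID-space, but outside every assigned block


def blocks_overlap(blocks: Dict[str, range]) -> bool:
    """Check that no two blocks share a Local-ID, so no term is claimed twice."""
    ordered = sorted(blocks.values(), key=lambda block: block.start)
    return any(a.stop > b.start for a, b in zip(ordered, ordered[1:]))


print(owning_module("IDO:0000123"))  # IDO Core
print(owning_module("IDO:0150000"))  # extension A
print(blocks_overlap(ID_BLOCKS))     # False
```

Keeping the blocks disjoint is what allows different teams to mint new terms independently without producing duplicate identifiers.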
In the ideal case, all IDO extension ontologies would be developed in the same way, and in conformance to all Foundry principles. Unfortunately, not all of these principles have been followed faithfully in the IDO extension ontologies developed thus far. We believe, however, that the work described below on coronavirus-related extensions will serve as a model for the re-engineering of existing IDO extensions in such a way as to yield greater conformance.

Foundations
At the heart of the IDO ontology ecosystem is the term 'disease', which is imported from the Ontology for General Medical Science (OGMS) [9]. The latter has as its coverage domain those types of entities relevant to clinical encounters between doctors and patients. Thus it includes representations of disease, causes and manifestations of disease, diagnosis, symptom, treatment, patient examination, history taking, laboratory test, and so forth. While OGMS takes clinical encounters involving humans as its starting point, many of its terms can be applied to non-human organisms. OGMS is itself an extension of the Basic Formal Ontology (BFO), a top-level ontology composed of highly general classes such as 'object', 'material entity', and 'process'. BFO is used by more than 300 ontology projects as their top-level architecture [10] and is the official top-level ontology of the OBO Foundry. BFO has recently been published as international standard ISO/IEC 21838-2. 3
Developers of OGMS view the traditional practice of classifying diseases according to patterns of similarities in signs and symptoms (or, more generally, of phenotypes) as inadequate. A single disease may manifest a variety of symptoms, making it difficult to distinguish from other diseases (celiac disease, for example, shares many symptoms with Crohn's disease), or it may manifest none at all (Clostridioides difficile infection, for example, is often asymptomatic). Moreover, the traditional practice fails to address the increasing importance of genetic and environmental variables in disease taxonomy. Seeking to address these issues, OGMS characterizes diseases in BFO terms as dispositions of patients to undergo pathological processes.
Distinguishing manifestations of symptoms from dispositions to manifest symptoms provides the flexibility needed to represent, say, asymptomatic patients nevertheless disposed to manifest symptoms, and to make sense of certain physician prescriptions such as, say, that patients on antibiotics continue treatment after symptoms have subsided. In addition to distinguishing between disease, signs and symptoms of disease, and pathological processes, OGMS further distinguishes between:
• Disease diagnosis
• A disease's realization in a totality of processes, called a disease course
• Underlying disorder(s) within the patient in which the disease is rooted.
Each distinction provides needed flexibility in characterizing disease-related phenomena [9]. Different clinicians may diagnose the same disease differently; a disease may exist without having been yet diagnosed. Moreover, the same disease may have distinct disease courses and perhaps distinct underlying disorders in distinct patients. Indeed, the same disease may be manifested in a wide variety of different types of disease courses and clinical pictures depending upon the particular patient. And since disorders can exist before they are realized in overt pathological processes, the OGMS approach can capture the existence of pre-clinical manifestations of disease, and clinical risk factor combinations of disease and predispositions to disease which can exist within a single patient (as when an instance of disease of type A in a given patient is a risk factor for a second disease of type B). Conflating these distinctions, as seen even in widely used disease vocabulary resources such as SNOMED-CT [11][12][13], makes it more difficult to coherently count disease instances, which in turn may lead to incoherent reasoning about diseases, inconsistent models of specific diseases, errors in patient records, and failures to accurately measure progress in tackling disease spread.

Extending IDO Core from OGMS
IDO Core extends OGMS, and in doing so distinguishes between:
• infectious disease
• sign and/or symptom of infectious disease
The OGMS representation of the relationships between disease, disorder and disease courses is inherited by IDO Core as follows: 4

Figure 1
The relevant OGMS terms are defined as follows:
Disorder =def A material entity which is clinically abnormal and part of an extended organism; disorders are the physical basis of disease
Disease =def A disposition to (i) undergo pathological processes that (ii) exists in an organism because of one or more disorders in that organism
Disease course =def A process that is the totality of all processes through which a given disease instance is realized
The relevant IDO terms descending from these OGMS terms are defined as follows:
Infectious disorder =def An infection that is clinically abnormal
Infectious disease =def A disease whose physical basis is an infectious disorder
Infectious disease course =def A disease course that is a realization of an infectious disease
In addition, IDO Core defines:
Infection 5 =def A part of an extended organism that itself has an infectious agent population as part and that:
(1) Exists as a result of processes initiated by members of the infectious agent population, and is
(2) Clinically abnormal in virtue of the presence of this infectious agent population, or
(3) Has a disposition to bring clinical abnormality to immunocompetent organisms of the same Species as the host (the organism corresponding to the extended organism) through transmission of a member or offspring of a member of the infectious agent population.
To unpack this definition, we first introduce further needed definitions used by IDO Core. For example, IDO Core imports the class organism from the Ontology for Biomedical Investigations (OBI) [14], as well as classes for a variety of organisms that can play the role of infectious agent, such as bacteria and virus, both imported from the NCBI organismal classification (NCBITaxon). 6 Additionally, the class extended organism is imported from OGMS, and is defined as an object aggregate consisting of an organism and all material entities located within that organism overlapping the organism or occupying sites formed in part by the organism [9].
5 For lack of suitable pathogen specific extensions, the following subclasses of infection are currently included in IDO Core (these terms will be ported out once appropriate extension ontologies are developed):
amebiasis =def An infection located in the colon and that has as part organisms of the Species Entamoeba histolytica
candidiasis =def An infection that has as part organisms of the Genus Candida
leptospirosis =def An infection that has as part organisms of the Genus Leptospira
shigellosis =def An infection located in the bowel and that has as part organisms of the Genus Shigella
trichomoniasis =def An infection that has as part organisms of the Species Trichomonas vaginalis
6 http://www.obofoundry.org/ontology/ncbitaxon.html
To represent infectious agents and infectious agent populations, IDO Core introduces the following definitions:
infectious agent =def An organism that has an infectious disposition
infectious agent population =def An organism population whose members each have an infectious disposition.
IDO Core also defines the following terms:
infectious agent role =def A role borne by an infectious agent when contained in a host in which its infectious disposition can be realized.
infectious disposition =def A pathogenic disposition that inheres in an organism and is a disposition for that organism to: (1) be transmitted to a host, (2) establish itself in the host, (3) initiate processes that result in a disorder in the host, and (4) become part of that disorder
Furthermore, IDO Core defines various processes involved in the establishment of infections:
colonization of host =def An establishment of localization in host process in which an organism establishes itself in a host
establishment of a clinically abnormal colony =def A colonization of host process that results in a clinically abnormal colony
process of establishing an infection =def A process by which an infectious agent, established in a host, becomes part of an infection in the host
According to IDO Core, an infectious agent realizes its infectious disposition only when all of (1)-(4) occur. When an infectious agent colonizes and establishes a clinically abnormal colony in a host, it acquires an infectious agent role, and realizes aspects of its infectious disposition, such as becoming part of a disorder in the host (i.e. it participates in a process of establishing an infection). But an infectious agent does not bear the infectious agent role when established within a host with which it is commensal. While the infectious agent realizes certain aspects of its infectious disposition (i.e. it is transmitted to and colonizes the host), it does not establish a clinically abnormal colony. The definition of 'infectious disposition' makes reference to pathogenic dispositions.
IDO Core defines these dispositions and related pathogen terms:
pathogen =def A material entity 7 bearing a pathogenic role
pathogen role =def A role borne by a pathogen in virtue of the fact that it or one of its products is sufficiently close to an organism towards which it has the pathogenic disposition to allow realization of the pathogenic disposition 8
pathogenic disposition =def A disposition to initiate processes that result in a disorder
With these definitions introduced, the complexity of the definition of 'infection' can be unpacked, in particular clauses (2) and (3). Clause (2) ensures that infections are clinically abnormal owing to the relevant infectious agents. Suppose extended organism S has two distinct parts, each with an infectious agent population as part, A and B respectively. Suppose A is dormant while B is active in S, so that B results in S manifesting symptoms while A does not. Clause (2) requires that infection by B be tied to clinical abnormality stemming from B. Relatedly, clause (3) allows infection by A to be represented, since the relevant part of S which has infectious agent population A as part at least has the disposition to initiate pathological processes in other members of S's species, given transmission of A or A's offspring to those members. Importantly, this definition of 'infection' allows for representation of commensal populations, of infectious disorders that are caused by organisms that are typically commensal, and of the fact that asymptomatic infected individuals are contagious (as we see now with SARS-CoV-2). There is currently no term for commensal populations in IDO Core, but the term should be added.
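The logical shape of the definition of 'infection' can be made explicit with a small Python sketch. It assumes the reading "clause (1) and (clause (2) or clause (3))", which is how the discussion of A and B above treats the definition; the class OrganismPart and its field names are hypothetical and are not IDO terms. The test for infectious disorder is simplified by using clause (2) as the only source of clinical abnormality.

```python
from dataclasses import dataclass


@dataclass
class OrganismPart:
    """A part of an extended organism (hypothetical illustration class)."""
    name: str
    has_agent_population: bool            # has an organism population as part
    results_from_agent_processes: bool    # clause (1)
    abnormal_due_to_agents: bool          # clause (2)
    disposed_to_harm_conspecifics: bool   # clause (3)


def is_infection(part: OrganismPart) -> bool:
    """Assumed reading of the definition: (1) and ((2) or (3))."""
    return (
        part.has_agent_population
        and part.results_from_agent_processes
        and (part.abnormal_due_to_agents or part.disposed_to_harm_conspecifics)
    )


def is_infectious_disorder(part: OrganismPart) -> bool:
    """'An infection that is clinically abnormal' (simplified via clause (2))."""
    return is_infection(part) and part.abnormal_due_to_agents


# Extended organism S from the text: A is dormant, B is active.
part_with_a = OrganismPart("part of S containing A", True, True, False, True)
part_with_b = OrganismPart("part of S containing B", True, True, True, True)
# A commensal colony satisfies clause (1) but neither clause (2) nor clause (3).
commensal = OrganismPart("commensal colony", True, True, False, False)

for part in (part_with_a, part_with_b, commensal):
    print(part.name, is_infection(part), is_infectious_disorder(part))
```

On this reading the part containing dormant A counts as an infection without being an infectious disorder, while the commensal colony counts as neither.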

Transmission of Pathogens
IDO Core recognizes the importance of characterizing infectious disease transmission in its various forms, and provides relevant class definitions such as:
transmission process =def A process that is the means during which the pathogen is transmitted directly or indirectly from its natural reservoir, a susceptible host or source to a new host
host =def An organism bearing a host role
host role =def A role borne by an organism in virtue of the fact that its extended organism contains a material entity other than the organism
A variety of infectious diseases, including malaria and dengue fever, are vector borne. Thus, IDO Core contains terms such as:
infectious agent vector =def An organism bearing an infectious agent vector role
infectious agent vector role =def An infectious agent transporter role 9 that is borne by an organism active in the transfer of an infectious agent to an organism of another Species and in which the agent is infectious
indirect pathogen transmission process 10
definitive host role =def A symbiont host role borne by an organism in virtue of the fact that its partner in symbiosis reaches developmental maturity or reproduces sexually in the host
parasite host role =def A symbiont host role borne by an organism in virtue of the fact that its partner in symbiosis derives from the host a growth, survival, or fitness advantage while the host's growth, survival, or fitness is reduced
paratenic host role =def A symbiont host role borne by an organism in virtue of the fact that its partner in symbiosis utilizes the host to undergo a developmental stage transition, but the host is not required for continuation of the partner's life cycle
The preceding selection does not exhaust those host roles included in IDO Core but does reflect the wide range of ways in which to characterize host-symbiont relationships.

Pathogen Inhibition and Control
IDO Core provides several terms relevant to the inhibition and killing of pathogens:
antibacterial =def A material entity bearing an antibacterial disposition
antibacterial disposition =def A disposition to kill or inhibit the reproduction of bacteria
antifungal =def A material entity bearing an antifungal disposition
antifungal disposition =def A disposition to kill or inhibit the development or reproduction of fungal organisms
antiparasitic =def A material entity bearing an antiparasitic disposition
antiparasitic disposition =def A disposition to kill or inhibit the development or reproduction of eukaryotic parasites
antiviral =def A material entity bearing an antiviral disposition
antiviral disposition =def A disposition to kill or inhibit the lifecycle of viruses
Relatedly, one of the most important applications of IDO is its treatment of the phenomenon of resistance. Examples of resistance include a population's herd immunity to certain populations of infectious organisms, a tumor cell's resistance to chemotherapy, and the resistance of certain pathogens to antimicrobial drugs. The correct identification of different types of resistance is essential to both treatment decisions and public health policies [15]. For instance, varying strains of a given bacterial pathogen type (e.g. Staphylococcus aureus) can differ in terms of their degree of resistance and in the types of drug to which they are resistant. In the examples of resistance described above, resistance is a feature of an organism, or population of organisms, that serves to protect it/them from being damaged by some other entity. To capture this aspect of resistance, IDO Core contains the term protective resistance, which is defined as follows:
protective resistance =def A disposition that inheres in a material entity in virtue of the fact that the entity has a part (e.g. a gene product), which itself has a disposition to mitigate damage to that entity
This term is defined to systematically account for all different kinds of resistance: not just drug resistance on the part of infectious agents or the resistance of hosts to infectious agents, but also things like the resistance of vectors to insecticide. Notice that what occurs in many cases where protective resistance is manifested is that another process is being prevented. Consider for instance the immunity of an individual X to a specific infectious organism Y that has the capability to cause damage to X. Given its infectious disposition, Y is disposed to be transmitted to and establish itself in X, initiate processes that result in a disorder in X, and become part of that disorder. X's immunity to Y is realized in certain processes that prevent certain of the aforementioned processes from occurring, thus mitigating the damage those processes might have caused to X. To capture this aspect of resistance, protective resistance has been characterized in terms of what can be called "blocking dispositions" [7,15]: a blocking disposition is a disposition the manifestation of which prevents, or at least mitigates, the realization of another disposition. The disposition whose realization is prevented (or mitigated) can be called a "blocked disposition." Thus, since X's immunity to infectious organism Y is realized in processes that prevent certain realizations of Y's infectious disposition, the former is a blocking disposition for the latter (the latter being a blocked disposition).
This characterization of resistance is further enhanced in IDO Core by importing from RO the relation negatively_regulates, holding between processes x and y, and defined as:
negatively_regulates =def The progression of x reduces the frequency, rate or extent of y
Thus, we can say that X's immunity to Y is a blocking disposition for Y's infectious disposition insofar as X's immunity is realized in certain processes that stand in the negatively_regulates relation to the manifestation of Y's infectious disposition. For instance, X's immunity may be realized in processes (such as antibody secretion, which would neutralize viral particles, preventing them from entering host cells) the progression of which reduces the rate at which, or the extent to which, Y establishes itself in X.
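The relationship between blocking dispositions and negatively_regulates can be sketched in Python as follows. The Disposition class, the blocks function, and the process names are illustrative assumptions; only the negatively_regulates relation itself comes from RO.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple


@dataclass(frozen=True)
class Disposition:
    """A disposition together with the processes in which it can be realized."""
    bearer: str
    name: str
    realized_in: FrozenSet[str]


# Assumed fact for illustration: (x, y) means "x negatively_regulates y".
NEGATIVELY_REGULATES: Set[Tuple[str, str]] = {
    ("antibody secretion", "establishment of Y in X"),
}


def blocks(blocker: Disposition, blocked: Disposition,
           negatively_regulates: Set[Tuple[str, str]]) -> bool:
    """True if some realization of `blocker` negatively regulates
    some realization of `blocked`."""
    return any(
        (x, y) in negatively_regulates
        for x in blocker.realized_in
        for y in blocked.realized_in
    )


immunity = Disposition("X", "immunity to Y", frozenset({"antibody secretion"}))
infectious = Disposition(
    "Y", "infectious disposition",
    frozenset({"transmission of Y to X", "establishment of Y in X"}),
)

print(blocks(immunity, infectious, NEGATIVELY_REGULATES))  # True
print(blocks(infectious, immunity, NEGATIVELY_REGULATES))  # False
```

The asymmetry of the result reflects the asymmetry in the text: the immunity is the blocking disposition and the infectious disposition is the blocked one, not the reverse.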

Case Study: IDOSA and methicillin resistant Staphylococcus aureus
A particularly important use of protective resistance within IDO is that of modelling the resistance of certain bacteria to antibiotic drugs. For this purpose, the following subclasses of protective resistance are asserted in IDO Core:
drug resistance =def A protective resistance that mitigates the damaging effects of a drug
antibiotic resistance =def A drug resistance that mitigates the damaging effects of an antibiotic
Beta-lactam antibiotics, such as methicillin, are the most widely used antibiotics, and most work by preventing bacterial cell wall construction. Beta-lactam antibiotics act by binding to and inhibiting the penicillin-binding proteins (PBPs) within bacteria that facilitate the synthesis of peptidoglycan molecules, thus compromising the structural integrity of the cell wall. In response to the widespread use of beta-lactam antibiotics, some bacteria have rapidly evolved novel-structured PBPs which lack an affinity for these antibiotics, thus rendering these antibiotics less effective in preventing cell wall construction. At a certain level, the description of antibacterial resistance seems to require a negative aspect; thus the appeal to the fact that the PBPs of resistant bacteria lack an affinity for beta-lactam antibiotics. But negative characterizations of a phenomenon at one level of biological reality often belie its positive aspects at another level [15]. An important desideratum in the construction of realist ontologies is to avoid as far as possible the use of definitions involving negative differentia (e.g. 'non-contagious', 'not part of an infection'). In compliance with this "positivity design principle", an ontologically correct representation of resistance will reveal those active mechanisms that produce resistance: in the case of resistant bacteria, the active dispositions inhering in novel-structured PBPs that inhibit antibiotics from manifesting their damaging effects [15].
We can see these commitments on display more clearly by considering an important case of beta-lactam antibiotic resistance: methicillin-resistant Staphylococcus aureus (MRSa). MRSa's resistance to methicillin is conferred by PBP2a, a PBP that lacks affinity for methicillin and is the product of the gene mecA. The need to provide a coherent and consistent understanding of the mechanisms underlying MRSa antibiotic resistance is one of the impetuses for the development of the Staphylococcus aureus Infectious Disease Ontology (IDOSA), an extension of IDO covering entities specific to Staphylococcus aureus (Sa) infectious diseases [7,15,16].
IDOSA's main hierarchy is built on BFO version 2.0, is well organized, and imports IDO Core in full. IDOSA provides terms covering all entities relevant to antibacterial resistance in Sa, including terms for proteins, genes, gene products, biological processes and antibacterials, and imports terms from several relevant ontologies such as:
• Terms for proteins imported from the Protein Ontology (PRO) 11
• Terms gene, gene group, and mobile genetic element from the Sequence Ontology (SO) 12
• The term mecA, representing the gene responsible for the production of PBP2a in MRSa, from the Vaccine Ontology (VO) [17] (though it would be more appropriate to use, for example, the NCBI Gene name/identifier)
• Terms for biological processes from the GO Biological Process Ontology, such as peptidoglycan biosynthesis process and cytolysis in other organism (a subclass of killing of cells of other organisms)
• The term gram-positive bacterium-type cell wall from the GO Cellular Component Ontology
• Terms representing common anatomical sites of Sa infections, such as bone, lung, and endocardium, from the Uber-anatomy ontology (UBERON) [18]
• Terms for several antibiotics, including ciprofloxacin, methicillin, and penicillin, from Chemical Entities of Biological Interest (ChEBI) [19]
• The term staphylococcus aureus from NCBITaxon
IDOSA additionally introduces several domain-specific terms, such as ccr gene complex and mec gene complex, added as subclasses of gene group; antibacterials representing common treatments for Staphylococcus infections, for example ciprofloxacin, methicillin, and penicillin (all imported from ChEBI; all defined as having a relevant antibacterial disposition); and SCCMec, added as a subclass of mobile genetic element. SCCMec is the central determinant for broad-spectrum beta-lactam resistance encoded by mecA.
Of particular importance for us here is that IDOSA adds the subclasses methicillin-resistant Staphylococcus aureus and methicillin-susceptible Staphylococcus aureus, which are defined as follows:
methicillin-resistant Staphylococcus aureus =def An organism of type Staphylococcus aureus that has resistance to beta-lactam antibiotics
methicillin-susceptible Staphylococcus aureus =def An organism of type Staphylococcus aureus that lacks resistance to beta-lactam antibiotics
Both subtypes are defined in terms of the IDOSA class resistance to beta-lactam antibiotic, which itself is a subclass of IDO:antibiotic resistance and defined as follows:
resistance to beta-lactam antibiotic =def An antibiotic resistance that mitigates the damaging effects of a beta-lactam antibiotic
With these terms and definitions, we can characterize both the susceptibility of methicillin-susceptible Staphylococcus aureus (MSSa) and the resistance of MRSa to beta-lactam antibiotics in terms of protective resistance and blocking dispositions [7,15]. Consider first MSSa, which is susceptible to the damaging effects of methicillin because it lacks protective resistance to that drug. A positive (active) characterization of its susceptibility to the drug can be given in terms of the disposition of its PBPs. The PBPs within MSSa have an affinity for methicillin, a disposition to bind to methicillin which is realized in a methicillin PBP binding process. This process negatively_regulates the synthesis of peptidoglycan, thereby interfering with the formation of a stable cell wall. In the case of MSSa, the PBPs' affinity for methicillin blocks the disposition to synthesize peptidoglycan, leaving MSSa susceptible to the damaging effects of the drug. Thus, the PBPs' affinity for methicillin is the active mechanism which underlies MSSa's lack of protective resistance to the drug. By contrast, MRSa bears a protective antibiotic resistance to the drug methicillin.
PBP2a has the disposition to synthesize peptidoglycan, and thereby participate in the construction of a well-formed cell wall. Because the process of cell wall construction is incompatible with, and therefore negatively_regulates, the process of methicillin binding, PBP2a's disposition to synthesize peptidoglycan is a blocking disposition for the disposition of methicillin to bind to PBPs. Thus, MRSa's protective resistance can be seen as an active response in which one of its parts, PBP2a, manifests a disposition to mitigate the damaging effects of methicillin.
We can, moreover, provide a formalized representation of MRSa's resistance to methicillin, following that in [15], as a set of triples representing the relevant ontological relationships (e.g., PBP has_function_realized_as_process synthesis of peptidoglycan) and a series of inference rules. The rules can be used along with derived facts to show that the resistance of MRSa can be inferred from the triples in a manner that is both logical and provides an explanation of why MRSa bears such a resistance. This is important since in the future such inference rules could potentially be used by an automated reasoner to deduce from the triples that MRSa is resistant to methicillin. In this way, an ontology containing the relevant triples and inference rules may guide treatment decisions and facilitate automated drug discovery [14].
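A minimal Python sketch can illustrate how triples and inference rules of this kind might work together. It is not the formalization given in [15]: apart from has_function_realized_as_process, the predicate names, the facts, and the two rules below are assumptions chosen to mirror the informal account of PBP2a given above.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Illustrative facts; only has_function_realized_as_process appears in the text.
FACTS: Set[Triple] = {
    ("MRSa", "has_part", "PBP2a"),
    ("PBP2a", "has_function_realized_as_process", "synthesis of peptidoglycan"),
    ("synthesis of peptidoglycan", "negatively_regulates", "methicillin PBP binding"),
    ("methicillin", "has_disposition_realized_as_process", "methicillin PBP binding"),
}


def infer(facts: Set[Triple]) -> Set[Triple]:
    """Apply two hypothetical rules until no new triples are derived."""
    derived = set(facts)
    while True:
        new: Set[Triple] = set()
        # Rule 1: a part whose function is realized in a process that negatively
        # regulates a process realizing a drug's disposition blocks that drug.
        for part, p1, process in derived:
            if p1 != "has_function_realized_as_process":
                continue
            for source, p2, target in derived:
                if p2 != "negatively_regulates" or source != process:
                    continue
                for drug, p3, realization in derived:
                    if (p3 == "has_disposition_realized_as_process"
                            and realization == target):
                        new.add((part, "blocks_disposition_of", drug))
        # Rule 2: a whole with a part that blocks a drug's disposition bears
        # protective resistance to that drug.
        for whole, p1, part in derived:
            if p1 != "has_part":
                continue
            for blocker, p2, drug in derived:
                if p2 == "blocks_disposition_of" and blocker == part:
                    new.add((whole, "has_protective_resistance_to", drug))
        if new <= derived:
            return derived
        derived |= new


closure = infer(FACTS)
print(("MRSa", "has_protective_resistance_to", "methicillin") in closure)  # True
```

The intermediate triple derived by the first rule also serves as the explanation: MRSa is resistant because its part PBP2a blocks the disposition of methicillin.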
Importantly, the account of protective resistance can be generalized to other cases [15], such as the resistance against HIV conferred by CCR5-Δ32, and the resistance against malaria conferred by the sickle cell trait. CCR5-Δ32 is a deletion mutation of the CCR5 gene resulting in cells which lack a functioning CCR5 receptor on their surfaces. In this case, the disposition of individuals with the CCR5-Δ32 mutation to develop cells that lack CCR5 on their surface acts as a blocking disposition for the disposition of HIV to bind to a CCR5 molecule. Plasmodium falciparum, one of the infectious organisms that cause malaria, has a disposition to spread through host red blood cells, a process that is reduced in dense, dehydrated red blood cells. In individuals with the sickle cell hemoglobin gene, red blood cells have a disposition to become dehydrated and thus increase in density. This disposition acts as a blocking disposition for the disposition of Plasmodium to spread through red blood cells, a process requiring hydrated red blood cells.
As we have seen, IDO Core aligns with OBO Foundry principles, imports a range of terms from widely used ontologies, covers a wide range of phenomena in the domain of infectious diseases, and has been used effectively to characterize an important resistance phenomenon in IDOSA. In the next section, we examine several ontologies that purport to extend IDO Core in various ways.

EXTENDING IDO Core
An overview of the current state of each IDO extension ontology is provided in Table 1 (see also Appendix B). Those IDO extensions marked with an asterisk will require reengineering to address various issues; see the Supplementary Materials for details. In this section we highlight some of the positives. CIDO is discussed at the end of the paper.

Table 1
[29,30]: Not yet formally released to BioPortal; most recent version 16 uploaded on October 23, 2013

IDOBRU
IDOBRU is maintained by Yongqun He's laboratory research team at the University of Michigan, and is used to facilitate the integration and exchange of brucellosis information stored in widely used databases. IDOBRU exhibits a well-organized hierarchy with BFO, OGMS and IDO Core imported in full, and is a good exemplar of the IDO Core hub and spokes model.

Table 2

In [8] IDOBRU classes are used to provide formalized representations of:
• The mechanisms by which Brucella successfully establishes an infection in the host, including the crucial role played by the Brucella virulence factors Brucella VirB1 protein and Brucella lipopolysaccharide (LPS) in the bacterium's ability to survive and replicate within the vacuolar macrophage compartments of host cells
• Brucellosis diagnoses using a PCR assay to test a Brucella gene omp-2 encoding for an outer membrane protein from a patient's blood sample
• The intentional use of aerosolized Brucella as a weapon of bioterrorism, as well as the use of bleach for the purpose of disinfection in the case of a Brucella bioterrorist attack
• WHO's recommended standard treatment for uncomplicated brucellosis cases in adults and children that are 8 years of age or older (using Brucella vaccine terms imported from VO)
He's group has also used the ontology to provide a formal treatment of host-Brucella interactions [23] and as the basis for an online IDOBRU SPARQL query interface. 17

IDOPlant
IDOPlant is currently being developed as part of the Planteome Project 18, an international collaborative effort supported by primary funding 19 from the National Science Foundation of the USA. The Planteome project maintains a large database of annotations from genomic and phenomic studies [32], and aims to "overcome the obstacles in annotating data for complex biological concepts that span multiple ontologies by developing both the ontology terms and the software tools needed to annotate data from all aspects of plant diseases" [28]. IDOPlant is currently being incorporated within the biotic plant stresses branch of the Plant Stress Ontology (PSO) 20, a Planteome plant reference ontology providing coverage for all abiotic plant stresses, such as drought, salinity, temperature, and nitrogen deficiencies, and for all biotic plant stresses, such as pests, pathogens, symbiotic organisms, and diseases. The Planteome Project makes use of a variety of reference ontologies for plants, many of which are integrated with IDOPlant [28]. These include external ontologies:
• The Gene Ontology (GO) [3], from which IDOPlant imports terms for the molecular functions of host and pathogen genes and those biological processes in which either the host, pathogen, or both are involved
• Chemical Entities of Biological Interest (ChEBI)
• Phenotypic Quality Ontology (PATO) [33]
They also include ontologies internally developed by the Planteome Project:
• The Plant Ontology (PO), which provides a standardized, species-neutral terminology for plant anatomy, morphology and development stages [34,35]; imported PO terms are used to describe the plant structures at which infections happen and the development stages during which disease signs are observed.
• The Plant Trait Ontology (TO) 21; IDOPlant imports TO terms describing phenotypic plant traits relevant to plant disease, including qualities such as leaf color, processes such as sudden wilting, and independent continuants such as leaf lesion.
Additionally, IDOPlant imports:
• Terms for host plants and pathogens, such as Oryza sativa and Xanthomonas oryzae, from NCBITaxon
• Terms relating to the ecology of host plants and diseases from the Environment Ontology (ENVO) [36,37]
• Relevant terms from OBI
Particularly crucial to plant disease research is the study of the plant structures and development stages at which infections happen and symptoms of disease are observed. As such, plant scientists need to be mindful of the occurrence of infections and plant disease symptoms at different stages across different species. Unfortunately, comparison of development stages across species is hindered by the fact that biologists often describe development stages in species- or clade-specific terms. As indicated above, IDOPlant leverages species-neutral terms from PO to integrate data about plant structure development stages across different species. IDOPlant thus overcomes traditional barriers to classification. The integration of PO plant structure development stages with plant pathology data within IDOPlant makes it "possible to explore and learn the molecular and environmental basis of plant diseases using advanced semantic methods" [38].
Integrating plant disease research in IDOPlant additionally leverages relevant terms from IDO Core. The following illustrate important relationships between these two ontologies. Reflecting the fact that plant diseases can be caused by bacteria, fungi or viruses, in IDOPlant infectious agent has the following subclasses, each of which has a logical definition in OWL built from IDO:infectious disposition:
• bacterial infectious agent ≡ bacteria AND (has_disposition SOME infectious disposition)
• fungal infectious agent ≡ fungi AND (has_disposition SOME infectious disposition)
• viral infectious agent ≡ virus AND (has_disposition SOME infectious disposition)
(As these subclasses are clearly relevant to infectious diseases generally, these terms will eventually be ported over to IDO Core.) Similarly, IDO:infectious disposition can be used in the definition of classes for specific infectious agents. For instance, IDOPlant might define:
• xanthomonas oryzae infectious agent ≡ xanthomonas oryzae AND (has_disposition SOME infectious disposition) 22
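To make the effect of such defined classes concrete, the following Python sketch checks which of them an individual satisfies. It is a deliberately simplified, closed-world illustration: an OWL reasoner works under the open-world assumption and over full class expressions, and the `Individual` type, the `classify` helper and the sample isolate are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    """A toy individual with asserted types and asserted dispositions."""
    name: str
    types: set = field(default_factory=set)
    dispositions: set = field(default_factory=set)


# Each defined class is modelled as: genus AND (has_disposition SOME infectious disposition).
DEFINED_CLASSES = {
    "bacterial infectious agent": "bacteria",
    "fungal infectious agent": "fungi",
    "viral infectious agent": "virus",
    "xanthomonas oryzae infectious agent": "xanthomonas oryzae",
}


def classify(individual):
    """Return the defined classes whose two conditions the individual meets."""
    if "infectious disposition" not in individual.dispositions:
        return set()
    return {
        defined
        for defined, genus in DEFINED_CLASSES.items()
        if genus in individual.types
    }


# Hypothetical isolate asserted to be a Xanthomonas oryzae bacterium.
isolate = Individual(
    name="isolate-1",
    types={"bacteria", "xanthomonas oryzae"},
    dispositions={"infectious disposition"},
)
inferred = classify(isolate)
```

Because membership follows from the definition rather than from a manual assertion, the same isolate is classified under both the general and the species-specific class, and an organism without the infectious disposition is classified under neither.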

IDOSCHISTO
IDOSCHISTO is an extension of IDO Core focusing on schistosomiasis, a waterborne infectious disease caused by Schistosoma helminth parasites. IDOSCHISTO is designed to support epidemiological monitoring systems and the running of qualitative simulation models. To facilitate this, the IDOSCHISTO team developed the Infectious Disease Spreading Domain Ontology (IDSDO Core) [29]. IDSDO provides the process class idsdo_spreading, and its children idsdo_endemic_spreading, idsdo_epidemic_spreading and idsdo_pandemic_spreading. The spreading of schistosomiasis within a given type of geographical location can be represented using the unfolds_in relation, which links the class idsdo_spreading to the class geographical_location. Geographical_location has as subclasses the following defined classes:
• intestinal schistosomiasis area ≡ geographical location AND (adjacent_to SOME (location_of SOME bulinus))
• urinary schistosomiasis area ≡ geographical location AND (adjacent_to SOME (location_of SOME biomphalaria))
IDOSCHISTO contains classes for a variety of important snail species that serve as intermediary hosts for the disease, such as bulinus and biomphalaria. The relation has_intermediary_host was created for IDOSCHISTO in order to link snail species to the schistosoma species for which they are an intermediary host. Given the information encoded in IDOSCHISTO concerning the role that the bulinus and biomphalaria snail species play in the transmission of the schistosoma pathogen, these classes can be used in combination with classes such as human distribution, snail distribution, and parasite distribution (which are subclasses of population distribution) to model risk factors for the spread of intestinal schistosomiasis and urinary schistosomiasis [30]. IDOSCHISTO has been used to annotate and query data from epidemiological investigations in Richard Toll, Senegal.
IDOSCHISTO's individual hierarchy contains terms for districts, as well as nearby water bodies, with adjacency relations asserted between them. The located_in relation is asserted between water bodies and the snail species that inhabit them, while schistosoma species are linked to snail species via the has_intermediary_host relation. For example, Ndiaw is a district of Richard Toll lying on the south bank of the Senegal River. Given the information encoded in IDOSCHISTO, a SPARQL query regarding which types of schistosomiasis the Ndiaw population is exposed to will return the answers: intestinal_schistosomiasis and urinary_schistosomiasis [30].
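The shape of such a query can be imitated in plain Python over a handful of triples. The sketch below is an assumption-laden stand-in for the actual SPARQL query: the individuals `ndiaw` and `senegal_river`, the assertion that both snail genera inhabit that river, and the helper `exposures` are hypothetical, and the snail-to-area mapping simply mirrors the defined classes as they are printed in this section.

```python
# Hypothetical individual-level triples in the style of the IDOSCHISTO examples.
TRIPLES = {
    ("ndiaw", "adjacent_to", "senegal_river"),
    ("senegal_river", "location_of", "bulinus"),
    ("senegal_river", "location_of", "biomphalaria"),
}

# Snail class used in each defined area class, as printed in the text above.
AREA_DEFINITIONS = {
    "intestinal_schistosomiasis": "bulinus",
    "urinary_schistosomiasis": "biomphalaria",
}


def exposures(district, triples=TRIPLES):
    """Return the schistosomiasis types whose area definition the district meets."""
    # adjacent_to SOME (...): collect everything the district is adjacent to.
    neighbours = {o for s, p, o in triples if s == district and p == "adjacent_to"}
    # location_of SOME snail: collect the snails located in those neighbours.
    snails = {o for s, p, o in triples if s in neighbours and p == "location_of"}
    return sorted(
        disease for disease, snail in AREA_DEFINITIONS.items() if snail in snails
    )


ndiaw_exposures = exposures("ndiaw")
```

With these illustrative triples the function returns both schistosomiasis types for Ndiaw, and an empty list for any district with no adjacent snail habitat.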

IDOMEN
IDOMEN is designed to assist in the analysis and filtering of data collected on social media platforms such as Twitter to help improve the early detection of meningitis epidemic risks in sub-Saharan Africa [27]. Increasingly, digital data sources such as search queries, social media posts, and web server access logs have been employed for disease surveillance. Digital platforms provide real-time data streams from which information related to public health can be extracted at low to virtually no cost and in a timely manner. By mining these data sources for traces of health-related activities, they can be transformed into useful metrics for inclusion in statistical estimation models for disease incidence. Google, Wikipedia and Twitter have each been investigated as tools for quantifying disease incidence rates [39][40][41]. Furthermore, ontology-based approaches to internet-based disease surveillance have been developed and applied with good success [42,43].

IDO NEXT STEPS
As with any ontology, IDO Core and its extensions require continual maintenance. In this section we detail some general issues to be addressed in future developments of the IDO suite.

Fungal Infections
While IDO Core includes the organism classes bacteria, virus, and parasite, a term for fungal organisms is currently not present. On the other hand, IDO Core does contain the terms antifungal and antifungal disposition, the definitions of which both refer to fungal organisms.

Pathogens and Pathogen Roles
IDO Core's pathogen class has the child infectious agent. The latter class has the children emerging pathogen, infectious human pathogen, opportunistic pathogen, primary pathogen and re-emerging pathogen. Corresponding to the pathogen and infectious agent classes are the roles pathogen role and its single child infectious agent role. But infectious agent role itself has no children. Consideration might be given to including children of infectious agent role corresponding to the children of infectious agent.

The importation of disease terms from the Human Disease Ontology (DOID)
For both IDOMAL and IDOSA, new disease terms were created. But in line with OBO Foundry principles, they will be updated to import disease terms from DOID [44,45]. Similarly, disease terms that were newly created for IDOBRU, IDOMEN, IDOSCHISTO, and IDODEN should be replaced with existing DOID terms. FLU already imports DOID:influenza and CIDO imports DOID:COVID-19. Like IDO, DOID incorporates the OGMS definition of disease. 23 But while some DOID disease terms are defined partially in terms of underlying disorders (such as DOID:COVID-19), there isn't consistency on this front. A better template is provided by IDOSA: Staphylococcus aureus infectious disease, defined as "An infectious disease that has a staphylococcus aureus infectious disorder as its material basis." We will work with the DOID team to achieve consistency with the OGMS definition moving forward.

Treatment of symptoms in IDO Core and its extensions
Traditional disease classifications often place far too much emphasis on similarities in symptoms. Nevertheless, the study of symptoms is still an important factor in the infectious disease domain. IDO Core imports the term OGMS:symptom, defined as "A quality of a patient that is observed by a patient or a processual entity experienced by the patient, either of which is hypothesized by the patient to be a realization of a disease." Accordingly, the majority of the IDO extensions follow OGMS and IDO Core in asserting symptom as a direct child of entity (as symptoms fail to form a natural kind; see [9]). Symptoms, though, are represented in inconsistent ways across IDO extensions. In IDOBRU and CIDO, symptom is a child of quality, whereas in IDOMAL and IDODEN symptom is a subclass of realizable entity.
Moving forward, closer collaboration is needed to ensure consistency here. We should emphasize that the OGMS community, including various IDO experts, is working closely to refine its treatment of symptoms. The following definition is currently being considered:
• OGMS:symptom =def A process experienced by the patient, which can only be experienced by the patient, that is hypothesized to be clinically relevant
Given this proposal, we would reassign symptom as a subclass of BFO:process within OGMS, IDO Core and its extensions. This would ensure that all subclasses of symptom appearing within individual IDO extensions are consistently classified as processes.
The treatment of symptoms in IDOPlant is a special case [28]. To represent plant disease symptoms within IDOPlant, a new term was created instead of importing OGMS:symptom. This is because the OGMS definition requires a sentient host that can report its experiences. In IDOPlant, plant disease symptom is defined as "A feature of a plant is of the type that can be hypothesized to be involved in the realization of a plant disease". This is in line with the usage of "symptom" in plant pathology for those phenotypes that are associated with plant diseases, where the same phenotype may be linked to a multitude of diseases. The relation has_plant_disease_symptom was also created for the ontology in order to link plant diseases to those phenotypes, processes and independent continuants that are used in the diagnosis of plant disease. For instance, rice bacterial leaf blight disease has_plant_disease_symptom leaf lesion. The fact that the IDOPlant symptom term encompasses qualities and independent continuants in addition to processes should not be an issue, as the rationale behind the proposed change to OGMS:symptom would not extend to the former.

Infectious Disease Epidemiology
One aspect in which IDO Core still has some limitations is in its representation of infectious disease epidemiology. While IDO Core contains qualities of disease affected populations, such as infectious disease incidence rate, infectious disease mortality rate, and infectious disease endemicity, previously it did not contain corresponding terms for sites at which these qualities are instantiated [8]. We are in the process of adding such terms to IDO Core, for example: infectious disease endemic site, infectious disease free site, and infectious disease non-endemic site. Adding these classes allows them to serve as parent classes for the brucellosis specific classes brucellosis endemic site, brucellosis free site, and brucellosis non-endemic site currently used in IDOBRU.
Similarly, IDOMAL contains the following classes none of which are present in IDO Core: holoendemicity, hypoendemicity, and mesoendemicity [26]. In IDOMAL these are subclasses of malaria endemicity. These terms are needed in IDOMAL to represent the varying degrees to which malaria is endemic within different populations. Holoendemicity is particularly relevant. In several regions of sub-Saharan Africa, holoendemicity is frequently seen in malaria, in particular the strain caused by Plasmodium falciparum (with one study finding that traces of the pathogen were present in 98.6% of the population within a 4-month period [46]).
IDO Core also currently lacks terms for infectious disease surveillance. Given the significant role surveillance plays within the study of infectious diseases generally, consideration should be given to adding relevant terms to IDO Core. Related terms already exist within the Vector Surveillance and Management Ontology (VSMO) [47]. These include the following classes, all of which are subclasses of OBI:planned process:
• surveillance process =def A planned process with the objective to produce information about some evaluant with the purpose of, if justified by the information gathered, managing, directing, or protecting
• pathogen surveillance =def A surveillance process aiming to produce information about one or several objects, in the form of microorganisms, which have the role of pathogen
• vector surveillance =def A surveillance process aiming to produce information about one or several objects, in the form of arthropods, which have the role of serving as biological pathogen vectors
Surveillance process would not be appropriate to IDO (or indeed, really, to VSMO) since it has nothing specifically to do with infectious disease. It would be useful, for example, for an intelligence process ontology. Instead, we will add to OBI a more specific subclass of OBI:planned process covering any surveillance of entities related to biomedicine. This class, along with OBI:planned process, will be imported to IDO Core to serve as the parent of pathogen surveillance and vector surveillance, both of which are appropriate terms to add to IDO Core. Their definitions can then easily be revised by substituting the new biomedicine-related surveillance term for surveillance process in the definitions.
This is particularly important for the facilitation of interoperability between the IDO suite of ontology modules, several of which already contain coverage of infectious disease surveillance:
• IDOBRU contains the term infectious disease surveillance which serves as the parent for brucellosis surveillance.
• FLU is applied to influenza surveillance as part of the Centers for Excellence in Influenza Research and Surveillance program [22]. The ontology is meant to be applicable to any virus sequence and surveillance collection project, consolidating sequence and surveillance terms from a variety of online databases.
• IDOMAL contains a variety of processes relevant to malaria surveillance, including surveillance in malaria eradication and process of malaria epidemiology, the latter of which has many children including collection of epidemiological data and investigation relating to the insect vector.
• IDOSCHISTO contains a whole module devoted to the epidemiology of schistosomiasis. It contains classes such as prevention strategy and control strategy, each of which has a variety of child classes pertaining to the prevention and control of schistosomiasis.
Infectious disease surveillance should also be added to IDO Core. It should be noted that presently within IDOBRU the term falls under a hierarchy of surveillance related terms (note: all of the terms below OBI:planned process are terms that were newly created for IDOBRU):
OBI:planned process
Health Surveillance
biosurveillance
disease surveillance
animal disease surveillance
human disease surveillance
infectious disease surveillance
brucellosis surveillance
noninfectious disease surveillance
All of the terms in the hierarchy from human disease surveillance up should probably be ported out to OBI. Infectious disease surveillance should be ported out to IDO Core and then imported to IDOBRU. There doesn't seem to be need in any ontology for the term noninfectious disease surveillance, so this term should be obsoleted.
Although IDO Core's coverage of epidemiology terms is in some ways limited, these issues can also be addressed through collaboration with the developers of epidemiology focused ontologies such as the Apollo Structured Vocabulary (Apollo-SV) [48] or Genomic Epidemiology Ontology (GenEpio) [49]. GenEpio reuses a number of important IDO Core terms, and IDO extensions could incorporate the appropriate terms from Apollo-SV or GenEpio. As with all biomedical ontologies, each of the mentioned ontologies evolves in response to term requests from users in order to accommodate the community's needs.
Apollo-SV provides a standardized vocabulary for terms and relations required for the interoperation between epidemic simulator models and public health application software that interface with these models. Apollo-SV provides a variety of pertinent terms including:
• Terms pertaining to disease control, including: infectious disease control strategy (a child of Information Artifact Ontology 24 (IAO):plan specification), which has subclasses such as vector control strategy, place closure control strategy, travel-related infectious disease control strategy, and quarantine control strategy, the first three of which have more specific subclasses of their own; processes such as infectious disease control strategy execution

But we also need flexibility to infer hierarchies that are relevant to various applications. In the future we will define IDO extension ontologies for vector-borne diseases on the basis of an extension of IDO Core consisting of just those IDO Core terms relevant to vector-borne diseases together with all additional terms needed to deal with vector-borne diseases in a pathogen neutral fashion, IDO VectorBorne. Extension ontologies for specific vector-borne diseases would then be developed as extensions from IDO-VectorBorne [16].

The Coronavirus Infectious Disease Ontology (CIDO) and IDO Virus
The Coronavirus Infectious Disease Ontology (CIDO) deals with coronavirus infectious diseases in general. Though only recently created, the ontology has already been added to the OBO Foundry library. CIDO imports terms from a wide range of ontologies, including ChEBI, ICDO, NDF-RT, UBERON, GO, VO, and the NCBITaxon, among others. In addition, CIDO introduces 8 terms specific to the coronavirus domain. Alignment with IDO Core, however, will require much curation (see the Supplementary Materials for details).
One application of CIDO is to the analysis and integration of information on anti-coronavirus drugs to facilitate drug repurposing against COVID-19. In a recent study [20], members of the CIDO team used text mining to identify chemical drugs and antibodies effective against at least one human coronavirus infection in vitro or in vivo and then mapped these drugs to ChEBI, NDF-RT and the Drug Ontology (DrugOn) [50], each of which provides logical axioms linking drugs to their roles and mechanisms of action. This information was then extracted for analysis. To fill key gaps in this information, further relations will be built into CIDO linking drugs, coronaviruses, and the conditions under which the drugs work against the coronaviruses. CIDO is also being used in ongoing work to represent vaccines against coronavirus. In [21] reverse vaccinology and machine learning are used to predict potential vaccine targets for safe and effective COVID-19 vaccine development. The CIDO team will systematically annotate these vaccine candidates, along with their formulations and host responses, while working with the VO team to ontologically model and analyze these vaccines. To facilitate vaccine design, CIDO will be used in a further study to investigate host-pathogen interactions in order to better understand protective immune mechanisms.
IDO Virus is in the later stages of its development under John Beverley and collaborators, as a preparation for the rearchitecting of CIDO as the extension of IDO Virus for coronavirus diseases. IDO Virus will facilitate the construction of ontology modules focused exclusively on SARS-coronavirus, which caused the 2002-2003 SARS outbreak, and SARS-coronavirus 2, which caused the ongoing coronavirus disease 2019 (COVID-19) pandemic. Our immediate priority is to fast track the creation of an ontology concerning COVID-19, IDO-CovID-19, as an extension of CIDO. In line with these efforts, the Protein Ontology consortium has recently created new terms dealing with SARS-CoV-2 proteins. 25
To illustrate how such extension relationships operate with respect to IDO Core, recall that each IDO extension ontology should be developed in a modular fashion, providing terms specific to its own domain which build upon the disease and pathogen-neutral terms imported from IDO Core. 26 In the typical case, terms from extension ontologies are created via downward population of the upper level terms provided by IDO Core. For example:
In some cases, a higher-level term that is needed within a particular extension ontology may not be present in IDO Core. If the relevant term is specific to the infectious disease domain and is truly pathogen neutral then it should be added to IDO Core. If the relevant term is included in an OBO Foundry ontology then it can be imported from there. Each of these considerations is being adhered to in the construction of IDO Virus.
To create IDO Virus, child terms pertaining to viral disease were generated from relevant IDO Core terms by adding the term 'virus'. Thus, to give a few examples, IDO Virus extends IDO Core by introducing terms such as:
• viral disorder =def An infectious disorder involving virus infectious agents
• viral disease =def An infectious disease whose physical basis is a viral disorder
• viral disease course =def An infectious disease course that realizes a viral disease
These parallel the IDO Core extension of OGMS introduced earlier. IDO Core:subclinical infection, an infection that is part of an asymptomatic host, provides resources needed to represent asymptomatic viral infections:
• subclinical viral infection =def A subclinical infection involving virus infectious agents
Such a term is needed because viral infections need not result in the manifestation of symptoms, yet nevertheless satisfy all other criteria for inclusion as infections.
Leveraging IDO Virus, relevant terms will be added to CIDO and IDO-CovID-19. Figure 3 provides a summary of several important links:

Figure 3
Representing asymptomatic infection is of crucial importance, as the spread of SARS-CoV-2 makes clear. IDO Virus provides CIDO and IDO-CovID-19 with needed resources in that respect. The introduction in IDO Virus of the many stages of viral reproduction will also assist researchers in classifying specific mechanisms of efficacy for given antivirals.
To see how, consider that IDO Core includes a class populated by antiviral instances, and that this class is extended in IDO Virus:
• viricidal =def An antiviral bearing the viricidal disposition
• viricidal disposition =def A disposition to kill viruses
• virostatic =def An antiviral bearing the virostatic disposition
• virostatic disposition =def A disposition to inhibit the lifecycle of viruses
Logical definitions can also be constructed for these terms, such as:
• viricidal ≡ antiviral AND has_disposition SOME viricidal disposition
• viricidal disposition ≡ antiviral disposition AND inheres_in SOME (antiviral) AND realized_in ONLY (process AND results_in SOME (virus death temporal boundary))
• virostatic ≡ antiviral AND has_disposition SOME virostatic disposition
• virostatic disposition ≡ antiviral disposition AND inheres_in SOME (antiviral) AND realized_in ONLY (process AND negatively_regulates SOME (viral reproduction))
These definitions are made possible by introducing terms such as:
• virus death temporal boundary =def Organism death temporal boundary that marks the end of the life cycle of a virus
• virus birth temporal boundary =def Organism birth temporal boundary that marks the beginning of the life of a virus
• viral reproduction =def A reproduction process involving the production of a virus containing some portion of genetic material inherited from a parent virus
These terms themselves rely on IDO Core terms such as IDO:organism death temporal boundary, IDO:organism birth temporal boundary, and IDO:reproduction, the latter defined as the production of individuals containing some portion of genetic material inherited from one or more parent organisms.
These terms allow for characterization of negative regulation of viruses via drugs targeting specific parts of the virus reproduction process in these ontologies, for example in CIDO:
• CIDO:negative regulation of coronavirus replication is a subclass of IDO Virus:negative regulation of viral replication
• CIDO:negative regulation of coronavirus replication IDO:negatively regulates SOME CIDO:coronavirus replication process
• CIDO:negative regulation of coronavirus synthesis IDO:negatively regulates SOME CIDO:coronavirus synthesis process
And in the case of IDO-CovID-19:
• IDO-CovID-19:negative regulation of COVID-19 replication is a subclass of CIDO:negative regulation of coronavirus replication
• IDO-CovID-19:negative regulation of COVID-19 synthesis is a subclass of CIDO:negative regulation of coronavirus synthesis
• IDO-CovID-19:negative regulation of COVID-19 synthesis IDO:negatively regulates SOME IDO-CovID-19:COVID-19 synthesis process
Such classes provide resources needed to annotate and unify existing data concerning coronavirus antivirals in general, and COVID-19 antivirals in particular. Given the pressing need for progress in combatting the spread of these viruses in humans, consolidating and interpreting such data is of paramount importance.
Equally important are the terms needed for viral disease monitoring, such as the following terms in IDO Virus extending IDO:infectious disease epidemic and IDO:infectious disease pandemic, respectively:
• viral disease epidemic =def An infectious disease epidemic largely involving viral infectious agents
• viral disease pandemic =def An infectious disease pandemic consisting of viral epidemics
These are themselves easily extended to CIDO:
• coronavirus epidemic =def An infectious disease epidemic largely involving viral infectious agents
• coronavirus pandemic =def An infectious disease pandemic consisting of viral epidemics
And from CIDO to IDO-CovID-19:
• COVID-19 epidemic =def An infectious disease epidemic largely involving viral infectious agents
• COVID-19 pandemic =def An infectious disease pandemic consisting of viral epidemics
Refining and expanding these terms in IDO Virus, CIDO, and IDO-CovID-19 is still in progress, but is made much easier by importing IDO Core as a starting point.
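One practical payoff of the subclass links between IDO Virus, CIDO and IDO-CovID-19 is that data annotated with a specific term can be retrieved by querying a more general one. The Python sketch below illustrates this under simplifying assumptions: each term has at most one asserted parent, and the record identifiers, the `ANNOTATIONS` table and the helper functions are hypothetical.

```python
# Asserted subclass links taken from the examples in this section
# (child term -> parent term); single inheritance is a simplification.
SUBCLASS_OF = {
    "IDO-CovID-19:negative regulation of COVID-19 replication":
        "CIDO:negative regulation of coronavirus replication",
    "CIDO:negative regulation of coronavirus replication":
        "IDO Virus:negative regulation of viral replication",
    "IDO-CovID-19:negative regulation of COVID-19 synthesis":
        "CIDO:negative regulation of coronavirus synthesis",
}

# Hypothetical data records, each annotated with one ontology term.
ANNOTATIONS = {
    "record-1": "IDO-CovID-19:negative regulation of COVID-19 replication",
    "record-2": "CIDO:negative regulation of coronavirus replication",
    "record-3": "IDO-CovID-19:negative regulation of COVID-19 synthesis",
}


def ancestors(term):
    """Walk the subclass links upward and return every superclass of term."""
    found = []
    while term in SUBCLASS_OF:
        term = SUBCLASS_OF[term]
        found.append(term)
    return found


def records_about(term):
    """Return records annotated with term or with any of its subclasses."""
    return sorted(
        record
        for record, annotation in ANNOTATIONS.items()
        if annotation == term or term in ancestors(annotation)
    )


viral_hits = records_about("IDO Virus:negative regulation of viral replication")
```

A query at the IDO Virus level thus gathers both the coronavirus-level and the COVID-19-level record, while the synthesis record is only reached through the corresponding CIDO term.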
A similar strategy to that illustrated above will be used to link IDO Virus to existing virus ontologies, for example:

• FLU:influenza A virus infection is a subclass of IDO Virus:virus infection
• HIV:HIV virus infection is a subclass of IDO Virus:virus infection
Additionally, as should be clear, IDO Virus provides terms needed to distinguish bacterial, fungal, and viral infectious agents, along with the infectious agent dispositions realized in relevant infectious processes. This makes it possible to deploy the preceding strategy in the construction of IDO-Bacteria, IDO-Fungus and IDO-Parasite, and in the linking of these reference ontologies to IDO Core. We can then use these reference ontologies to distinguish between viral, bacterial, fungal and parasitic infections, between viral, bacterial, fungal and parasitic infectious diseases, and so on, linking existing ontologies covering more specific bacteria, fungi, and parasites to IDO Core.
Other ontology initiatives being developed to support curation of COVID-19 data outside the IDO framework include:
• The WHO COVID-19 Rapid Version CRF, which provides a semantic data model for the RAPID version (23 March 2020) of the WHO's COVID-19 case record form. 27
• The COVID-19 Surveillance Ontology, which supports COVID-19 surveillance in primary care by facilitating the monitoring of COVID-19 cases and related respiratory conditions using data from multiple brands of computerized medical record systems. 28
• The Linked COVID-19 Data Ontology, which uses RDF to present COVID-19 datasets from the European Centre for Disease Prevention and Control, Johns Hopkins University and the Robert Koch-Institut. (At present this is little more than a list of datasets rather than a bona fide ontology.) 29
These are all stand-alone initiatives, and are thus subject to the silo problems documented in the foregoing.

IDO Applications
The following is an incomplete list of OBO Foundry related databases to which IDO ontology annotations can be, and in some cases are, applied:
• UniProt (https://www.uniprot.org/) representing millions of gene products annotated using terms from the Gene Ontology (GO)
In addition to providing a basis for database interoperability, IDO annotations can also serve a variety of other purposes, including [6]:
• Enhanced interpretation of data from genome-wide and high-throughput experiments.
• Use in software tools for the analysis and interpretation of microarray data, and in the development of new bioinformatic approaches to analysis of such data. • The integration of text-mining approaches with microarray data to facilitate disease gene identification.
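The way shared annotations provide a basis for database interoperability can be illustrated with a small sketch. Everything in it is hypothetical: the two record sets, their identifiers, and the function `index_by_term` are invented for the example, and the annotation labels simply reuse IDO Core term names mentioned in this paper.

```python
from collections import defaultdict

# Two hypothetical, independently maintained sources whose records
# carry ontology annotations. Record ids are invented for illustration.
clinical_records = [
    {"id": "clin-001", "term": "IDO:primary infection"},
    {"id": "clin-002", "term": "IDO:infection"},
]
surveillance_records = [
    {"id": "surv-101", "term": "IDO:infection prevalence"},
    {"id": "surv-102", "term": "IDO:primary infection"},
]


def index_by_term(sources):
    """Map each annotation term to the (source, record id) pairs using it."""
    index = defaultdict(list)
    for source_name, records in sources.items():
        for record in records:
            index[record["term"]].append((source_name, record["id"]))
    return dict(index)


index = index_by_term(
    {"clinical": clinical_records, "surveillance": surveillance_records}
)

if __name__ == "__main__":
    # Records from both sources that share one annotation term.
    print(index["IDO:primary infection"])
```

Because both sources annotate with the same term, a single lookup retrieves records from each of them without any source-specific mapping; this is the sense in which common annotations counteract silo formation.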
Reflecting their use in supporting knowledge re-use and automated reasoning, ontologies have been implemented in a variety of applications for the enhancement of patient diagnosis, care management and clinical decision support [54][55][56][57][58]. A brief overview and further references are provided in [59]. In the field of infectious disease, Clinical Decision Support Systems (CDSSs) are commonly used in diagnostic assistance, guidance in the prescription of anti-infectives, biosurveillance, and vector control. The use of ontologies in CDSSs is on the rise:
• Use of antibiotic decision support systems (ADSSs) has been shown to be effective in mitigating inappropriate antibiotic prescribing and lowering local antimicrobial resistance [60][61][62]. To facilitate interoperability and widespread circulation of future ADSSs, a Bacterial Clinical Infectious Disease Ontology (BCIDO) has been developed from IDO Core [63].
• In the case of methicillin-resistant Staphylococcus aureus (MRSa), the Staphylococcus aureus Infectious Disease Ontology (IDOSA) has the potential to play a similar role by providing a classification of Sa that allows automated inference of resistance profiles [15,16].
• IDDAP, a recently developed ontology-driven clinical decision support system for infectious disease diagnosis and antibiotic prescription, makes heavy use of IDO Core [64].
• The Dengue Decision Support System (DDSS) is an ontology-driven computational application developed at Colorado State University [65,66] with the aim of guiding the implementation of locally appropriate dengue and dengue vector control programs. The DDSS is used in conjunction with Chaak, a cell phone-based system for i) the field capture of data relating to dengue vector surveillance; and ii) the rapid transfer of the data to the central DDSS database [67]. 30 The DDSS makes use of IDOMAL [25], as well as the related ontologies MIRO [52] and the Vector Surveillance and Management Ontology (VSMO) [47].
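What "automated inference of resistance profiles" from a classification might look like can be sketched as follows. This is a toy model under stated assumptions, not the content or reasoning mechanism of IDOSA: the class names, the two strains, and the single resistance assertion are hypothetical, and a real system would rely on an OWL reasoner rather than a hand-written loop.

```python
# Hypothetical fragment of a Sa classification (child -> parent).
SUBCLASS_OF = {
    "methicillin-resistant Sa": "Staphylococcus aureus",
    "hypothetical MRSa strain X": "methicillin-resistant Sa",
    "hypothetical Sa strain Y": "Staphylococcus aureus",
}

# Resistance asserted once, at the most general class to which it applies.
RESISTANT_TO = {
    "methicillin-resistant Sa": {"methicillin"},
}


def resistance_profile(cls):
    """Collect resistances asserted on `cls` or on any of its superclasses."""
    profile = set()
    while cls is not None:
        profile |= RESISTANT_TO.get(cls, set())
        cls = SUBCLASS_OF.get(cls)
    return profile


if __name__ == "__main__":
    print(resistance_profile("hypothetical MRSa strain X"))
    print(resistance_profile("hypothetical Sa strain Y"))
```

The point of the design is that a resistance is asserted once and inherited by every strain classified beneath that class, so a decision support system that classifies an isolate obtains its profile without strain-by-strain curation.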
Recently the National Institute of Allergy and Infectious Diseases awarded a new 5-year contract, worth up to $7.2 million in 2019-2020, to support the integration of the Eukaryotic Pathogen Genomics Database and the Bioinformatics Resource for Invertebrate Vectors of Human Pathogens. 31 These resources will be joined into one bioinformatics resource, the Eukaryotic Pathogen, Host & Vector Genomics Resource (VEuPathDB). The project will leverage ontologies to expand the harmonization of semantic terms across all sites. What the data stored in VEuPathDB is about is encoded through the use of the VEuPathDB application ontology [68]. IDO Core will play a role in this project, as the VEuPathDB ontology imports terms from IDO Core such as: human pathogenicity disposition, infection, infection prevalence, pathogen role, and primary infection.

CONCLUDING REMARKS
Successful information-driven research on COVID-19 needs to be able to integrate the already massive and exponentially growing body of research and data concerning coronavirus diseases. IDO ontologies such as IDO Virus, CIDO and IDO-CovID-19 fill this need by providing a standardized, computer-interpretable representation of heterogeneous coronavirus knowledge to facilitate machine learning [88]. Because these ontologies are built as part of an interoperable suite of IDO ontology modules, it becomes easier, for example, to compare COVID-19 to other respiratory diseases such as SARS, MERS, and influenza, along multiple dimensions including underlying disorders, pathogen features (such as strain, virulence factors, and drug resistance), host-pathogen interactions, routes of transmission, anatomical sites of infection, genetic and environmental variables, symptoms, diagnostic criteria, disease courses, prevention measures and so on.
The utility of the IDO framework turns not least on the fact that we will continue to face the threat of novel viruses (as well as bacteria and parasites) in the future, and the IDO suite provides an easy-to-follow recipe for building new pathogen-specific ontologies in a way that allows easy comparison, along multiple dimensions, of novel pathogens and diseases with pathogens and diseases about which data have already been assembled.
The IDO strategy also brings about a situation in which there is a community trained in how to build, use, understand and correct both single ontologies and groups of ontologies that fit together. This promotes coordination of data curation as it relates to data crossing disciplinary boundaries, for example between vaccine research and pathogen genomics. Moreover, the IDO corpus provides a set of rigorously curated definitions of terms used in infectious disease research that can be employed to provide a useful vehicle for cross-disciplinary collaboration, allowing specialists in one sub-domain to rapidly gain an understanding of the meanings of the technical terms used in neighboring domains.
Finally, the IDO ontologies can also contribute to addressing a further urgent problem faced not merely by COVID-19 research but by contemporary biomedical research in general. This is the problem of reproducibility. This problem applies not only to scientific findings which are the results of experimental studies, but also to findings deriving from the application of different types of diagnostic tests. For an experiment, or a test, to be reproducible, it is crucial that we have a clear understanding of how the experimental or test results were obtained. For this to be possible, however, the constituent processes must be described in a terminology that is widely used and whose terms are well defined. We believe that, when used in combination with the Ontology for Biomedical Investigations (OBI) [14], IDO offers a strategy to obtain comparable and integratable provenance metadata for the data generated in infectious disease research.

Planarian Phenotype Ontology (PLANP)
Vaccine Ontology (VO) [17]
Vital Sign Ontology (VSO) [84]

Malaria Ontology (IDOMAL) [25,26]
Created by the Christos Louis team at the Institute of Molecular Biology and Biochemistry (IMBB) in Crete, Greece, IDOMAL covers both clinical and epidemiological aspects of malaria, disease and vector biology, as well as intervention attempts to control the disease. IDOMAL was developed in the context of VectorBase, wherein several terms are used to annotate its datasets. IDOMAL was recently inherited by Christian Stoeckert and his team at the University of Pennsylvania, where efforts are underway to reengineer the ontology's content to bring about closer conformity to IDO Core.

Dengue Ontology (IDODEN) [24]
Also developed by the Louis research team at IMBB, IDODEN covers all aspects of dengue fever including disease biology, epidemiology, clinical features, and vector entomology. IDODEN was developed in the context of VectorBase and its structure was designed to mirror that of IDOMAL. It is intended for use in dengue decision support systems.

Brucellosis Ontology (IDOBRU) [8,23]
Developed by Yongqun He and his team at the University of Michigan, IDOBRU focuses on brucellosis, a highly contagious zoonotic infectious disease caused by the intracellular, Gram-negative bacteria Brucella. IDOBRU encompasses the domains of clinical care, public health, and biomedical research, along seven major axes: host infection and zoonotic disease transmission, symptoms, virulence factors and pathogenesis, diagnosis, intentional release, vaccine prevention, and treatment.

Influenza Ontology (FLU) [22]
Developed by Richard Scheuermann, Lynn Schriml, Joanne Luciano, and Burke Squires, FLU covers the natural, experimental and clinical realms related to influenza virus life cycle, infection and disease. FLU utilizes OBI classes for components of materials, qualities and processes to map influenza virus sequence and surveillance terms to their corresponding materials and qualities. FLU is applied to data collected by the Centers for Excellence in Influenza Research and Surveillance (CEIRS) project to help researchers more easily elucidate influenza virulence and pathogenesis etiology.

Staphylococcus aureus Infectious Disease Ontology (IDOSA) [7,15,16]
Developed by Lindsay Cowell, Barry Smith and Albert Goldfain, in collaboration with Dr. Vance Fowler at Duke University Medical Center, IDOSA focuses on Staphylococcus aureus (Sa) infectious diseases. The ontology is used to analyze networks of functionally related gene products to identify host genes conferring susceptibility to Staphylococcus aureus bacteremia.

HIV Ontology 32 (HIV)
Developed by Martin Schiller of UNLV, the HIV ontology is intended to cover all types of HIV data and information, and is intended for use in the HIVToolbox web application [85,86].

Schistosomiasis Ontology (IDOSCHISTO) [29,30]
Developed by a team of researchers based in Senegal led by Dr. Gaoussou Camara, IDOSCHISTO focuses on schistosomiasis, a waterborne infectious disease caused by Schistosoma helminth parasites. The ontology is organized into three main sub-modules: i) Biology, for example pathogen/host interactions, host physiological reactions to the disease, pathogen taxonomy and life-cycle; ii) Epidemiology, for example risk factors, spread of disease, means to prevent and control; and iii) Clinical, including symptoms that influence differential diagnoses and treatment decisions.

Plant Disease Ontology (IDOPlant) [28]
Under active development within the context of the Planteome Project, IDOPlant provides a comprehensive reference ontology for any infectious plant disease. The main aims of IDOPlant are to "provide plant scientists with the means identify genomic and genetic signatures host-pathogen interactions, resistance, or susceptibility, and to help agronomists and farmers by developing tools to identify disease phenotypes and gather epidemiological statistics" [28].

Meningitis Ontology (IDOMEN) [27]
Also developed by Dr. Camara's team, IDOMEN focuses on meningitis, a disease caused by the Gram-negative bacterium Neisseria meningitidis. IDOMEN covers the meningitis domain along three main axes: i) biological (immunity, virulence factors, pathogen and host biology); ii) clinical (clinical manifestations, laboratory tests and findings, diagnosis and treatment); and iii) epidemiological (surveillance, prevention, epidemic emergence factors such as risk behaviors, climate and environment).

Coronavirus Infectious Disease Ontology (CIDO) [20,21]
Initiated by Yongqun He and Hong Yu, CIDO was developed by a collaborative group of researchers in both the US and China in response to the recent COVID-19 outbreaks. CIDO provides standardized human- and computer-interpretable annotation and representation of various coronavirus infectious diseases, including their etiology, transmission, epidemiology, pathogenesis, diagnosis, prevention, and treatment.

In addition, IDO extension ontologies for Tuberculosis (IDOTUB) and Infective endocarditis are planned. Preliminary work on the latter is available online. 33

Terms drawn from IDO Core are also used in a number of other important ontologies within the infectious disease domain, as illustrated in Table 3.

Bacterial Clinical Infectious Disease Ontology (BCIDO) [63]
BCIDO provides a controlled terminology for clinical bacterial infectious diseases along with domain knowledge commonly used in the hospital inpatient setting. BCIDO was designed to augment the use of ADSSs, thus serving as a tool to guide differential bacterial diagnoses and to assist in the prescribing of appropriate antimicrobial treatments. As such, BCIDO "encompasses terms and knowledge about common clinical presentations of [bacterial] infections, patient specific factors that influence differential diagnoses and treatment options, the [bacteria] themselves, and the antimicrobial agents used to treat infections" [63]. Though not an IDO extension per se, BCIDO imports IDO Core in full.

Vector Surveillance and Management Ontology 34 (VSMO) [47]
Developed by a team of researchers at Colorado State University led by Drs. Lars Eisen and Saul Lozano, in collaboration with Dr. Cowell, VSMO covers the domain of surveillance and management of vectors and vector-borne pathogens, with special emphasis on content to support operational activities through inclusion in databases, data management systems and decision support systems [47]. The ontology includes terms for i) arthropod species capable of being biological vectors and pathogen species transmitted by arthropod vectors; ii) chemical compounds relevant to insecticide resistance and chemical pesticide active substances (originating from MIRO); and iii) equipment used to collect or control vectors or vertebrate pathogen hosts, and tools used to kill vectors or prevent contact with humans. In consultation with the IDO Core developers, the relation has_vector was created for VSMO in order to link pathogens to their biological vectors.

Genomic Epidemiology Ontology (GenEpio) [49]
GenEpio is a controlled vocabulary for infectious disease surveillance and outbreak investigations implementing whole genome sequencing (WGS) of microbial pathogens. GenEpio aims to enable the integration and promotion of all contextual information required to interpret pathogen genomics data, including critical knowledge about sequencing pipelines and sequence quality; lab results describing antimicrobial resistance and virulence phenotypes; epidemiological data concerning potential sources of risk and exposure; as well as data about susceptible populations and geographical distributions of pathogen strains [49]. GenEpio is currently being integrated into IRIDA (Integrated Rapid Infectious Disease Analysis), a user-friendly, decentralized, open-source bioinformatics and analytical web platform to support real-time infectious disease outbreak investigations using WGS data [87].