Ontology Sources

MONDO

The Monarch Disease Ontology — a unified disease hierarchy merging terms from OMIM, Orphanet, EFO, DOID, MeSH, NCIt, and ICD. With ~47K terms and extensive cross-references, MONDO is the canonical backbone of disease_map. Every disease in bioingest resolves to a MONDO ID.

| Field | Example | Description |
|---|---|---|
| id | MONDO:0005015 | MONDO term ID |
| name | diabetes mellitus | Human-readable label |
| namespace | mondo | Always "mondo" |
| definition | A metabolic disease... | Textual definition |
| is_a | MONDO:0005066 | Parent term (hierarchy) |
| xref | DOID:9351, EFO:0000400, MESH:D003920, ICD10CM:E11 | Cross-references |

Cross-refs provided: EFO, DOID, MeSH, ICD-10-CM, MedGen, OMIM, Orphanet, MedDRA

Role in disease_map: Primary source; all rows are derived from MONDO xrefs

uv run bioingest download mondo
uv run bioingest ontology build --uri bolt://localhost:7687 --user neo4j --password secret
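
As a rough illustration of how a disease_map row could be derived from a MONDO term's xrefs, here is a minimal Python sketch. It is not bioingest's actual implementation: the function name `xrefs_to_row`, the `mondo_id` key, the prefix-to-column mapping, and the choice to store full CURIEs are all assumptions. Only the column names `doid`, `efo_id`, `mesh_id`, and `icd10_code` come from this page.

```python
# Hypothetical mapping from xref prefix to disease_map column (assumption).
PREFIX_TO_COLUMN = {
    "DOID": "doid",
    "EFO": "efo_id",
    "MESH": "mesh_id",
    "ICD10CM": "icd10_code",
}


def xrefs_to_row(mondo_id: str, xrefs: list[str]) -> dict[str, str]:
    """Build one illustrative disease_map row from a MONDO term's xrefs."""
    row = {"mondo_id": mondo_id}  # "mondo_id" is a hypothetical key name
    for xref in xrefs:
        prefix, _, local_id = xref.partition(":")
        column = PREFIX_TO_COLUMN.get(prefix)
        if column is None or not local_id:
            continue  # skip prefixes this sketch does not map
        # Keep the first xref seen for each column; store the full CURIE.
        row.setdefault(column, xref)
    return row


# The example term from the table above.
row = xrefs_to_row(
    "MONDO:0005015",
    ["DOID:9351", "EFO:0000400", "MESH:D003920", "ICD10CM:E11"],
)
```

A term with several xrefs for the same prefix would need a policy (first match, all matches, or a curated choice); this sketch simply keeps the first.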

Disease Ontology

Community-driven ontology of human diseases with ~12K terms. Provides hierarchical classification and cross-references to MeSH, ICD, OMIM, and NCI Thesaurus. Complements MONDO with additional synonym coverage.

| Field | Example | Description |
|---|---|---|
| id | DOID:9351 | Disease Ontology ID |
| name | diabetes mellitus | Label |
| is_a | DOID:4 | Parent term |
| xref | MESH:D003920, ICD10CM:E11 | Cross-references |

Term count: ~12,000

Cross-refs: MeSH, ICD-10-CM, OMIM, NCI, UMLS

Role in disease_map: Supplementary; disease_map.doid links here

uv run bioingest download disease_ontology

EFO (Experimental Factor Ontology)

EMBL-EBI ontology used by Open Targets, GWAS Catalog, and Expression Atlas to annotate experimental variables. Covers diseases, phenotypes, assays, and anatomical terms. ~53K classes.

| Field | Example | Description |
|---|---|---|
| id | EFO:0000311 | EFO term ID |
| name | cancer | Label |
| is_a | EFO:0000408 | Parent |
| xref | MONDO:0004992, DOID:162, MESH:D009369 | Cross-references |

Term count: ~53,000

Cross-refs: MONDO, DOID, MeSH, NCIt, OMIM

Role in disease_map: disease_map.efo_id, required for joining Open Targets associations

uv run bioingest download efo
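
To make the join through `disease_map.efo_id` concrete, the following Python sketch attaches a MONDO ID to association records. Everything about the record shapes is an assumption for illustration: the `diseaseId` and `targetId` field names, the `mondo_id` key, the underscore form of the incoming ID, and the helper names `normalize_curie` and `join_associations` are not taken from bioingest.

```python
def normalize_curie(identifier: str) -> str:
    """Turn 'EFO_0000311' into 'EFO:0000311'; colon forms pass through."""
    return identifier.replace("_", ":", 1)


def join_associations(disease_map_rows: list[dict], associations: list[dict]) -> list[dict]:
    """Attach a MONDO ID to each association whose disease ID is in disease_map."""
    by_efo = {
        row["efo_id"]: row["mondo_id"]
        for row in disease_map_rows
        if row.get("efo_id")
    }
    joined = []
    for assoc in associations:
        mondo_id = by_efo.get(normalize_curie(assoc["diseaseId"]))
        if mondo_id is not None:  # inner join: drop unmapped associations
            joined.append({**assoc, "mondo_id": mondo_id})
    return joined


# Illustrative rows only; the EFO/MONDO pair comes from the table above.
disease_map_rows = [{"mondo_id": "MONDO:0004992", "efo_id": "EFO:0000311"}]
associations = [
    {"diseaseId": "EFO_0000311", "targetId": "ENSG00000141510"},
    {"diseaseId": "EFO_9999999", "targetId": "ENSG00000141510"},  # unmapped
]
joined = join_associations(disease_map_rows, associations)
```

Rows without an `efo_id` cannot participate in this join, which is why the column matters for association data.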

Gene Ontology

Three-branch ontology (Biological Process, Molecular Function, Cellular Component) with ~45K terms. Used for functional annotation of proteins. Integrated via interpro2go mappings and QuickGO annotations.

| Field | Example | Description |
|---|---|---|
| id | GO:0006915 | GO term ID |
| name | apoptotic process | Label |
| namespace | biological_process | Branch (BP/MF/CC) |
| is_a | GO:0012501 | Parent term |

Term count: ~45,000

Cross-refs: MetaCyc, Reactome, KEGG (via relationships)

Role in disease_map: Not directly in disease_map; used for protein functional annotation

uv run bioingest download gene_ontology
-- Athena, after publish
SELECT id, name, namespace FROM bioingest.gene_ontology__go_basic WHERE namespace = 'biological_process' LIMIT 100;
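
The `is_a` field encodes a hierarchy, so a common follow-up is collecting all ancestors of a term. The Python sketch below shows one way to do that over an in-memory parent map. The `ancestors` helper and the `parents` structure are hypothetical, and the map holds only the single edge shown in the table above; higher ancestors are deliberately omitted.

```python
def ancestors(term_id: str, parents: dict[str, list[str]]) -> set[str]:
    """Return every term reachable from term_id by following is_a edges."""
    seen: set[str] = set()
    stack = list(parents.get(term_id, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # a term can be reached through several parents
        seen.add(current)
        stack.extend(parents.get(current, []))
    return seen


# Tiny excerpt: only the is_a edge listed on this page.
parents = {
    "GO:0006915": ["GO:0012501"],
    "GO:0012501": [],  # further ancestors omitted in this sketch
}
go_ancestors = ancestors("GO:0006915", parents)
```

The same traversal applies to the `is_a` fields of MONDO, DOID, and EFO, since terms there can also have more than one parent.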

MeSH (Medical Subject Headings)

NLM's controlled vocabulary for indexing PubMed articles and ClinicalTrials.gov. Hierarchical descriptors cover diseases, anatomy, chemicals, and procedures. ~30K descriptors with tree numbers enabling hierarchical queries.

| Field | Example | Description |
|---|---|---|
| DescriptorUI | D003920 | MeSH unique ID |
| DescriptorName | Diabetes Mellitus | Preferred term |
| TreeNumber | C18.452.394.750 | Hierarchical classification |

Term count: ~30,000 descriptors

Cross-refs: Mapped to MONDO/DOID via cross-reference tables

Role in disease_map: disease_map.mesh_id, bridges PubMed/ClinicalTrials to MONDO

uv run bioingest download mesh
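
Tree numbers make hierarchical queries a matter of prefix matching: a descriptor sits below another when its tree number extends the ancestor's number by one or more dot-separated segments. The Python sketch below illustrates this. The helper names and the row layout are assumptions that mirror the field table above, not a bioingest API.

```python
def is_descendant(tree_number: str, ancestor: str) -> bool:
    """True if tree_number lies strictly below ancestor in the MeSH tree."""
    # Appending "." prevents "C18.45" from matching "C18.452...".
    return tree_number.startswith(ancestor + ".")


def descriptors_under(rows: list[dict], ancestor: str) -> list[str]:
    """Return DescriptorUI values whose TreeNumber falls under ancestor."""
    return [
        row["DescriptorUI"]
        for row in rows
        if is_descendant(row["TreeNumber"], ancestor)
    ]


# One row per (descriptor, tree number); a descriptor may have several rows.
rows = [
    {
        "DescriptorUI": "D003920",
        "DescriptorName": "Diabetes Mellitus",
        "TreeNumber": "C18.452.394.750",
    },
]
under_c18_452 = descriptors_under(rows, "C18.452")
under_partial = descriptors_under(rows, "C18.45")
```

In SQL the equivalent filter would be a `LIKE 'C18.452.%'` predicate on the tree number column, assuming the published table exposes it.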

ICD-10-CM

U.S. Clinical Modification of the WHO International Classification of Diseases, 10th revision, maintained by NCHS (CDC). Used for clinical billing and epidemiology. Provides the code system referenced in EHR data and clinical trials.

| Field | Example | Description |
|---|---|---|
| code | E11.9 | ICD-10-CM code |
| description | Type 2 diabetes mellitus without complications | Code description |
| category | E11 | Parent category |

Term count: ~72,000 codes

Cross-refs: Mapped to MONDO via disease_map.icd10_code

Role in disease_map: disease_map.icd10_code, bridges clinical codes to MONDO

uv run bioingest download icd
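
The `category` field in the table above is the three-character prefix of the code. A minimal Python sketch of that derivation follows. The function name `icd10cm_category` is hypothetical, and the handling of undotted codes is an assumption about input variety rather than documented bioingest behavior.

```python
def icd10cm_category(code: str) -> str:
    """Return the three-character ICD-10-CM category for a code."""
    cleaned = code.strip().upper()
    # Take the part before the dot, then cap at three characters so
    # undotted input such as "E119" is handled the same way.
    return cleaned.split(".", 1)[0][:3]


category = icd10cm_category("E11.9")
```

This makes it possible to match a specific billing code such as E11.9 against a category-level cross-reference such as ICD10CM:E11.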