Ontology Sources
MONDO
The Monarch Disease Ontology is a unified disease hierarchy that merges terms from OMIM, Orphanet, EFO, DOID, MeSH, NCIt, and ICD. With ~47K terms and extensive cross-references, MONDO is the canonical backbone of disease_map: every disease in bioingest resolves to a MONDO ID.
| Field | Example | Description |
|---|---|---|
| id | MONDO:0005015 | MONDO term ID |
| name | diabetes mellitus | Human-readable label |
| namespace | mondo | Always "mondo" |
| definition | A metabolic disease... | Textual definition |
| is_a | MONDO:0005066 | Parent term (hierarchy) |
| xref | DOID:9351, EFO:0000400, MESH:D003920, ICD10CM:E11 | Cross-references |
Cross-refs provided: EFO, DOID, MeSH, ICD-10-CM, MedGen, OMIM, Orphanet, MedDRA
Role in disease_map: Primary source; all rows are derived from MONDO xrefs
uv run bioingest download mondo
uv run bioingest ontology build --uri bolt://localhost:7687 --user neo4j --password secret
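Because every disease_map row is derived from MONDO xrefs, the derivation can be pictured as splitting the xref field by prefix. The following is a minimal sketch under stated assumptions, not bioingest's actual implementation: the function name `xrefs_to_row`, the `mondo_id` column, the prefix-to-column mapping, and the choice to store full CURIEs are all hypothetical. Only the column names `doid`, `efo_id`, `mesh_id`, and `icd10_code` appear on this page.

```python
# Hypothetical mapping from xref prefix to disease_map column.
PREFIX_TO_COLUMN = {
    "DOID": "doid",
    "EFO": "efo_id",
    "MESH": "mesh_id",
    "ICD10CM": "icd10_code",
}


def xrefs_to_row(mondo_id: str, xref: str) -> dict:
    """Turn a comma-separated xref string into one disease_map-style row."""
    row = {"mondo_id": mondo_id}
    row.update({column: None for column in PREFIX_TO_COLUMN.values()})
    for curie in (part.strip() for part in xref.split(",")):
        prefix, sep, _local_id = curie.partition(":")
        column = PREFIX_TO_COLUMN.get(prefix.upper())
        if not sep or column is None or row[column] is not None:
            continue  # skip malformed entries, unknown prefixes, and repeats
        row[column] = curie  # assumption: columns hold the full CURIE
    return row


row = xrefs_to_row(
    "MONDO:0005015",
    "DOID:9351, EFO:0000400, MESH:D003920, ICD10CM:E11",
)
print(row)
```

Keeping only the first xref per prefix is a simplification; a term with several xrefs to the same source would need a one-to-many representation.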
Disease Ontology
Community-driven ontology of human diseases with ~12K terms. Provides hierarchical classification and cross-references to MeSH, ICD, OMIM, and NCI Thesaurus. Complements MONDO with additional synonym coverage.
| Field | Example | Description |
|---|---|---|
| id | DOID:9351 | Disease Ontology ID |
| name | diabetes mellitus | Label |
| is_a | DOID:4 | Parent term |
| xref | MESH:D003920, ICD10CM:E11 | Cross-references |
Term count: ~12,000
Cross-refs: MeSH, ICD-10-CM, OMIM, NCI, UMLS
Role in disease_map: Supplementary — disease_map.doid links here
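The is_a field is what makes the ontology hierarchical: following it repeatedly yields every ancestor of a term. A minimal sketch of that walk, assuming the terms have already been loaded into a plain dictionary (the `ancestors` helper and the `IS_A` table are illustrative, not part of bioingest):

```python
def ancestors(term_id: str, parents: dict[str, list[str]]) -> list[str]:
    """Return all ancestors of term_id, nearest first, following is_a links."""
    seen: list[str] = []
    queue = list(parents.get(term_id, []))
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue  # a term can be reachable through several parents
        seen.append(current)
        queue.extend(parents.get(current, []))
    return seen


# Toy is_a table; only the DOID:9351 -> DOID:4 link comes from the example row above.
IS_A = {"DOID:9351": ["DOID:4"], "DOID:4": []}
print(ancestors("DOID:9351", IS_A))
```

The same walk applies to any source on this page that exposes an is_a field.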
EFO (Experimental Factor Ontology)
EMBL-EBI ontology used by Open Targets, GWAS Catalog, and Expression Atlas to annotate experimental variables. Covers diseases, phenotypes, assays, and anatomical terms. ~53K classes.
| Field | Example | Description |
|---|---|---|
| id | EFO:0000311 | EFO term ID |
| name | cancer | Label |
| is_a | EFO:0000408 | Parent |
| xref | MONDO:0004992, DOID:162, MESH:D009369 | Cross-references |
Term count: ~53,000
Cross-refs: MONDO, DOID, MeSH, NCIt, OMIM
Role in disease_map: disease_map.efo_id — required for joining Open Targets associations
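To illustrate why efo_id matters for joining, here is a hedged sketch of attaching MONDO IDs to association records. It assumes the upstream records carry underscore-style IDs (for example `EFO_0000311`) in a `diseaseId` field and that disease_map has a `mondo_id` column; the helper names, the record shape, and the target and score values are placeholders rather than real data or bioingest code.

```python
def normalize_efo(raw_id: str) -> str:
    """Convert an underscore-style ID (EFO_0000311) to CURIE style (EFO:0000311)."""
    return raw_id.replace("_", ":", 1)


def attach_mondo(associations: list[dict], disease_map: list[dict]) -> list[dict]:
    """Inner-join association records to disease_map rows on efo_id."""
    efo_to_mondo = {
        row["efo_id"]: row["mondo_id"] for row in disease_map if row.get("efo_id")
    }
    joined = []
    for assoc in associations:
        efo_id = normalize_efo(assoc["diseaseId"])
        if efo_id in efo_to_mondo:
            joined.append({**assoc, "efo_id": efo_id, "mondo_id": efo_to_mondo[efo_id]})
    return joined


# The EFO and MONDO IDs come from the example row above; everything else is a placeholder.
disease_map = [{"mondo_id": "MONDO:0004992", "efo_id": "EFO:0000311"}]
associations = [
    {"diseaseId": "EFO_0000311", "targetId": "TARGET_A", "score": 0.5},
    {"diseaseId": "EFO_9999999", "targetId": "TARGET_B", "score": 0.1},
]
print(attach_mondo(associations, disease_map))
```

Records whose disease ID has no disease_map row are dropped here; a left join that keeps them with an empty mondo_id would be the alternative when coverage gaps need to be audited.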
Gene Ontology
Three-branch ontology (Biological Process, Molecular Function, Cellular Component) with ~45K terms. Used for functional annotation of proteins. Integrated via interpro2go mappings and QuickGO annotations.
| Field | Example | Description |
|---|---|---|
| id | GO:0006915 | GO term ID |
| name | apoptotic process | Label |
| namespace | biological_process | Branch (BP/MF/CC) |
| is_a | GO:0012501 | Parent term |
Term count: ~45,000
Cross-refs: MetaCyc, Reactome, KEGG (via relationships)
Role in disease_map: Not part of disease_map directly; used for protein functional annotation
uv run bioingest download gene_ontology
-- Athena (after publish)
SELECT id, name, namespace FROM bioingest.gene_ontology__go_basic WHERE namespace = 'biological_process' LIMIT 100;
MeSH (Medical Subject Headings)
NLM's controlled vocabulary for indexing PubMed articles and ClinicalTrials.gov records. Hierarchical descriptors cover diseases, anatomy, chemicals, and procedures. ~30K descriptors, each with tree numbers that enable hierarchical queries.
| Field | Example | Description |
|---|---|---|
| DescriptorUI | D003920 | MeSH unique ID |
| DescriptorName | Diabetes Mellitus | Preferred term |
| TreeNumber | C18.452.394.750 | Hierarchical classification |
Term count: ~30,000 descriptors
Cross-refs: Mapped to MONDO/DOID via cross-reference tables
Role in disease_map: disease_map.mesh_id — bridges PubMed/ClinicalTrials to MONDO
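Tree numbers support hierarchical queries because a descendant's tree number extends its ancestor's by one or more dot-separated segments, so a prefix test is enough. A small sketch of that idea, assuming descriptors are held in a dictionary from DescriptorUI to a list of tree numbers (the helper names and the second dictionary entry are illustrative):

```python
def is_descendant(tree_number: str, ancestor: str) -> bool:
    """True if tree_number lies strictly below ancestor in the MeSH tree."""
    # Appending "." keeps C18.45 from matching C18.452.
    return tree_number.startswith(ancestor + ".")


def descendants(ancestor: str, descriptors: dict[str, list[str]]) -> list[str]:
    """Return DescriptorUIs that have at least one tree number under ancestor."""
    return [
        ui
        for ui, tree_numbers in descriptors.items()
        if any(is_descendant(t, ancestor) for t in tree_numbers)
    ]


# D003920 and its tree number come from the table above; the second entry is a placeholder.
DESCRIPTORS = {
    "D003920": ["C18.452.394.750"],
    "D_PLACEHOLDER": ["C04.588"],
}
print(descendants("C18.452.394", DESCRIPTORS))
```

A descriptor can carry several tree numbers, which is why each one is checked rather than only the first.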
ICD-10-CM
The U.S. Clinical Modification of the WHO International Classification of Diseases, 10th revision, maintained by NCHS (CDC). Used for clinical billing and epidemiology. Provides the code system referenced in EHR data and clinical trials.
| Field | Example | Description |
|---|---|---|
| code | E11.9 | ICD-10-CM code |
| description | Type 2 diabetes mellitus without complications | Code description |
| category | E11 | Parent category |
Term count: ~72,000 codes
Cross-refs: Mapped to MONDO via disease_map.icd10_code
Role in disease_map: disease_map.icd10_code — bridges clinical codes to MONDO
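The category field in the table above is the three-character stem of the code. A minimal sketch of deriving it, assuming codes may arrive with or without the dot (the `icd10cm_category` helper and the pattern are illustrative and check shape only, not whether a code actually exists):

```python
import re

# Three-character category, then an optional dot and up to four more characters.
ICD10CM_PATTERN = re.compile(r"^([A-Z][0-9][0-9A-Z])(?:\.?([0-9A-Z]{1,4}))?$")


def icd10cm_category(code: str) -> str:
    """Return the three-character category for an ICD-10-CM-shaped code."""
    match = ICD10CM_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not an ICD-10-CM-shaped code: {code!r}")
    return match.group(1)


print(icd10cm_category("E11.9"))  # prints E11
```

Raising on malformed input, rather than returning the first three characters blindly, keeps free-text values from being mistaken for categories.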