OBO Parsing (Structured Mode)
Deterministic parsing of OBO ontology files. No LLM is needed, so parsing is fast and reproducible.
Usage
# Export as JSONL (for graphrag_api bulk loader)
bioingest ontology export
# Write directly to Neo4j
bioingest ontology build --uri bolt://localhost:7687 --user neo4j --password secret --database olink3
What Gets Parsed
From each OBO [Term] block:
| Field | Example | Stored as |
|---|---|---|
| id | MONDO:0000002 | Node ID |
| name | cardiovascular disease | Node name |
| def | "A disease of..." | Node definition |
| synonym (EXACT) | "CVD" | Node synonyms list |
| xref | DOID:1287 | Node xrefs + XREF edge |
| is_a | MONDO:0000001 | IS_A edge |
| relationship | part_of MONDO:0000001 | Named edge |
Obsolete terms (is_obsolete: true) are excluded.
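The field mapping above can be illustrated with a minimal sketch. This is not the bioingest implementation: the function name `parse_obo_terms`, the `edges` list of `(type, target)` tuples, and the sample stanza (built from the example values in the table) are all assumptions made for illustration. It handles only the tags listed in the table and skips obsolete terms.

```python
import re
from typing import Dict, Iterator

# Matches a leading quoted string and an optional following word (e.g. EXACT).
QUOTED = re.compile(r'^"((?:[^"\\]|\\.)*)"\s*(\w+)?')


def parse_obo_terms(text: str) -> Iterator[Dict]:
    """Yield one dict per non-obsolete [Term] stanza (hypothetical helper)."""
    # With one capture group, re.split returns:
    # [header, type1, body1, type2, body2, ...]
    parts = re.split(r"^\[(\w+)\][ \t]*$", text, flags=re.MULTILINE)
    for stanza_type, body in zip(parts[1::2], parts[2::2]):
        if stanza_type != "Term":
            continue  # ignore [Typedef] and other stanza types
        term = {"id": None, "name": None, "definition": None,
                "synonyms": [], "xrefs": [], "edges": []}
        obsolete = False
        for raw in body.splitlines():
            tag, sep, value = raw.strip().partition(":")
            value = value.strip()
            if not sep or not value:
                continue
            if tag == "id":
                term["id"] = value
            elif tag == "name":
                term["name"] = value
            elif tag == "def":
                m = QUOTED.match(value)
                term["definition"] = m.group(1) if m else value
            elif tag == "synonym":
                m = QUOTED.match(value)
                if m and m.group(2) == "EXACT":  # only EXACT synonyms are kept
                    term["synonyms"].append(m.group(1))
            elif tag == "xref":
                target = value.split()[0]
                term["xrefs"].append(target)
                term["edges"].append(("XREF", target))
            elif tag == "is_a":
                # The first token is the parent ID; a trailing "! label" is dropped.
                term["edges"].append(("IS_A", value.split()[0]))
            elif tag == "relationship":
                fields = value.split()
                if len(fields) >= 2:
                    term["edges"].append((fields[0], fields[1]))
            elif tag == "is_obsolete":
                obsolete = value == "true"
        if term["id"] and not obsolete:
            yield term


# Made-up sample input assembled from the example values in the table.
SAMPLE = '''format-version: 1.2

[Term]
id: MONDO:0000002
name: cardiovascular disease
def: "A disease of the cardiovascular system." []
synonym: "CVD" EXACT []
synonym: "heart trouble" RELATED []
xref: DOID:1287
is_a: MONDO:0000001 ! disease
relationship: part_of MONDO:0000001 ! disease

[Term]
id: MONDO:0000003
name: old term
is_obsolete: true
'''

terms = list(parse_obo_terms(SAMPLE))
```

In this sketch the second stanza is dropped because it is marked obsolete, and the RELATED synonym is ignored because only EXACT synonyms are listed in the table.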
JSONL Export Format
The export is compatible with graphrag_api's consolidated_state/ format:
ontology_nodes.jsonl:
{
  "id": "MONDO:0000002",
  "name": "cardiovascular disease",
  "type": "Disease",
  "definition": "A disease of the cardiovascular system.",
  "synonyms": ["CVD"],
  "xrefs": ["DOID:1287", "EFO:0000319"],
  "source": "mondo",
  "_doc_id": "ontology_mondo",
  "_chunk_id": "obo_MONDO:0000002",
  "publication_count": 0
}
ontology_relationships.jsonl:
{
  "source_id": "MONDO:0000002",
  "target_id": "MONDO:0000001",
  "type": "IS_A",
  "occurrence_count": 1,
  "confidence": 1.0,
  "consolidated": true,
  "evidence_doc_ids": ["ontology_mondo"]
}
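As a rough sketch of how records with these shapes could be produced, the snippet below builds one node record and one relationship record and serializes them as JSONL (one JSON object per line). The function names are hypothetical, and the `ontology_<source>` and `obo_<id>` patterns are inferred from the single example above rather than from a documented rule. The node `type` is passed in explicitly because the mapping from ontology to type is not described here.

```python
import json
from typing import Dict, Iterable


def to_node_record(term: Dict, source: str, node_type: str) -> Dict:
    """Build a node record shaped like the ontology_nodes.jsonl example."""
    return {
        "id": term["id"],
        "name": term["name"],
        "type": node_type,
        "definition": term["definition"],
        "synonyms": term["synonyms"],
        "xrefs": term["xrefs"],
        "source": source,
        "_doc_id": f"ontology_{source}",      # assumed naming pattern
        "_chunk_id": f"obo_{term['id']}",     # assumed naming pattern
        "publication_count": 0,
    }


def to_relationship_record(source_id: str, target_id: str,
                           edge_type: str, source: str) -> Dict:
    """Build an edge record shaped like the ontology_relationships.jsonl example."""
    return {
        "source_id": source_id,
        "target_id": target_id,
        "type": edge_type,
        "occurrence_count": 1,
        "confidence": 1.0,
        "consolidated": True,
        "evidence_doc_ids": [f"ontology_{source}"],
    }


def to_jsonl(records: Iterable[Dict]) -> str:
    """Serialize records as JSONL: one compact JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


# Example term using the values shown in the node record above.
term = {
    "id": "MONDO:0000002",
    "name": "cardiovascular disease",
    "definition": "A disease of the cardiovascular system.",
    "synonyms": ["CVD"],
    "xrefs": ["DOID:1287", "EFO:0000319"],
}

node = to_node_record(term, source="mondo", node_type="Disease")
rel = to_relationship_record("MONDO:0000002", "MONDO:0000001", "IS_A", "mondo")
nodes_jsonl = to_jsonl([node])
rels_jsonl = to_jsonl([rel])
```

The examples in this section are pretty-printed for readability; in a JSONL file each object would normally occupy a single line, as `to_jsonl` produces here.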
Version Tracking
Each build creates an OntologyVersion node:
(:OntologyVersion {
  version_id: "a3f2b1c9...",  // SHA-256 of all term IDs
  timestamp: "2026-05-15T...",
  node_count: 154000,
  sources: ["mondo", "disease_ontology", "efo", "gene_ontology"]
})
Re-running with the same OBO files is idempotent. Updated files produce a new version_id.
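One way a content-derived version_id can be idempotent is sketched below. The source states only that the ID is a SHA-256 of all term IDs; the function name `compute_version_id`, the sorting and de-duplication step, and the newline separator are assumptions made for this illustration.

```python
import hashlib
from typing import Iterable


def compute_version_id(term_ids: Iterable[str]) -> str:
    """Hash a canonical form of the term IDs (hypothetical helper)."""
    # Assumption: sort and de-duplicate so that file order and repeated
    # IDs across sources do not affect the result.
    canonical = "\n".join(sorted(set(term_ids)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first_run = compute_version_id(["MONDO:0000002", "MONDO:0000001"])
second_run = compute_version_id(["MONDO:0000001", "MONDO:0000002"])
after_update = compute_version_id(
    ["MONDO:0000001", "MONDO:0000002", "MONDO:0000003"]
)
```

Here `first_run` and `second_run` are equal because the same set of IDs is hashed, while `after_update` differs because a term was added. Under this sketch, an update that changes only definitions or synonyms would leave the hash unchanged; whether bioingest hashes anything beyond term IDs is not stated here.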