Knowledge Graph Example
A concrete example of how the open-source data that bioingest ingests connects in the graph around a single protein: EGFR.
[Interactive graph of the EGFR neighborhood. Legend: Protein, Disease, Drug, Pathway, Tissue. Data from UniProt, STRING, Open Targets, ChEMBL, Reactome, GTEx.]
What the graph shows
All data comes from open-source databases that bioingest downloads and maps together:
| Node type | Source | Example |
|---|---|---|
| Protein (blue) | UniProt via protein_map | EGFR (P00533), ERBB2 (P04626) |
| Disease (red) | MONDO via disease_map | Non-small cell lung carcinoma, Glioblastoma |
| Drug (green) | ChEMBL / TTD via drug_map | Erlotinib, Gefitinib, Cetuximab |
| Pathway (purple) | Reactome | Signaling by EGFR, RAS signaling |
| Tissue (orange) | GTEx | Lung (12.4 TPM), Skin (8.7 TPM) |
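The node table above can be sketched as a small typed record. This is only an illustration, not bioingest's actual schema: the `Node` class, its field names, and the `by_type` helper are hypothetical, and accessions are filled in only where the table gives them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical node record; field names are assumptions, not bioingest's schema."""
    node_type: str                   # "Protein", "Disease", "Drug", "Pathway", or "Tissue"
    label: str                       # human-readable name taken from the table
    source: str                      # database the node comes from
    accession: Optional[str] = None  # source identifier, only where the table lists one


nodes = [
    Node("Protein", "EGFR", "UniProt", "P00533"),
    Node("Protein", "ERBB2", "UniProt", "P04626"),
    Node("Disease", "Non-small cell lung carcinoma", "MONDO"),
    Node("Drug", "Erlotinib", "ChEMBL / TTD"),
    Node("Pathway", "Signaling by EGFR", "Reactome"),
    Node("Tissue", "Lung", "GTEx"),
]


def by_type(items):
    """Group node labels by node type, preserving input order."""
    grouped = {}
    for node in items:
        grouped.setdefault(node.node_type, []).append(node.label)
    return grouped


groups = by_type(nodes)
```

Grouping by `node_type` mirrors the legend colors in the graph: each of the five types ends up in its own bucket.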
Edge types and their sources
| Relationship | Source database | Evidence |
|---|---|---|
| ASSOCIATED_WITH | Open Targets, DISEASES 2.0 | Association score (0-1) |
| INTERACTS_WITH | STRING | Combined score (0-1000) |
| INHIBITS / TREATS | ChEMBL, TTD | Clinical phase (1-4) |
| PARTICIPATES_IN | Reactome | Evidence code (TAS, IEA) |
| EXPRESSED_IN | GTEx | Median TPM |
| SAME_AS | Mapping graph | Cross-reference |
| BROADER_THAN | MONDO hierarchy | Ontology is_a |
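Because each relationship carries a different kind of evidence, a consumer may want to check that a value falls in the range the table lists before using it. The sketch below assumes a hypothetical `EVIDENCE_RANGES` mapping and `evidence_in_range` helper. Neither is part of bioingest, and relationships with non-numeric evidence (for example `SAME_AS`) are simply passed through.

```python
# Numeric evidence ranges copied from the table above.
# The mapping and helper names are hypothetical, for illustration only.
EVIDENCE_RANGES = {
    "ASSOCIATED_WITH": (0.0, 1.0),  # association score
    "INTERACTS_WITH": (0, 1000),    # STRING combined score
    "INHIBITS": (1, 4),             # clinical phase
    "TREATS": (1, 4),               # clinical phase
}


def evidence_in_range(relationship, value):
    """Return True if value lies in the documented range for the relationship.

    Relationships without a numeric range in the table always pass.
    """
    bounds = EVIDENCE_RANGES.get(relationship)
    if bounds is None:
        return True
    low, high = bounds
    return low <= value <= high


# Placeholder values, not real scores from any database.
ok_interaction = evidence_in_range("INTERACTS_WITH", 900)
bad_association = evidence_in_range("ASSOCIATED_WITH", 1.5)
xref_passes = evidence_in_range("SAME_AS", None)
```

Keeping the ranges in one mapping makes the scale difference explicit: an Open Targets score of 0.9 and a STRING score of 900 are both high, but they are not directly comparable.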
Full graph scale
| Metric | Count |
|---|---|
| Protein nodes | 26,499 |
| Disease nodes | 31,884 |
| Drug nodes | 42,939 |
| Mapping edges | 201,961 |
| STRING PPI edges (score >= 700) | ~1.5M |
| Reactome pathway edges | ~2.5M |
| Open Targets associations | ~7M |
| LLM-extracted edges | growing daily |
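The `score >= 700` cutoff on STRING edges corresponds to what STRING calls high confidence (0.7 on its 0 to 1 scale). A minimal sketch of such a filter follows. The tuple layout, the `keep_high_confidence` helper, and the sample scores and partner names are all assumptions made for illustration, not bioingest code or real STRING data.

```python
STRING_MIN_SCORE = 700  # threshold stated in the table above


def keep_high_confidence(edges, min_score=STRING_MIN_SCORE):
    """Keep (protein_a, protein_b, combined_score) tuples at or above min_score."""
    return [edge for edge in edges if edge[2] >= min_score]


# Placeholder edges: partner names and scores are made up.
sample_edges = [
    ("EGFR", "ERBB2", 950),
    ("EGFR", "PROTEIN_X", 420),
    ("EGFR", "PROTEIN_Y", 700),
]

kept = keep_high_confidence(sample_edges)
```

The comparison is inclusive, so an edge scoring exactly 700 is retained, matching the `>=` in the table.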