Knowledge Graph Example

A concrete example showing how the open-source data ingested by bioingest connects in the graph for a single protein: EGFR.

[Interactive graph of the EGFR example. Legend: Protein, Disease, Drug, Pathway, Tissue. Data from UniProt, STRING, Open Targets, ChEMBL, Reactome, GTEx.]


What the graph shows

All data comes from open-source databases that bioingest downloads and maps together:

| Node type | Source | Example |
| --- | --- | --- |
| Protein (blue) | UniProt via protein_map | EGFR (P00533), ERBB2 (P04626) |
| Disease (red) | MONDO via disease_map | Non-small cell lung carcinoma, Glioblastoma |
| Drug (green) | ChEMBL / TTD via drug_map | Erlotinib, Gefitinib, Cetuximab |
| Pathway (purple) | Reactome | Signaling by EGFR, RAS signaling |
| Tissue (orange) | GTEx | Lung (12.4 TPM), Skin (8.7 TPM) |

Edge types and their sources

| Relationship | Source database | Evidence |
| --- | --- | --- |
| ASSOCIATED_WITH | Open Targets, DISEASES 2.0 | Association score (0-1) |
| INTERACTS_WITH | STRING | Combined score (0-1000) |
| INHIBITS / TREATS | ChEMBL, TTD | Clinical phase (1-4) |
| PARTICIPATES_IN | Reactome | Curated (TAS, IEA) |
| EXPRESSED_IN | GTEx | Median TPM |
| SAME_AS | Mapping graph | Cross-reference |
| BROADER_THAN | MONDO hierarchy | Ontology is_a |
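
As a rough illustration of how the node and edge types listed above fit together, the following Python sketch builds a tiny EGFR neighborhood from entities named on this page. The `Node` and `Edge` classes, the `neighbors` helper, and the particular links chosen are assumptions made for illustration; they are not bioingest's actual schema or API. The only evidence value used is the GTEx lung figure (12.4 TPM) quoted in the node table.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    kind: str  # "Protein", "Disease", "Drug", "Pathway", or "Tissue"
    name: str


@dataclass
class Edge:
    source: Node
    relation: str  # e.g. "INTERACTS_WITH", "EXPRESSED_IN"
    target: Node
    evidence: dict = field(default_factory=dict)  # hypothetical evidence payload


egfr = Node("Protein", "EGFR")

# Illustrative links only; a real graph would carry scores from each source.
edges = [
    Edge(egfr, "INTERACTS_WITH", Node("Protein", "ERBB2")),
    Edge(egfr, "ASSOCIATED_WITH", Node("Disease", "Non-small cell lung carcinoma")),
    Edge(Node("Drug", "Erlotinib"), "INHIBITS", egfr),
    Edge(egfr, "PARTICIPATES_IN", Node("Pathway", "Signaling by EGFR")),
    Edge(egfr, "EXPRESSED_IN", Node("Tissue", "Lung"), {"median_tpm": 12.4}),
]


def neighbors(node, relation, edges):
    """Return names of nodes linked to `node` by `relation`, in either direction."""
    names = []
    for edge in edges:
        if edge.relation != relation:
            continue
        if edge.source == node:
            names.append(edge.target.name)
        elif edge.target == node:
            names.append(edge.source.name)
    return names


print(neighbors(egfr, "INHIBITS", edges))      # ['Erlotinib']
print(neighbors(egfr, "EXPRESSED_IN", edges))  # ['Lung']
```

Keeping the relationship name and the evidence payload on the edge, rather than on the nodes, mirrors the table above, where each relationship type carries its own kind of evidence.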

Full graph scale

| Metric | Count |
| --- | --- |
| Protein nodes | 26,499 |
| Disease nodes | 31,884 |
| Drug nodes | 42,939 |
| Mapping edges | 201,961 |
| STRING PPI edges (score >= 700) | ~1.5M |
| Reactome pathway edges | ~2.5M |
| Open Targets associations | ~7M |
| LLM-extracted edges | growing daily |
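
The STRING edge count above is restricted to combined scores of at least 700 on STRING's 0-1000 scale. A minimal sketch of that kind of filter is shown below; the row layout, the function names, the placeholder protein identifiers, and the scores are all hypothetical, and the rescaling to 0-1 is an assumption about how one might compare STRING scores with 0-1 association scores, not a documented bioingest step.

```python
STRING_MIN_SCORE = 700  # threshold quoted in the scale table (0-1000 scale)


def keep_high_confidence(rows, min_score=STRING_MIN_SCORE):
    """Keep (protein_a, protein_b, combined_score) rows at or above min_score."""
    return [row for row in rows if row[2] >= min_score]


def to_unit_scale(combined_score):
    """Rescale a 0-1000 combined score to 0-1 (an assumed convention)."""
    return combined_score / 1000


# Placeholder rows with made-up identifiers and scores, for illustration only.
rows = [
    ("protein_a", "protein_b", 950),
    ("protein_a", "protein_c", 700),
    ("protein_b", "protein_c", 420),
]

kept = keep_high_confidence(rows)
print(len(kept))  # 2
print([to_unit_scale(score) for _, _, score in kept])
```

The comparison is inclusive, matching the ">= 700" wording in the table.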