
BioIngest Data Explorer

Browse 28 biomedical data sources, 66 queryable tables, and 3 cross-reference mapping tables.

Click to preview real columns and rows before downloading.




Browse Data

Proteins & Targets

Dataset                         Description
uniprot/swissprot_tsv           570K reviewed proteins (15 cols)
uniprot/idmapping_human         UniProt cross-references (5 cols)
string/protein_links            Protein-protein interactions (3 cols)
interpro/protein2ipr            Protein domain families (6 cols)
chembl/uniprot_mapping          ChEMBL target IDs (4 cols)
complex_portal/homo_sapiens     Protein complexes (19 cols)
markerdb/proteins               4K biomarker proteins (12 cols)

Disease & Pathway Associations

Dataset                         Description
diseases/knowledge              Disease-gene associations (7 cols)
diseases/experiments            Experiment-based associations
reactome/UniProt2Reactome       2.5M pathway memberships (6 cols)
ttd/targets                     Therapeutic targets & drugs
ttd/drugs                       Drug-target-indication triples
clinvar/variant_summary         Variant-disease significance
gtex/median_tissue_tpm          Tissue expression matrix

Ontologies

Dataset                         Description
mondo/mondo                     47K disease terms + hierarchy
disease_ontology/doid           12K Disease Ontology terms
efo/efo                         50K experimental factors
gene_ontology/go_basic          45K Gene Ontology terms
mesh/descriptors                MeSH medical vocabulary
icd/icd10cm                     ICD-10-CM diagnosis codes

Competitors & Assay Platforms

Dataset                         Description
competitors_msd/msd_assays      MSD assay specs (LOD, range)
competitors_quanterix/assays    Quanterix Simoa specs (8 cols)
competitors_alamar/all_targets  Alamar NULISAseq targets (26 cols)
competitors_nomic/omni_1000     Nomic Omni 1000 targets
somalogic/somascan_11k          SomaScan 11K menu

Olink & HPA Proteomics

Dataset                         Description
hpa_olink/proteinatlas          HPA full proteomics

Genomics & Variants

Dataset                         Description
gnomad/constraint               Gene constraint (pLI, LOEUF)
ensembl/gene_info               Gene annotations (GRCh38)
dbsnp/variant_freq              Population variant frequencies
ukb_disease_assoc/pan_ukb       UK Biobank GWAS

Mapping Tables

Cross-reference tables linking all sources through shared identifiers:

Table        Rows    Cols  What it maps
protein_map  26,499  9     UniProt ↔ Ensembl ↔ STRING ↔ ChEMBL ↔ HGNC
disease_map  31,884  10    MONDO ↔ EFO ↔ DOID ↔ MeSH ↔ ICD-10
drug_map     42,939  5     TTD drugs ↔ targets ↔ indications
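
As a rough illustration of how a cross-reference table links two sources, the sketch below joins rows keyed by one identifier to rows keyed by another through a small mapping. Everything in it is an assumption made for illustration: the column names (`uniprot_id`, `ensembl_gene_id`, `protein_name`, `gene_symbol`), the helper `join_via_map`, and the toy rows are hypothetical and do not reflect the actual protein_map schema or its contents.

```python
# Toy rows only: column names and identifiers are placeholders,
# not the real schema or data of any table listed on this page.
protein_map = [
    {"uniprot_id": "U1", "ensembl_gene_id": "E1"},
    {"uniprot_id": "U2", "ensembl_gene_id": "E2"},
]
uniprot_rows = [
    {"uniprot_id": "U1", "protein_name": "protein one"},
    {"uniprot_id": "U3", "protein_name": "protein three"},
]
ensembl_rows = [
    {"ensembl_gene_id": "E1", "gene_symbol": "GENE1"},
    {"ensembl_gene_id": "E2", "gene_symbol": "GENE2"},
]


def join_via_map(left, right, mapping, left_key, right_key):
    """Inner-join left and right rows through a cross-reference mapping."""
    # Index the right-hand rows by their identifier for fast lookup.
    right_index = {}
    for row in right:
        right_index.setdefault(row[right_key], []).append(row)

    # Index the mapping: left identifier -> list of right identifiers.
    left_to_right = {}
    for m in mapping:
        left_to_right.setdefault(m[left_key], []).append(m[right_key])

    # Emit one merged row per (left row, mapped right row) pair.
    joined = []
    for lrow in left:
        for right_id in left_to_right.get(lrow[left_key], []):
            for rrow in right_index.get(right_id, []):
                joined.append({**lrow, **rrow})
    return joined


result = join_via_map(
    uniprot_rows, ensembl_rows, protein_map, "uniprot_id", "ensembl_gene_id"
)
print(result)
```

Rows whose identifier is missing from the mapping (here the placeholder `U3`) drop out of the result, which is the usual behaviour of an inner join. A left join that keeps unmatched rows would be a reasonable alternative when coverage gaps matter.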

Don't see what you need?

Request a New Data Source

We add new sources regularly. Tell us what database you'd like and we'll build the connector.