Data Sources & Download

Auto-Downloadable Sources (18)

bioingest download-all --all --max-size 2gb
Source                ID                 Format
UniProt               uniprot            TSV, FASTA, XML
Open Targets          opentargets        Parquet
Reactome              reactome           TSV
ChEMBL                chembl             SQLite, TSV
DISEASES 2.0          diseases           TSV
MarkerDB              markerdb           TSV
TTD                   ttd                TSV, XLS
Complex Portal        complex_portal     TSV
HPA / Olink           hpa_olink          TSV
MeSH                  mesh               XML
ICD-10-CM             icd                ZIP
PDB-KB / SIFTS        pdb_complexes      IDX, TSV.GZ
UK Biobank (Pan-UKB)  ukb_disease_assoc  TSV.BGZ
SomaLogic SomaScan    somalogic          TSV, SQLite
Gene Ontology         gene_ontology      OBO
Disease Ontology      disease_ontology   OBO
MONDO                 mondo              OBO
EFO                   efo                OBO

Commands

bioingest download                    # list all sources
bioingest download uniprot            # download default datasets
bioingest download uniprot --list     # show available datasets
bioingest download uniprot --all      # download everything
bioingest download-all --max-size 500mb  # all sources, skip large files
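
When several sources need to be fetched from a script, the commands above can be wrapped in a small helper. The following is a minimal sketch, not part of bioingest itself: the function names `build_download_command` and `download_sources` are hypothetical, and it assumes that `--datasets` is passed after the source ID, which this page does not state explicitly.

```python
import subprocess
from typing import List, Optional, Sequence


def build_download_command(
    source: str,
    datasets: Optional[Sequence[str]] = None,
    everything: bool = False,
) -> List[str]:
    """Build the argument list for one `bioingest download` call."""
    cmd = ["bioingest", "download", source]
    if everything:
        # Documented form: bioingest download <source> --all
        cmd.append("--all")
    elif datasets:
        # Assumption: --datasets follows the source ID on the command line.
        cmd += ["--datasets", *datasets]
    return cmd


def download_sources(sources: Sequence[str]) -> None:
    """Download the default datasets for each source, stopping at the first failure."""
    for source in sources:
        # check=True raises CalledProcessError if bioingest exits non-zero.
        subprocess.run(build_download_command(source), check=True)
```

Calling `download_sources(["uniprot", "reactome"])` would run `bioingest download uniprot` and then `bioingest download reactome`. Building the arguments as a list rather than a single string avoids shell quoting problems.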

Features

  • Resumable — interrupted downloads resume from where they left off
  • Checksums — SHA-256 recorded in manifest.json per source
  • Selective — download specific datasets with --datasets X Y
  • Size limits — skip large files with --max-size
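
The recorded checksums can be used to re-verify files after a download. The sketch below shows one way this might look. The layout of manifest.json is not documented here, so the code assumes a hypothetical structure, `{"files": {"<relative path>": "<sha256 hex>"}}`, and the function names are illustrative only. Adjust the lookup to match the actual manifest.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(source_dir: Path) -> List[str]:
    """Return files that are missing or whose hash differs from the recorded one."""
    # Hypothetical manifest layout: {"files": {"<relative path>": "<sha256 hex>"}}
    manifest = json.loads((source_dir / "manifest.json").read_text())
    recorded: Dict[str, str] = manifest["files"]
    bad = []
    for name, expected in sorted(recorded.items()):
        target = source_dir / name
        if not target.is_file() or sha256_of(target) != expected.lower():
            bad.append(name)
    return bad
```

An empty list from `find_mismatches(Path("data/uniprot"))` would mean every listed file is present and matches its recorded digest; the directory name here is only an example.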