Local File Ingestion
Ingest a local file, or every supported file in a directory, into the knowledge graph.
```bash
# Single file
bioingest ingest paper.pdf
bioingest ingest data/bulk/markerdb/proteins.tsv

# Directory (recursive)
bioingest ingest data/bulk/ttd/
bioingest ingest ~/papers/ --extensions .pdf
```
Supported Formats
| Format | Extension | Extraction Strategy |
|---|---|---|
| PDF | `.pdf` | pymupdf page-by-page text extraction |
| TSV | `.tsv` | Convert to markdown table |
| CSV | `.csv` | Convert to markdown table |
| Text | `.txt` | Direct read |
| Markdown | `.md` | Direct read |
| HTML | `.html`, `.htm` | Strip tags; remove `<script>` and `<style>` content |
Directory Processing
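When given a directory, bioingest recursively collects every file whose extension is on the extension list. Note that the default list does not include `.html` or `.htm`, even though HTML is a supported format: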
```bash
# Default extensions: .tsv, .csv, .txt, .md, .pdf
bioingest ingest data/bulk/

# Only specific types
bioingest ingest data/bulk/ --extensions .tsv,.csv
```
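Recursive discovery with an extension filter can be sketched with `find`. The helper below (`find_ingestable`, a hypothetical name, not bioingest's own code) takes a directory and an optional comma-separated extension list, defaulting to the documented `.tsv,.csv,.txt,.md,.pdf`, and prints matching regular files in sorted order.

```bash
# Hypothetical sketch of recursive discovery: list regular files under DIR
# whose names end in one of the comma-separated extensions. The default
# mirrors the documented list; this is not bioingest's own code.
find_ingestable() {
  local dir=$1
  local exts=${2:-.tsv,.csv,.txt,.md,.pdf}
  local -a list=() name_tests=()
  local ext
  IFS=',' read -r -a list <<< "$exts"
  for ext in "${list[@]}"; do
    ext=${ext// /}                 # tolerate ".tsv, .csv"
    [ -n "$ext" ] || continue
    # Join the tests with -o: -name '*.tsv' -o -name '*.csv' ...
    if [ ${#name_tests[@]} -gt 0 ]; then name_tests+=(-o); fi
    name_tests+=(-name "*${ext}")
  done
  find "$dir" -type f \( "${name_tests[@]}" \) | sort
}
```

Usage: `find_ingestable data/bulk/ .tsv,.csv`. Because `find -name` is case-sensitive, a file named `PAPER.PDF` would be skipped by this sketch; the page does not say how bioingest treats letter case.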
Table Handling
TSV and CSV files are converted to markdown tables before LLM extraction. This places every value under an explicit column header, which gives the LLM clearer context about how the columns relate.
Input TSV:
```text
Gene	Disease	Score
BRCA1	Breast Cancer	0.95
TP53	Lung Cancer	0.88
```
Becomes:
```markdown
| Gene | Disease | Score |
| --- | --- | --- |
| BRCA1 | Breast Cancer | 0.95 |
| TP53 | Lung Cancer | 0.88 |
```
The LLM then extracts entities from individual cells and relationships from values that share a row (for example, BRCA1 and Breast Cancer).
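The TSV-to-markdown conversion described in this section can be sketched with awk. `tsv_to_markdown` is a hypothetical helper, not a bioingest command: it reads TSV from the given files (or standard input), prints the first non-blank line as the header, follows it with a `---` separator row, and emits one markdown row per remaining line. It splits on tabs only. CSV would need a real parser because quoted fields can contain commas, and this sketch does not escape `|` inside cells.

```bash
# Hypothetical sketch of the TSV-to-markdown step; not bioingest's code.
# Reads TSV from the given files (or stdin) and prints a markdown table.
tsv_to_markdown() {
  awk -F'\t' '
    FNR == 1 { header_done = 0 }      # reset at the start of each input file
    NF == 0  { next }                 # skip blank lines
    {
      row = "|"
      for (i = 1; i <= NF; i++) row = row " " $i " |"
      print row
      if (!header_done) {             # first row is the header: add separator
        sep = "|"
        for (i = 1; i <= NF; i++) sep = sep " --- |"
        print sep
        header_done = 1
      }
    }' "$@"
}
```

Piping the three-line TSV example into `tsv_to_markdown` prints exactly the four-line markdown table shown above.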