Metadata-Version: 2.4
Name: genomebundle
Version: 0.1.0
Summary: Reproducible bundles for EBP genome assemblies
Project-URL: Repository, https://github.com/gbell27/genomebundle
Author-email: Gabriele Bellavia <gabriele.bellavia.m@gmail.com>
License: MIT
License-File: LICENSE
Keywords: EBP,GoaT,assembly,bioinformatics,genomics
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.11
Requires-Dist: click>=8.1
Requires-Dist: httpx>=0.27
Description-Content-Type: text/markdown

# genomebundle

`genomebundle` bundles metadata and files for Earth BioGenome Project (EBP) genome assemblies into a single, reproducible package.

The core idea is simple: when you use a genome assembly in your research, you should be able to document exactly what you downloaded, when, and from where — with checksums. `genomebundle` does this by aggregating data from three sources into a machine-readable `manifest.json`:

- **GoaT** (Genomes on a Tree) — taxonomy and cross-references
- **NCBI Datasets** — assembly statistics and FTP file URLs
- **BlobToolKit** — BUSCO completeness results (assembly quality metrics)

This makes it easier to cite the data precisely and to keep pipelines reproducible over time.
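
To make the "what, when, and from where, with checksums" idea concrete, here is a minimal sketch of how one downloaded file could be described. It is not `genomebundle`'s implementation: the function names and the field names (`name`, `source_url`, `sha256`, `size_bytes`, `retrieved_at`) are assumptions chosen for illustration, and the actual `manifest.json` layout may differ.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large FASTA files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_entry(path: Path, source_url: str) -> dict:
    """Describe one downloaded file (hypothetical field names)."""
    return {
        "name": path.name,                  # what was downloaded
        "source_url": source_url,           # from where
        "sha256": sha256_of(path),          # checksum of the exact bytes
        "size_bytes": path.stat().st_size,
        "retrieved_at": datetime.now(timezone.utc).isoformat(),  # when
    }
```

Passing a list of such entries to `json.dumps` would give a record that can be stored next to the files and cited later.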

## Installation
```bash
pip install genomebundle
```

## Basic CLI usage
```bash
# Download FASTA and GFF
genomebundle fetch GCF_040938575.1 --files fasta,gff

# Download all associated files
genomebundle fetch GCF_040938575.1 --files all

# Build manifest only (no download)
genomebundle fetch GCF_040938575.1 --no-download

# Verify checksums of a downloaded bundle
genomebundle verify ./GCF_040938575.1/

# Print manifest of an existing bundle
genomebundle show ./GCF_040938575.1/
```

## Python API
```python
from genomebundle import fetch_assembly, fetch_assembly_report, fetch_busco

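# GoaT: taxonomy and cross-references
goat = fetch_assembly("GCF_040938575.1")
# NCBI Datasets: assembly statistics and FTP file URLs
ncbi = fetch_assembly_report("GCF_040938575.1")
# BlobToolKit: BUSCO completeness results
btk = fetch_busco("GCF_040938575.1")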
```

## Output

Each bundle contains:
- `manifest.json` — machine-readable, includes SHA256 checksums and source URLs
- `README.txt` — human-readable summary
- downloaded files (optional)
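
Because the manifest stores a SHA256 checksum for each file, a bundle can in principle be re-checked without `genomebundle` installed. The sketch below illustrates that idea only. It assumes a hypothetical manifest layout with a top-level `files` list whose items carry `name` and `sha256` keys; the real schema may differ, and `genomebundle verify` remains the supported way to check a bundle.

```python
import hashlib
import json
from pathlib import Path


def verify_bundle(bundle_dir: Path) -> dict[str, bool]:
    """Map each file listed in the manifest to True if its checksum matches."""
    manifest = json.loads((bundle_dir / "manifest.json").read_text())
    results: dict[str, bool] = {}
    for entry in manifest["files"]:  # "files", "name", "sha256" are assumed keys
        path = bundle_dir / entry["name"]
        if not path.is_file():
            # A listed file that is missing counts as a failed check.
            results[entry["name"]] = False
            continue
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            # Read in 1 MiB chunks so large assemblies are not loaded whole.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        results[entry["name"]] = digest.hexdigest() == entry["sha256"]
    return results
```

Calling `verify_bundle(Path("./GCF_040938575.1"))` would then return a per-file pass/fail mapping, provided the bundle follows the assumed layout.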

## References

- Challis et al. (2023). GoaT: Genomes on a Tree. *Wellcome Open Research*. https://doi.org/10.12688/wellcomeopenres.18658.1
- Byrd et al. (2024). Best practices for genetic and genomic data archiving. *Nature Ecology & Evolution*. https://doi.org/10.1038/s41559-024-02423-7
- Dainat et al. (2025). Guidelines for gene and genome assembly nomenclature. *Genetics*. https://doi.org/10.1093/genetics/iyaf006

## License

MIT
