Metadata-Version: 2.4
Name: trazaeo
Version: 0.5.5
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Rust
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Security :: Cryptography
Requires-Dist: jupyterlab>=4.0 ; extra == 'notebooks'
Requires-Dist: ipykernel>=6.29.0 ; extra == 'notebooks'
Requires-Dist: matplotlib>=3.8 ; extra == 'notebooks'
Requires-Dist: dask[array]>=2024.8.0 ; extra == 'notebooks'
Requires-Dist: netcdf4>=1.6.5 ; extra == 'notebooks'
Requires-Dist: xarray>=2025.1.1 ; extra == 'notebooks'
Requires-Dist: zarr>=2.18.0 ; extra == 'notebooks'
Requires-Dist: netcdf4>=1.6.5 ; extra == 'python-examples'
Requires-Dist: xarray>=2025.1.1 ; extra == 'python-examples'
Requires-Dist: zarr>=2.18.0 ; extra == 'python-examples'
Requires-Dist: netcdf4>=1.6.5 ; extra == 'qa'
Requires-Dist: pytest>=8.0 ; extra == 'qa'
Requires-Dist: mypy>=1.8 ; extra == 'qa'
Requires-Dist: ruff>=0.2 ; extra == 'qa'
Requires-Dist: pytest>=8.0 ; extra == 'test'
Requires-Dist: mypy>=1.8 ; extra == 'test'
Requires-Dist: ruff>=0.2 ; extra == 'test'
Provides-Extra: notebooks
Provides-Extra: python-examples
Provides-Extra: qa
Provides-Extra: test
License-File: LICENSE
Summary: Open-source provenance SDK and specification for verifiable EO and climate data workflows
Keywords: provenance,earth-observation,climate,verifiable-data,solana
Home-Page: https://endcorp-hq.github.io/provenance
License-Expression: MIT OR Apache-2.0
Requires-Python: >=3.12
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM

# trazaeo

[![PyPI](https://img.shields.io/pypi/v/trazaeo?logo=pypi&logoColor=white)](https://pypi.org/project/trazaeo/)
[![crates.io](https://img.shields.io/crates/v/trazaeo?logo=rust&logoColor=white)](https://crates.io/crates/trazaeo)

`trazaeo` is a Python-first package for adding verifiable provenance to Earth
observation and climate data workflows. It gives you fast content hashing,
signed provenance envelopes, and workflow helpers you can drop into an existing
pipeline without replacing your current scheduler, storage layer, or transform
code.

Use it when you want to:

- hash outputs from an existing batch or streaming job
- attach provenance to a dataset publish step
- verify that a delivered artifact still matches the published record
- add provenance checks around netCDF, Zarr, or Icechunk workflows

## Install

For most users:

```bash
pip install trazaeo
```

If you also want the optional netCDF, xarray, and Zarr helpers used by the
example workflows:

```bash
pip install 'trazaeo[python-examples]'
```

Published wheels are built as CPython `abi3` (stable ABI) artifacts from
Python 3.12, so one wheel per platform works on CPython 3.12 and later for the
supported platforms listed below. If a prebuilt wheel is not available for your
platform, `pip` falls back to building from source.

### Published wheel contract

The package metadata declares `Requires-Python: >=3.12`.

The verified published wheel matrix is:

- CPython `abi3` built from Python 3.12
- Linux `manylinux_2_28_x86_64`
- macOS `x86_64`
- macOS `arm64`
- Windows `x86_64`

The verified entry points on those wheels are:

- `import trazaeo`
- `import trazaeo_workflows.dataset_provenance`
- `from trazaeo import PublicRpcSolanaProofLogAdaptor`
- `trazaeo-icechunk --help`
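
The import and CLI entries in the list above can be checked in a fresh
environment with a short script. The following is a minimal sketch, not part of
the package: the `smoke_check` function and its constants are hypothetical
names, and the CLI check only confirms that `trazaeo-icechunk` is on `PATH`
rather than running `--help`.

```python
import importlib
import shutil

# Names taken from the wheel contract list above.
MODULES = ("trazaeo", "trazaeo_workflows.dataset_provenance")
ATTRIBUTES = (("trazaeo", "PublicRpcSolanaProofLogAdaptor"),)
COMMANDS = ("trazaeo-icechunk",)


def smoke_check(
    modules=MODULES, attributes=ATTRIBUTES, commands=COMMANDS
) -> dict[str, bool]:
    """Return a mapping of check description -> whether it passed."""
    results: dict[str, bool] = {}
    for name in modules:
        try:
            importlib.import_module(name)
            results[f"import {name}"] = True
        except ImportError:
            results[f"import {name}"] = False
    for module_name, attr in attributes:
        label = f"from {module_name} import {attr}"
        try:
            module = importlib.import_module(module_name)
            results[label] = hasattr(module, attr)
        except ImportError:
            results[label] = False
    for command in commands:
        # Only checks that the console script is on PATH; it does not run it.
        results[f"{command} on PATH"] = shutil.which(command) is not None
    return results


if __name__ == "__main__":
    for check, passed in smoke_check().items():
        print(f"{'ok  ' if passed else 'FAIL'} {check}")
```

Running the script after `pip install trazaeo` prints one line per check, which
is a quick way to confirm that an installed wheel behaves as described on your
platform.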

### Source-build fallback

If you install on a platform outside that wheel matrix, `pip` will build from
source. In Debian/Ubuntu-style environments, install a C/Rust build toolchain
and Python development headers for the interpreter you are using:

```bash
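# Run as root, or prefix the apt-get commands with sudo.
apt-get update
apt-get install -y build-essential curl pkg-config python3-dev
# Install the Rust toolchain non-interactively via rustup.
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y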
```

Then restart the shell so that `cargo` is on your `PATH`, and rerun:

```bash
pip install trazaeo
```

## Use It Inside Your Existing Pipeline

`trazaeo` is designed to sit at the boundaries of work you already do.

Typical places to add it:

- after a transform job writes a file, hash the artifact and store the content
  root with your job metadata
- before publishing a dataset, build and sign provenance for the output and its
  source inputs
- during delivery or audit, verify that the local artifact still matches the
  published checkpoint

You do not need to adopt a new pipeline framework. The package works well as:

- a Python helper inside an Airflow, Prefect, Dagster, or Argo task
- a provenance step called from an existing batch job or notebook
- a verification step in a release or data publication workflow

## Quick Start

The normal integration point is the Python API. A common first step is to hash
an artifact right after your pipeline writes it:

```python
from trazaeo import blake3_content_root


def register_pipeline_output(path: str) -> dict[str, str]:
    content_root = blake3_content_root(path, 4096, 4).hex()
    return {
        "artifact_path": path,
        "content_root_hash": content_root,
    }
```

That works well in an Airflow task, a Prefect flow, a Dagster asset, or a plain
Python batch job. You keep your existing transform code and add one provenance
step after the file is produced.
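
One way to make that record useful later is to store it beside the artifact
and re-hash on read. The sketch below is an illustration under assumptions, not
a `trazaeo` API: `write_sidecar`, `matches_sidecar`, and the
`.provenance.json` suffix are hypothetical, and the hash function is passed in
as a callable so the helpers stay independent of any particular hasher.

```python
import json
from pathlib import Path
from typing import Callable

# Takes an artifact path and returns a hex digest.
HashFile = Callable[[str], str]

SIDECAR_SUFFIX = ".provenance.json"  # hypothetical naming convention


def write_sidecar(path: str, hash_file: HashFile) -> Path:
    """Hash the artifact and write the record next to it as JSON."""
    record = {"artifact_path": path, "content_root_hash": hash_file(path)}
    sidecar = Path(path + SIDECAR_SUFFIX)
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar


def matches_sidecar(path: str, hash_file: HashFile) -> bool:
    """Re-hash the artifact and compare it with the stored record."""
    record = json.loads(Path(path + SIDECAR_SUFFIX).read_text())
    return record["content_root_hash"] == hash_file(path)
```

With `trazaeo` installed, `hash_file` could be
`lambda p: blake3_content_root(p, 4096, 4).hex()`, reusing the arguments from
the example above. Using the same arguments when writing and when checking
keeps the two digests comparable.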

For in-memory content:

```python
from trazaeo import blake3_hash, blake3_hash_mt

single = blake3_hash(b"hello world").hex()
parallel = blake3_hash_mt(b"hello world", 4).hex()
```

## Artifact Verification In Process

If your pipeline publishes an artifact and later needs to verify what was
delivered, you can build a proof package for the local file:

```python
from trazaeo_workflows import build_local_artifact_full_root_proof_package


def build_local_artifact_proof(path: str) -> dict:
    return build_local_artifact_full_root_proof_package(
        path,
        chunk_size=1 << 20,
        threads=4,
    )
```

And when you already have a delivery proof package from an upstream publish
step, verify it against the artifact path:

```python
from trazaeo_workflows import verify_dataset_delivery_proof_report


def verify_delivery(path: str, delivery_proof_package: dict) -> dict:
    return verify_dataset_delivery_proof_report(
        delivery_proof_package,
        artifact_path=path,
    )
```

This fits naturally in a downstream validation, QA, or publication check step.

## Dataset Publish Workflows

The higher-level `trazaeo_workflows` helpers are for pipelines that already
track their source files, transform job ids, output artifact refs, signer ids,
and verification policy. In that case, you pass your existing metadata into
`trazaeo` and let it build the provenance bundle around work your pipeline
already performed.

The main Python workflow entrypoints are:

- `trazaeo_workflows.build_dataset_bootstrap_bundle`
- `trazaeo_workflows.build_dataset_incremental_bundle`
- `trazaeo_workflows.build_dataset_delivery_proof_package`
- `trazaeo_workflows.verify_dataset_delivery_proof_report`

Those helpers are used by the example netCDF and Icechunk flows in
`examples/python_netcdf/`.

A typical pattern is:

1. Your pipeline reads or transforms source files.
2. Your pipeline writes the dataset artifact.
3. You hash the artifact with `trazaeo`.
4. You pass the source metadata, output metadata, signer, and trust policy into
   a dataset workflow helper.
5. You store or publish the returned provenance bundle beside the dataset.
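
The five steps above can be sketched as a small wrapper that takes each stage
as a callable. This is only an illustration of the ordering: `PublishStep` and
its field names are hypothetical, and the signatures of the real
`trazaeo_workflows` helpers are not shown here, so `build_bundle` stands in for
whatever wrapper you write around one of them.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class PublishStep:
    """Hypothetical wiring for the pattern above; not a trazaeo API."""

    # Steps 1-2: run the transform and return the output artifact path.
    write_artifact: Callable[[], str]
    # Step 3: hash the artifact, e.g. a wrapper around blake3_content_root.
    hash_artifact: Callable[[str], str]
    # Step 4: your wrapper around a dataset workflow helper.
    build_bundle: Callable[[dict[str, Any]], Any]
    # Step 5: persist the bundle beside the dataset.
    store_bundle: Callable[[str, Any], None]

    def run(self, metadata: dict[str, Any]) -> Any:
        path = self.write_artifact()
        content_root = self.hash_artifact(path)
        bundle = self.build_bundle(
            {**metadata, "artifact_path": path, "content_root_hash": content_root}
        )
        self.store_bundle(path, bundle)
        return bundle
```

Keeping each stage injectable means the same step can run inside an Airflow,
Prefect, Dagster, or Argo task without the orchestration code knowing about
provenance details.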

## Documentation

- Project docs: <https://endcorp-hq.github.io/provenance>
- Python workflow examples: `examples/python_netcdf/README.md`
- Protocol spec: `TRAZAEO_V1_SPEC.md`
- Architecture boundary: `docs/contracts/architecture.md`
- Quality gates: `docs/contracts/quality-gates.md`
- Rust crate overview: `trazaeo/README.md`

## Development

Most users only need `pip install trazaeo`. If you are contributing to this
repository, see `CONTRIBUTING.md` for local build, test, extension, and docs
workflows.

