Metadata-Version: 2.4
Name: trazaeo
Version: 0.5.3
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Rust
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Security :: Cryptography
Requires-Dist: jupyterlab>=4.0 ; extra == 'notebooks'
Requires-Dist: ipykernel>=6.29.0 ; extra == 'notebooks'
Requires-Dist: matplotlib>=3.8 ; extra == 'notebooks'
Requires-Dist: dask[array]>=2024.8.0 ; extra == 'notebooks'
Requires-Dist: netcdf4>=1.6.5 ; extra == 'notebooks'
Requires-Dist: xarray>=2025.1.1 ; extra == 'notebooks'
Requires-Dist: zarr>=2.18.0 ; extra == 'notebooks'
Requires-Dist: netcdf4>=1.6.5 ; extra == 'python-examples'
Requires-Dist: xarray>=2025.1.1 ; extra == 'python-examples'
Requires-Dist: zarr>=2.18.0 ; extra == 'python-examples'
Requires-Dist: netcdf4>=1.6.5 ; extra == 'qa'
Requires-Dist: pytest>=8.0 ; extra == 'qa'
Requires-Dist: mypy>=1.8 ; extra == 'qa'
Requires-Dist: ruff>=0.2 ; extra == 'qa'
Requires-Dist: pytest>=8.0 ; extra == 'test'
Requires-Dist: mypy>=1.8 ; extra == 'test'
Requires-Dist: ruff>=0.2 ; extra == 'test'
Provides-Extra: notebooks
Provides-Extra: python-examples
Provides-Extra: qa
Provides-Extra: test
License-File: LICENSE
Summary: Open-source provenance SDK and specification for verifiable EO and climate data workflows
Keywords: provenance,earth-observation,climate,verifiable-data,solana
Home-Page: https://endcorp-hq.github.io/provenance
License-Expression: MIT OR Apache-2.0
Requires-Python: >=3.12
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM

# trazaeo V1

[![PyPI](https://img.shields.io/pypi/v/trazaeo?logo=pypi&logoColor=white)](https://pypi.org/project/trazaeo/)
[![crates.io](https://img.shields.io/crates/v/trazaeo?logo=rust&logoColor=white)](https://crates.io/crates/trazaeo)

This repository contains the `trazaeo` Rust crate and Python bindings for
verifiable provenance in Earth observation and climate data workflows. The
project includes hashing, provenance records, proof-logging adaptors, and
examples of netCDF to Zarr or Icechunk verification flows.

The V1 protocol covers three primary use cases:

- source-device capture, where a sensor or edge device signs captured bytes
- transport receipt, where a ground station or relay attests to received bytes or helper processing
- dataset transforms and publication, where one or more inputs are turned into derived artifacts and checkpointed for audit

V1 envelope schemas live in `trazaeo/schemas/`.

The current release is V1: a stable core verification model with optional
adaptor-backed assurance for storage binding and proof logging.

Repository contracts:

- Compatibility matrix: `docs/contracts/compatibility.md`
- Quality gates: `docs/contracts/quality-gates.md`
- Python example boundaries: `docs/contracts/architecture.md`
- Merkle/Bao replacement proposal: `docs/proposals/merkle-bao-replacement.md`
- Roadmap: `ROADMAP.md`
- Documentation site source: `website/` (Vocs)

## Building

You can build the crate with the standard Rust toolchain. From the repository
root run:

```bash
cargo build --release
```

This produces the `trazaeo` library in `target/release`.

To run the unit tests, execute:

```bash
cargo test
```

For full local quality gates (lint + type checks + tests), run from repo root:

```bash
make ci
```

To run coverage locally (Rust LCOV + Python coverage XML), run:

```bash
make coverage
```

To run Rust fuzz targets locally, install `cargo-fuzz` and run a target from
`trazaeo/fuzz/`:

```bash
cargo install cargo-fuzz
cargo fuzz run decode_range_proof_package --manifest-path trazaeo/fuzz/Cargo.toml
```

To install local commit-time checks (pre-commit parity with CI):

```bash
make precommit-install
```

The exact gate policy is documented in `docs/contracts/quality-gates.md`.

To run the streaming BLAKE3 performance harness:

```bash
cargo run --example perf_hashing -- <path-to-file> [chunk_size_bytes] [threads]
```
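
For a rough sense of what such a harness measures, the sketch below streams a file through a hash in fixed-size chunks and reports throughput. It is not the `perf_hashing` example itself: it is single-threaded, it uses `hashlib.blake2b` as a stand-in because BLAKE3 is not in the Python standard library, and the function name `measure_hash_throughput` is hypothetical.

```python
import hashlib
import os
import tempfile
import time


def measure_hash_throughput(path: str, chunk_size: int = 1 << 20) -> tuple[str, float]:
    """Stream `path` through a hash in `chunk_size` pieces.

    Returns (hex digest, throughput in MiB/s). blake2b is a stand-in for
    BLAKE3, which the Python standard library does not provide.
    """
    hasher = hashlib.blake2b()
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so memory use stays bounded by chunk_size.
        while chunk := handle.read(chunk_size):
            hasher.update(chunk)
            total += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return hasher.hexdigest(), (total / (1 << 20)) / elapsed


if __name__ == "__main__":
    # Hash a small temporary file so the sketch runs without external data.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(4 << 20))
    try:
        digest, mib_per_s = measure_hash_throughput(tmp.name, chunk_size=1 << 16)
        print(f"digest={digest[:16]}... throughput={mib_per_s:.1f} MiB/s")
    finally:
        os.remove(tmp.name)
```

The digest does not depend on the chunk size; only the throughput does, which is why a harness like this usually exposes the chunk size as a parameter.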

### Reliability examples (source, transport, and transform to reward readiness)

These examples support reliability validation for the V1 flow described in `TRAZAEO_V1_SPEC.md` sections 15, 8, and 12.

Rust retry + idempotency demo:

```bash
cargo run --example reliability_demo
```
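
The general pattern behind a retry plus idempotency demo can be sketched in a few lines of Python. This illustrates the technique only and is not a port of `reliability_demo`: the `IdempotentStore` class, the `submit_with_retry` function, and the simulated lost acknowledgement are all hypothetical.

```python
import hashlib


class TransientError(Exception):
    """Simulated recoverable failure (hypothetical, for illustration)."""


class IdempotentStore:
    """In-memory store that applies each idempotency key at most once."""

    def __init__(self, fail_first_n: int = 0) -> None:
        self.records: dict[str, bytes] = {}
        self.applied = 0  # number of writes that actually took effect
        self._failures_left = fail_first_n

    def submit(self, key: str, payload: bytes) -> str:
        # Apply the write only if this key has not been seen before.
        if key not in self.records:
            self.records[key] = payload
            self.applied += 1
        # Simulate a lost acknowledgement: the write landed, the reply did not.
        if self._failures_left > 0:
            self._failures_left -= 1
            raise TransientError("simulated lost acknowledgement")
        return key


def submit_with_retry(store: IdempotentStore, payload: bytes, max_attempts: int = 5) -> str:
    """Retry `store.submit`, reusing one payload-derived key for every attempt."""
    key = hashlib.sha256(payload).hexdigest()
    last_error = None
    for _ in range(max_attempts):
        try:
            return store.submit(key, payload)
        except TransientError as exc:
            last_error = exc
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


if __name__ == "__main__":
    store = IdempotentStore(fail_first_n=2)
    key = submit_with_retry(store, b"telemetry")
    print(f"key={key[:16]}... writes applied={store.applied}")
```

Deriving the key from the payload means a retried submission cannot create a second record, even when the first attempt succeeded but its acknowledgement was lost.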

Python file-root reliability check (after building Python bindings):

```bash
python -m trazaeo_workflows reliability-check <path-to-file> --chunk 1048576 --threads 4
```

Python netCDF content-root check:

```bash
python -m trazaeo_workflows hash-netcdf <path-to-file> --chunk 4096 --threads 4
```

Python source-device capture demo:

```bash
python -m trazaeo_workflows capture-source \
  --subject-id capture-source-1 \
  --capture-actor-id sensor-1 \
  --capture-system-id sensor-pipeline-1 \
  --output-ref obj://raw/1 \
  --segment-id frame-1 \
  --payload-text telemetry
```

Python transport-receipt capture demo:

```bash
python -m trazaeo_workflows capture-transport \
  --subject-id capture-transport-1 \
  --capture-actor-id ground-station-1 \
  --capture-system-id rx-1 \
  --input-ref uplink://pass-1 \
  --output-ref obj://relay/1 \
  --segment-id seg-transport-1 \
  --payload-text downlink-frame
```

Python publish+verify envelope demo:

```bash
TRUST_POLICY_JSON='{"allowed_keys":["18e6a97db14c236f52bb13ee7c843ee077ae77c43a37d2f8c548abd79036e599"],"revoked_keys":[],"audit_log":[{"action":"allow","key_id":"18e6a97db14c236f52bb13ee7c843ee077ae77c43a37d2f8c548abd79036e599","reason":"local demo trust policy","effective_at":"2026-01-01T00:00:00Z"}]}'
python -m trazaeo_workflows publish-demo --mode sampled --trust-policy-json "$TRUST_POLICY_JSON"
```
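
Writing the policy by hand as a one-line shell string is error-prone. As a convenience, the same JSON can be generated from Python. The sketch below reuses the field names from the command above (`allowed_keys`, `revoked_keys`, `audit_log`); the helper name `build_trust_policy` is hypothetical and not part of `trazaeo`.

```python
import json


def build_trust_policy(allowed_keys: list[str], reason: str, effective_at: str) -> str:
    """Serialize a trust policy shaped like the demo policy above."""
    policy = {
        "allowed_keys": list(allowed_keys),
        "revoked_keys": [],
        # One "allow" audit entry per allowed key, mirroring the demo policy.
        "audit_log": [
            {
                "action": "allow",
                "key_id": key_id,
                "reason": reason,
                "effective_at": effective_at,
            }
            for key_id in allowed_keys
        ],
    }
    # Compact separators keep the value easy to pass as one shell argument.
    return json.dumps(policy, separators=(",", ":"))


if __name__ == "__main__":
    print(
        build_trust_policy(
            ["18e6a97db14c236f52bb13ee7c843ee077ae77c43a37d2f8c548abd79036e599"],
            reason="local demo trust policy",
            effective_at="2026-01-01T00:00:00Z",
        )
    )
```

Saved under a name of your choice (for example a hypothetical `build_policy.py`), the script output can be captured with `TRUST_POLICY_JSON="$(python build_policy.py)"`.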

`publish-demo` prints one JSON object with `publish_input`, `publish_envelope`,
and `verification_report`.
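
Because the output is a single JSON object, it is easy to post-process. The sketch below checks that the three documented top-level keys are present and splits them out. The nested structure of each field is not described here, so the sample values are placeholders, and `split_publish_output` is a hypothetical helper.

```python
import json

# Top-level keys documented for the publish-demo output.
EXPECTED_KEYS = ("publish_input", "publish_envelope", "verification_report")


def split_publish_output(raw: str) -> dict:
    """Parse the CLI output and return only the three documented sections."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a single JSON object")
    missing = [key for key in EXPECTED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return {key: data[key] for key in EXPECTED_KEYS}


if __name__ == "__main__":
    # Placeholder payload; the real nested fields are not shown in this README.
    sample_output = json.dumps({key: {"placeholder": True} for key in EXPECTED_KEYS})
    for name, section in split_publish_output(sample_output).items():
        print(name, "->", section)
```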

Python adaptor demo with S3-style storage + public-RPC Solana proof log:

```bash
python -m trazaeo_workflows publish-solana --mode sampled --trust-policy-json "$TRUST_POLICY_JSON"
```

By default the demo uses `https://api.devnet.solana.com` with an ephemeral devnet
signer for local testing. For `solana-mainnet`, pass a funded Solana keypair file:

```bash
python -m trazaeo_workflows publish-solana \
  --cluster solana-mainnet \
  --rpc-url https://api.mainnet-beta.solana.com \
  --proof-log-keypair-path ~/.config/solana/id.json \
  --trust-policy-json "$TRUST_POLICY_JSON"
```

The memo-backed public-RPC proof-log adaptor verifies the committed transaction
and signer, but it does not expose a chain root, so the CLI reports
`chain_root: null` for that adaptor.

Python netCDF collection to Zarr/Icechunk conversion + verification demo:

```bash
python -m trazaeo_workflows icechunk \
  path/to/a.nc path/to/b.nc \
  --zarr-store outputs/sst.zarr \
  --dataset-id sst \
  --dataset-version v1 \
  --trust-policy-json "$TRUST_POLICY_JSON"
```

Jupyter notebook walkthrough for pre- and post-conversion visualization and verification:

```bash
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -e '.[notebooks]'
jupyter lab examples/python_netcdf/notebooks/nc_to_zarr_provenance_walkthrough.ipynb
```

This notebook install is self-contained for the walkthrough: it includes the
example runtime dependencies, `dask[array]`, and `ipykernel`.

### Documentation site

Install docs dependencies and run local dev mode:

```bash
cd website
npm install
npm run dev
```

Build the static docs site:

```bash
cd website
npm run build
```

### Python bindings

Python bindings are provided via `PyO3`. The easiest way to build them is with
`maturin`:

```bash
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install maturin
cd trazaeo
maturin develop --release --features python-extension,python-proof-log-rpc
```

After building you can import the `trazaeo` module from Python.

### Python dependencies for examples/tests

Install optional Python dependencies for netCDF examples and test tooling:

```bash
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -e '.[python-examples,test]'
```

## Example

Below is a minimal Rust example that hashes a file into a content descriptor:

```rust
use trazaeo::hashing::hash_file_content_descriptor;

fn main() {
    let descriptor = hash_file_content_descriptor(
        "data.bin",
        "artifact-1",
        1024,
        4,
        "application/octet-stream",
        "2026-01-01T00:00:00Z",
    )
    .expect("content descriptor");

    println!("content root: {}", descriptor.content_root_hash);
}
```

In Python, you can call the provided hash helpers after installing the editable package in your virtual environment.
Both single-threaded and multithreaded variants are exposed:

```python
>>> from trazaeo import blake3_hash
>>> blake3_hash(b"hello world")
>>> from trazaeo import blake3_hash_mt
>>> blake3_hash_mt(b"hello world", 4)
```

### Optional Bao range proofs

`trazaeo` can generate Bao outboard data and byte-range proof packages internally,
so downstream apps do not need to bolt this on themselves.

This is an integrity feature, not a secrecy feature. In the current V1 model,
Bao verifies byte ranges against the BLAKE3 file hash recorded in the content
descriptor.

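Bao support is optional and gated behind the Rust feature `bao-range-proofs`.
Default builds do not expose the Bao helpers, so the Python functions below are
available only when the bindings are built with that feature enabled.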

```python
>>> from trazaeo import bao_outboard_json, bao_range_proof_package_json
>>> outboard_json = bao_outboard_json("example.nc", 4096, 4, None)
>>> proof_json = bao_range_proof_package_json("example.nc", 0, 4096, 4096, 4)
```

### Hashing a netCDF file with zero copy

The crate provides a helper to hash a file directly into a content descriptor
using memory mapping. From Python you can compute the content root of a netCDF
file as follows:

```python
>>> from trazaeo import blake3_content_root
>>> root = blake3_content_root("example.nc", 4096, 4)
>>> print(root.hex())
```

`blake3_content_root` reads the input using a zero-copy memory map to minimize
RAM usage.
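
The memory-mapping idea can be illustrated in pure Python with the standard `mmap` module. The sketch below is not how `blake3_content_root` is implemented: it hashes the whole mapping with `hashlib.blake2b` as a stand-in for BLAKE3, it produces a plain file digest rather than a content root, and `hash_file_mmap` is a hypothetical name.

```python
import hashlib
import mmap
import os
import tempfile


def hash_file_mmap(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file through a read-only memory map instead of read() calls."""
    hasher = hashlib.blake2b()  # stand-in for BLAKE3
    if os.path.getsize(path) == 0:
        return hasher.hexdigest()  # mmap cannot map an empty file
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            with memoryview(mapped) as view:
                for offset in range(0, len(view), chunk_size):
                    # Slicing a memoryview references the mapped pages
                    # without copying them into a new bytes object.
                    hasher.update(view[offset:offset + chunk_size])
    return hasher.hexdigest()


if __name__ == "__main__":
    # Hash a small temporary file so the sketch runs without external data.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(1 << 20))
    try:
        print(hash_file_mmap(tmp.name, chunk_size=1 << 16))
    finally:
        os.remove(tmp.name)
```

With a memory map, the operating system pages file contents in on demand, so resident memory stays small even for files much larger than the chunk size.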

