Metadata-Version: 2.4
Name: fancy-dask
Version: 0.1.1
Summary: Chunk-aware fancy indexing for dask arrays backed by zarr
Author-email: Sourabh Palande <sourabh.palande@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/sourabhpalande/fancy-dask
Project-URL: Documentation, https://github.com/sourabhpalande/fancy-dask/tree/main/docs
Project-URL: Repository, https://github.com/sourabhpalande/fancy-dask
Project-URL: Issues, https://github.com/sourabhpalande/fancy-dask/issues
Project-URL: Changelog, https://github.com/sourabhpalande/fancy-dask/blob/main/CHANGELOG.md
Keywords: dask,zarr,fancy-indexing,numpy,napari
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: dask[array]<2027.0,>=2025.9
Requires-Dist: zarr<4.0,>=3.1
Requires-Dist: numpy<3.0,>=2.0
Provides-Extra: napari
Requires-Dist: napari[all]<1.0,>=0.6; extra == "napari"
Provides-Extra: dev
Requires-Dist: pytest<10.0,>=8.0; extra == "dev"
Requires-Dist: pytest-cov<8.0,>=6.0; extra == "dev"
Requires-Dist: pytest-benchmark<6.0,>=5.0; extra == "dev"
Requires-Dist: pytest-timeout<3.0,>=2.3; extra == "dev"
Requires-Dist: mypy<2.0,>=1.8; extra == "dev"
Requires-Dist: ruff<1.0,>=0.8; extra == "dev"
Requires-Dist: build<2.0,>=1.2; extra == "dev"
Requires-Dist: twine<7.0,>=6.0; extra == "dev"
Provides-Extra: all
Requires-Dist: napari[all]<1.0,>=0.6; extra == "all"
Requires-Dist: pytest<10.0,>=8.0; extra == "all"
Requires-Dist: pytest-cov<8.0,>=6.0; extra == "all"
Requires-Dist: pytest-benchmark<6.0,>=5.0; extra == "all"
Requires-Dist: pytest-timeout<3.0,>=2.3; extra == "all"
Requires-Dist: mypy<2.0,>=1.8; extra == "all"
Requires-Dist: ruff<1.0,>=0.8; extra == "all"
Requires-Dist: build<2.0,>=1.2; extra == "all"
Requires-Dist: twine<7.0,>=6.0; extra == "all"
Dynamic: license-file

# fancy-dask

**Chunk-aware fancy indexing for dask arrays backed by zarr.**

Dask arrays do not natively support the multi-axis fancy indexing that NumPy users rely on. `fancy-dask` is a wrapper that bridges the gap: it provides transparent NumPy-style fancy indexing (`__getitem__` / `__setitem__` with integer-array and boolean-mask indices) for `dask.array.Array` objects backed by zarr stores.

The primary use case is interactive annotation tools such as [napari](https://napari.org/), where a labels layer holds a large N-D zarr array and the paint tool issues pixel-coordinate fancy indices that must be read and written efficiently without loading the entire array into memory. Other use cases include scientific analyses with irregular sampling patterns, sparse or masked access to large datasets, and any workflow that needs the expressiveness of NumPy indexing on large out-of-core arrays.

---

## Features

- **Fancy reads stay inside the dask graph** — each touched dask block is accessed via `array.blocks[coord]`, preserving lazy evaluation, task fusion, and parallel scheduling.
- **Chunk-coalesced writes** — fancy indices are grouped by zarr chunk; each chunk is read once, modified in memory, then written back. No redundant I/O.
- **Parallel writes** — disjoint zarr chunks are written via `ThreadPoolExecutor`.
- **Full NumPy index semantics** — integer arrays, boolean masks (1-D and N-D), slices, integers, ellipsis, and `np.newaxis` are all supported, including multi-axis fancy indexing.
- **Napari integration** — `FancyDaskLabelsLayer` extends `FancyDask` with debounced, stroke-batched writes, undo (configurable via `max_undo_steps`), `preserve_labels` support, and read-after-write consistency mode for interactive annotation workflows.
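
As a plain-NumPy illustration of the index semantics listed above (this sketch does not call `fancy-dask`, and the array `a` is made up for the example), a boolean mask selects the same elements as the integer indices returned by `np.nonzero`, for both 1-D and N-D masks:

```python
import numpy as np

a = np.arange(24).reshape(4, 6)

# A 1-D boolean mask on axis 0 is equivalent to an integer-array index.
row_mask = np.array([True, False, True, False])
row_idx = np.flatnonzero(row_mask)          # array([0, 2])
by_mask = a[row_mask, :]
by_index = a[row_idx, :]

# An N-D boolean mask is equivalent to one integer array per masked axis.
full_mask = a % 5 == 0
rr, cc = np.nonzero(full_mask)
nd_by_mask = a[full_mask]
nd_by_index = a[rr, cc]

# np.newaxis and Ellipsis only add or expand axes; they select nothing.
expanded = a[np.newaxis, ..., 0]            # shape (1, 4)
```

The same equivalences are what a wrapper would be expected to reproduce on a chunked array.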

---

## Requirements

| Package | Version |
|---------|---------|
| Python  | ≥ 3.11 (3.13 recommended) |
| zarr    | ≥ 3.1   |
| dask    | ≥ 2025.9 |
| numpy   | ≥ 2.0   |
| napari[all] | ≥ 0.6 (optional, for `napari_integration`; `napari[pyqt6]` ≥ 0.6 also works) |

---

## Installation

### From PyPI
```bash
# Core install
pip install fancy-dask
# With napari (for FancyDaskLabelsLayer); quotes keep shells such as zsh from expanding the brackets
pip install "fancy-dask[napari]"
# With dev/test dependencies
pip install "fancy-dask[dev]"
# With all dependencies
pip install "fancy-dask[all]"
```

### From source
```bash
git clone https://github.com/sourabhpalande/fancy-dask.git
cd fancy-dask
pip install . # or .[napari], .[dev], .[all] for extras
# For editable install (recommended for development)
pip install -e . # or .[napari], .[dev], .[all] for extras
```

---

## Quick Start

### Reads

```python
import dask.array as da
import numpy as np
from fancy_dask import FancyDask

# Open a zarr-backed dask array
dask_array = da.from_zarr("path/to/data.zarr")
fancy = FancyDask(dask_array)

# Slice — forwarded to dask directly (zero overhead)
block = fancy[100:200, :, :]            # lazy da.Array

# Integer fancy index along axis 0
rows = np.array([0, 50, 100, 500, 999])
result = fancy[rows, :]                 # lazy da.Array, shape (5, ...)
values = result.compute()               # trigger I/O

# Boolean mask (1-D, applied to axis 0); here it selects every other row
mask = np.arange(dask_array.shape[0]) % 2 == 0
result = fancy[mask, :]                 # lazy da.Array

# Multi-axis fancy index (element-wise, not outer product)
row_idx = np.array([0, 10, 20])
col_idx = np.array([5, 15, 25])
result = fancy[row_idx, col_idx, :]     # shape (3, depth)
```
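
To make the "element-wise, not outer product" distinction concrete, here is a plain-NumPy sketch (the array `a` is a made-up stand-in, and `fancy-dask` is not used). Passing two index arrays pairs them up position by position, while `np.ix_` builds the outer product of the same indices:

```python
import numpy as np

a = np.arange(30 * 30 * 2).reshape(30, 30, 2)
row_idx = np.array([0, 10, 20])
col_idx = np.array([5, 15, 25])

# Element-wise: selects the pairs (0, 5), (10, 15), (20, 25).
pointwise = a[row_idx, col_idx, :]          # shape (3, 2)

# Outer product: every row in row_idx combined with every column in col_idx.
outer = a[np.ix_(row_idx, col_idx)]         # shape (3, 3, 2)

# The element-wise result is the diagonal of the outer-product result.
diagonal = outer[np.arange(3), np.arange(3)]
```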

### Writes

```python
# Fancy write — chunk-coalesced, parallel
_, width, depth = dask_array.shape      # `rows` and `dask_array` come from the Reads example
fancy[rows, :] = np.full((len(rows), width, depth), fill_value=42)

# Scalar fill
fancy[row_idx, col_idx, :] = 0
```

### Napari Integration

```python
import napari
import dask.array as da
from fancy_dask.napari_integration import FancyDaskLabelsLayer

labels = da.from_zarr("segmentation.zarr")
fancy_labels = FancyDaskLabelsLayer(labels, debounce_ms=100)

viewer = napari.Viewer()
layer = viewer.add_labels(fancy_labels, name="Labels")

# Connect auto stroke-batching and preserve_labels sync
fancy_labels.connect_to_layer(layer)
```
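
The idea behind stroke batching can be sketched independently of napari. The `StrokeBatcher` class below is hypothetical and is not part of `fancy-dask`; it only illustrates collecting the pixels of several paint events and applying them in a single fancy-index assignment. A debounce timer would decide when `flush()` runs and is left out here:

```python
import numpy as np

class StrokeBatcher:
    """Hypothetical helper: queue painted pixels, then flush them in one write."""

    def __init__(self, target):
        self.target = target
        self._coords = []
        self._labels = []

    def paint(self, coords, label):
        """Queue an (n, ndim) array of pixel coordinates for one label."""
        coords = np.asarray(coords)
        self._coords.append(coords)
        self._labels.append(np.full(len(coords), label))

    def flush(self):
        """Apply all queued pixels with a single fancy-index assignment."""
        if not self._coords:
            return 0
        coords = np.concatenate(self._coords)
        labels = np.concatenate(self._labels)
        # Keep only the last write to each pixel so the result is well defined.
        _, last = np.unique(coords[::-1], axis=0, return_index=True)
        keep = len(coords) - 1 - last
        self.target[tuple(coords[keep].T)] = labels[keep]
        self._coords.clear()
        self._labels.clear()
        return len(keep)

labels = np.zeros((8, 8), dtype=np.uint8)   # stand-in for a labels array
batcher = StrokeBatcher(labels)
batcher.paint([[1, 1], [1, 2]], label=3)
batcher.paint([[1, 2], [2, 2]], label=7)    # overlaps the first event at (1, 2)
written = batcher.flush()
```

Batching this way turns many small writes into one, which matters when every write has to read and rewrite whole chunks.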

### Workaround for Non-Zarr-Backed Dask Arrays

`fancy-dask` requires dask arrays that are backed by zarr storage. It does not work with dask arrays backed by anything else (e.g., HDF5 files or in-memory NumPy arrays). You can, however, convert such arrays to zarr to use `fancy-dask`. For example, for an HDF5-backed dask array:

```python
import dask.array as da
import h5py
# Open HDF5 file (requires h5py)
hdf5_arr = da.from_array(h5py.File('data.hdf5')['dataset'], chunks=(100, 100))
# Convert to Zarr
hdf5_arr.to_zarr('mydata.zarr')
zarr_backed = da.from_zarr('mydata.zarr')
```

Note that the conversion writes a full copy of the data to disk, so check that enough disk space is available. Memory use during the conversion depends on the chunk size.
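
A rough way to size the conversion before running it is to compute the uncompressed footprint of the whole array and of a single chunk. The sketch below uses only NumPy and the standard library. The shape and dtype are hypothetical, the chunk shape matches the `(100, 100)` used above, and the figures ignore compression, so treat them as estimates:

```python
import math
import shutil
import numpy as np

def uncompressed_nbytes(shape, dtype):
    """Bytes needed to hold an array of this shape and dtype without compression."""
    return math.prod(shape) * np.dtype(dtype).itemsize

# Hypothetical dataset: 2000 x 2000 float32 values in 100 x 100 chunks.
shape, chunks, dtype = (2000, 2000), (100, 100), "float32"

total_bytes = uncompressed_nbytes(shape, dtype)    # size of the full copy before compression
chunk_bytes = uncompressed_nbytes(chunks, dtype)   # size of one chunk held in memory

# Compare against the free space on the drive that will hold the zarr store.
free_bytes = shutil.disk_usage(".").free
enough_space = free_bytes > total_bytes
```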

---

## How It Works

### Reads

```
FancyDask.__getitem__(fancy_indices)
    ↓
_preprocess_indices()          # normalise / classify
    ↓
_map_fancy_to_dask_blocks()    # searchsorted → per-block groups
    ↓
for each BlockGroup:
    block = array.blocks[coord]            # lazy da.Array
    part  = block[local_index_tuple]       # lazy, dask-native
    ↓
da.concatenate(parts, axis=fancy_axis)    # lazy result
    ↓ (if unsorted)
result[np.argsort(output_positions)]      # restore input order
```
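
The mapping and reordering steps in the diagram can be illustrated in one dimension with plain NumPy. This is a minimal sketch under assumed chunk sizes, not the package's actual `_map_fancy_to_dask_blocks()`; the helper name `group_by_block` is made up for the example:

```python
import numpy as np

def group_by_block(indices, chunk_sizes):
    """Group 1-D integer indices by the block that contains each one.

    Returns a list of (block_id, local_indices, output_positions) tuples.
    """
    indices = np.asarray(indices)
    # Chunks (4, 4, 2) give block starts [0, 4, 8] and ends [4, 8, 10].
    ends = np.cumsum(chunk_sizes)
    starts = ends - np.asarray(chunk_sizes)
    # side="right" assigns an index equal to a boundary to the next block.
    block_ids = np.searchsorted(ends, indices, side="right")
    groups = []
    for block_id in np.unique(block_ids):
        positions = np.flatnonzero(block_ids == block_id)
        local = indices[positions] - starts[block_id]
        groups.append((int(block_id), local, positions))
    return groups

data = np.arange(100, 110)            # stand-in for one axis of an array
chunk_sizes = (4, 4, 2)               # hypothetical chunking of that axis
indices = np.array([9, 0, 5, 4, 1])   # unsorted request

groups = group_by_block(indices, chunk_sizes)

# Index each touched block once, then concatenate in block order.
blocks = np.split(data, np.cumsum(chunk_sizes)[:-1])
parts = [blocks[b][local] for b, local, _ in groups]
concatenated = np.concatenate(parts)

# Undo the block ordering so the result matches the request order.
output_positions = np.concatenate([pos for _, _, pos in groups])
result = concatenated[np.argsort(output_positions)]
```

The final `argsort` is the inverse permutation of `output_positions`, which is why it restores the order in which the indices were requested.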

### Writes

```
FancyDask.__setitem__(fancy_indices, value)
    ↓
_map_fancy_to_zarr_chunks()    # group indices by zarr chunk
    ↓
for each ZarrChunkGroup (parallel via ThreadPoolExecutor):
    chunk = zarr_array[zarr_slice]         # one zarr read
    chunk[local_index_tuple] = value[...]  # numpy assignment
    zarr_array[zarr_slice] = chunk         # one zarr write
```
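
The read-modify-write loop can be sketched with a NumPy array standing in for the zarr array. Everything here is an assumption made for illustration (the stand-in `store`, the chunk size, the `write_chunk` helper); it shows the coalescing pattern, not the package's internals:

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

store = np.zeros((10, 3), dtype=np.int64)    # stand-in for the zarr array
chunk_rows = 4                               # hypothetical chunk size on axis 0

rows = np.array([9, 0, 5, 4, 1])             # fancy index on axis 0
value = np.arange(15).reshape(5, 3) + 1      # one row of values per index

def write_chunk(chunk_id):
    """Read one chunk, patch the rows that fall in it, write it back."""
    positions = np.flatnonzero(rows // chunk_rows == chunk_id)
    start = chunk_id * chunk_rows
    chunk_slice = slice(start, min(start + chunk_rows, store.shape[0]))
    chunk = store[chunk_slice].copy()                  # one read
    chunk[rows[positions] - start] = value[positions]  # in-memory assignment
    store[chunk_slice] = chunk                         # one write

# Each chunk id appears once, so workers never touch the same region.
chunk_ids = np.unique(rows // chunk_rows)
with ThreadPoolExecutor() as pool:
    list(pool.map(write_chunk, chunk_ids.tolist()))
```

Because the groups cover disjoint regions, the workers need no locking. The sketch assumes the index contains no duplicates.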

See [docs/design.md](docs/design.md) for the full architecture and [docs/performance_breakdown.md](docs/performance_breakdown.md) for benchmark results.

---

## Documentation

| File | Description |
|------|-------------|
| [docs/fancy_dask.md](docs/fancy_dask.md) | `FancyDask` class reference, usage examples, optimizations, limitations |
| [docs/napari_integration.md](docs/napari_integration.md) | `FancyDaskLabelsLayer` reference, napari usage, debounce tuning |
| [docs/design.md](docs/design.md) | Architecture and data-structure design |
| [docs/performance_breakdown.md](docs/performance_breakdown.md) | Benchmark results and analysis |
| [docs/future_optimizations.md](docs/future_optimizations.md) | Known bottlenecks and planned improvements |

---

## Examples

| Script | Description |
|--------|-------------|
| [examples/usage_basic.py](examples/usage_basic.py) | Core read/write API walkthrough |
| [examples/usage_napari.py](examples/usage_napari.py) | Napari labels layer integration |
| [examples/usage_zarr_workaround.py](examples/usage_zarr_workaround.py) | Zarr workaround patterns |

Run any example directly:

```bash
python examples/usage_basic.py
```

---

## Running Tests

Test zarr fixtures under `tests/test_data/` are created automatically when missing
or invalid, so you don't need to commit large `.zarr` datasets to run the suite.

```bash
# Unit tests
pytest tests/unit/

# Performance benchmarks
pytest tests/performance/ --benchmark-only

# All tests with coverage
pytest --cov=fancy_dask tests/
```

---

## License

MIT

---

## Citation

If you use this package in your research, please cite:

```bibtex
@software{fancy_dask,
  author = {Palande, Sourabh},
  title = {fancy-dask: Efficient Fancy Indexing Wrapper for Dask Arrays backed by Zarr Storage},
  year = {2026},
  url = {https://github.com/sourabhpalande/fancy-dask}
}
```
