Metadata-Version: 2.4
Name: vaepack
Version: 0.1.0
Summary: VAE-based image compression using latent space quantization and JPEG XL
License-Expression: MIT
Project-URL: Homepage, https://github.com/DVA305-VT26-Grupp4/Projektarbete
Project-URL: Repository, https://github.com/DVA305-VT26-Grupp4/Projektarbete
Project-URL: Issues, https://github.com/DVA305-VT26-Grupp4/Projektarbete/issues
Keywords: vae,image-compression,latent-space,jpeg-xl,sdxl
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Multimedia :: Graphics
Classifier: Topic :: Scientific/Engineering :: Image Processing
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: numpy
Requires-Dist: Pillow
Requires-Dist: imagecodecs
Requires-Dist: scikit-image
Requires-Dist: torch
Provides-Extra: diffusers
Requires-Dist: diffusers; extra == "diffusers"
Requires-Dist: transformers; extra == "diffusers"
Requires-Dist: accelerate; extra == "diffusers"
Requires-Dist: safetensors; extra == "diffusers"

# vaepack

VAE-based image compression using latent space quantization and JPEG XL.

**vaepack** encodes images into the latent space of the SDXL Variational Autoencoder (VAE), quantizes the latents to a configurable bit-depth, and compresses the result with JPEG XL. Everything is packed into a compact `.zvae` container.

> **Testing purposes only — very alpha.** This library is experimental and not intended for production use. APIs and behaviour may change without notice.

## Installation

### From PyPI (when published)

```bash
pip install vaepack
```

### From source

```bash
git clone https://github.com/DVA305-VT26-Grupp4/Projektarbete.git
cd Projektarbete
pip install -e ".[diffusers]"
```

> **Note:** PyTorch with CUDA support should be installed separately before
> installing vaepack. See [pytorch.org](https://pytorch.org/get-started/locally/)
> for platform-specific instructions.

### Dependencies

| Package | Purpose |
|---------|---------|
| `torch` | VAE inference (CPU or CUDA) |
| `numpy` | Array operations |
| `Pillow` | Image I/O |
| `imagecodecs` | JPEG XL encoding/decoding |
| `scikit-image` | PSNR and SSIM metrics |

Optional (for VAE model loading from HuggingFace):

| Package | Purpose |
|---------|---------|
| `diffusers` | AutoencoderKL model class |
| `transformers` | HuggingFace model infrastructure |
| `accelerate` | Efficient model loading |
| `safetensors` | Safe weight file format |

Install optional dependencies with:

```bash
pip install "vaepack[diffusers]"
```

## Quick start

### Python API (notebooks / scripts)

```python
from vaepack import compress, decompress

# Compress an image
state = compress("photo.png", "photo.zvae", quant_bits=10, metrics=True)
print(f"PSNR: {state.metrics['psnr_db']:.2f} dB")
print(f"SSIM: {state.metrics['ssim']:.4f}")

# Decompress
state = decompress("photo.zvae", "recon.png")
```

### Command line

```bash
# Compress
zipvae compress photo.png -o photo.zvae --quant-bits 10 --metrics

# Decompress with quality metrics
zipvae decompress photo.zvae -o recon.png --reference photo.png --metrics
```

## API reference

### High-level functions

The recommended entry points for most users. Import directly from `vaepack`:

```python
from vaepack import compress, decompress
```

#### `compress(input_path, output_path=None, *, quant_bits=10, quant_clip=4.0, jxl_distance=0.0, jxl_effort=7, device="auto", metrics=False) -> PipelineState`

Compress an image to a `.zvae` file.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `input_path` | `str \| Path` | required | Source image (any Pillow-supported format) |
| `output_path` | `str \| Path \| None` | `None` | Output path (defaults to `<input>.zvae`) |
| `quant_bits` | `int` | `10` | Quantization bit-depth (4–16) |
| `quant_clip` | `float` | `4.0` | Symmetric clipping range; `0` for auto-detection |
| `jxl_distance` | `float` | `0.0` | JXL perceptual distance; `0.0` = lossless |
| `jxl_effort` | `int` | `7` | JXL encoder effort (1–9) |
| `device` | `str` | `"auto"` | `"auto"`, `"cpu"`, or `"cuda"` |
| `metrics` | `bool` | `False` | Compute PSNR/SSIM after encoding |

Returns a `PipelineState` with all intermediate results.

#### `decompress(input_path, output_path=None, *, reference=None, device="auto", metrics=False) -> PipelineState`

Decompress a `.zvae` file to a PNG image.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `input_path` | `str \| Path` | required | `.zvae` file to decompress |
| `output_path` | `str \| Path \| None` | `None` | Output path (defaults to `<input>.png`) |
| `reference` | `str \| Path \| None` | `None` | Original image for metric computation |
| `device` | `str` | `"auto"` | `"auto"`, `"cpu"`, or `"cuda"` |
| `metrics` | `bool` | `False` | Compute PSNR/SSIM (requires `reference`) |

### Stage-based pipeline

For full control over each step. All stages operate on a shared `PipelineState` dataclass:

```python
from vaepack import (
    PipelineState, load_vae_by_name, CODEC_REGISTRY, resolve_device,
    stage_load_image, stage_vae_encode, stage_quantize,
    stage_codec_encode, stage_codec_decode, stage_vae_decode,
    write_zvae, compute_all_metrics,
)

device = resolve_device("auto")
vae = load_vae_by_name("sdxl", device)
codec = CODEC_REGISTRY["jxl"]()

# Compress
state = PipelineState()
stage_load_image("photo.png", state)
stage_vae_encode(vae, device, state)
stage_quantize(10, 4.0, state)
stage_codec_encode(codec, state, lossless=True, effort=7)

# Decompress
stage_codec_decode(codec, state,
    base_bytes=state.compressed_bytes,
    quant_bits=state.quant_bits,
    quant_clip=state.quant_clip)
stage_vae_decode(vae, device, state)

# Evaluate
metrics = compute_all_metrics(state.original_u8, state.reconstructed_u8)
```

#### Pipeline stages

| Stage | Input fields | Output fields |
|-------|-------------|---------------|
| `stage_load_image(path, state)` | — | `input_image`, `original_u8`, dimensions |
| `stage_vae_encode(vae, device, state)` | `input_image` | `latent_fp32`, `latent_shape` |
| `stage_quantize(bits, clip, state)` | `latent_fp32` | `quantized`, `quant_bits`, `quant_clip` |
| `stage_codec_encode(codec, state, ...)` | `quantized` | `compressed_bytes`, `compressed_size_bytes` |
| `stage_codec_decode(codec, state, ...)` | — (takes bytes arg) | `dequantized_f32` |
| `stage_vae_decode(vae, device, state)` | `dequantized_f32` | `reconstructed_u8` |

All stages record timing in `state.timing` (keys like `"vae_encode_ms"`).
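The stage-plus-timing pattern can be sketched without the library. The snippet below is a minimal illustration, not vaepack's implementation: `SketchState`, `timed`, `stage_load`, and `stage_scale` are hypothetical names, and the sketch passes `state` as a keyword argument, whereas the real stages take it positionally.

```python
import time
from dataclasses import dataclass, field
from functools import wraps


@dataclass
class SketchState:
    """Hypothetical, trimmed-down stand-in for PipelineState."""
    values: list = field(default_factory=list)
    timing: dict = field(default_factory=dict)


def timed(key: str):
    """Store the wrapped stage's wall-clock time (ms) in state.timing[key]."""
    def decorator(stage):
        @wraps(stage)
        def wrapper(*args, state: SketchState, **kwargs):
            start = time.perf_counter()
            try:
                return stage(*args, state=state, **kwargs)
            finally:
                state.timing[key] = (time.perf_counter() - start) * 1000.0
        return wrapper
    return decorator


@timed("load_ms")
def stage_load(values, *, state: SketchState) -> None:
    # Each stage writes its result back onto the shared state object.
    state.values = list(values)


@timed("scale_ms")
def stage_scale(factor: float, *, state: SketchState) -> None:
    state.values = [v * factor for v in state.values]


state = SketchState()
stage_load([1.0, 2.0, 3.0], state=state)
stage_scale(2.0, state=state)
print(state.values, sorted(state.timing))
```

Because every stage only touches the shared state, stages can be run individually or re-run with different parameters while the timing dictionary accumulates one entry per stage.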

### `PipelineState`

Dataclass holding all intermediate data through the pipeline:

```python
@dataclass
class PipelineState:
    # Input
    input_path: Path | None           # Source file path
    input_image: Image.Image | None   # PIL image (RGB)
    original_u8: np.ndarray | None    # (H, W, 3) uint8

    # Encoding
    latent_fp32: np.ndarray | None    # (4, Lh, Lw) float32

    # Quantization
    quantized: np.ndarray | None      # (4, Lh, Lw) uint8 or uint16
    quant_bits: int                   # Bit-depth used
    quant_clip: float                 # Clipping range used

    # Codec
    compressed_bytes: bytes | None    # Raw compressed payload
    compressed_size_bytes: int        # Payload size

    # Decoding
    dequantized_f32: np.ndarray | None  # (4, Lh, Lw) float32
    reconstructed_u8: np.ndarray | None # (H, W, 3) uint8

    # Metadata
    metrics: dict[str, float]         # Quality metrics
    timing: dict[str, float]          # Per-stage timing (ms)
    latent_shape: tuple[int, ...]     # (C, Lh, Lw)
    original_width: int
    original_height: int
```

### Quantization

```python
from vaepack import quantize_latent, dequantize_latent

# Quantize float32 latents to 10-bit integers
quantized = quantize_latent(latent_fp32, bits=10, clip=4.0)  # -> uint16

# Dequantize back
recovered = dequantize_latent(quantized, bits=10, clip=4.0)  # -> float32
```

Values are clipped to `[-clip, clip]` and linearly mapped to `[0, 2^bits - 1]`. Output dtype is `uint8` for bits <= 8, `uint16` for bits > 8.
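The mapping described above can be written out in a few lines of NumPy. This is a sketch under stated assumptions rather than the library's code: `quantize_sketch` and `dequantize_sketch` are hypothetical names, and round-to-nearest is assumed for the integer conversion.

```python
import numpy as np


def quantize_sketch(latent: np.ndarray, bits: int = 10, clip: float = 4.0) -> np.ndarray:
    """Clip to [-clip, clip] and map linearly onto the integers 0 .. 2**bits - 1."""
    levels = (1 << bits) - 1
    clipped = np.clip(latent, -clip, clip)
    scaled = (clipped + clip) / (2.0 * clip) * levels
    dtype = np.uint8 if bits <= 8 else np.uint16
    # Assumption: round to nearest; the library may round differently.
    return np.rint(scaled).astype(dtype)


def dequantize_sketch(quantized: np.ndarray, bits: int = 10, clip: float = 4.0) -> np.ndarray:
    """Invert the linear mapping; the rounding error is not recoverable."""
    levels = (1 << bits) - 1
    restored = quantized.astype(np.float32) / levels * (2.0 * clip) - clip
    return restored.astype(np.float32)


latent = np.array([-5.0, -4.0, 0.0, 1.5, 4.0, 9.0], dtype=np.float32)
q = quantize_sketch(latent, bits=10, clip=4.0)
r = dequantize_sketch(q, bits=10, clip=4.0)

# Width of one quantization step in latent units.
step = 2.0 * 4.0 / ((1 << 10) - 1)
print(q, r, step)
```

With round-to-nearest, values inside the clipping range come back with an error of at most half a step; values outside it saturate at the first or last level.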

### Metrics

```python
from vaepack import psnr, ssim, compute_all_metrics

p = psnr(original_u8, reconstructed_u8)                 # -> float (dB)
s = ssim(original_u8, reconstructed_u8)                 # -> float (at most 1.0)
m = compute_all_metrics(original_u8, reconstructed_u8)  # -> {"psnr_db": ..., "ssim": ...}
```

All functions expect `(H, W, 3)` uint8 arrays.
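For reference, the standard PSNR definition for 8-bit images is `10 * log10(255^2 / MSE)`. The sketch below implements that definition in plain NumPy; `psnr_sketch` is a hypothetical name, and whether vaepack's `psnr` uses exactly this data range is an assumption.

```python
import numpy as np


def psnr_sketch(original_u8: np.ndarray, reconstructed_u8: np.ndarray) -> float:
    """PSNR in dB for uint8 images: 10 * log10(255**2 / MSE)."""
    # Convert to float first so the subtraction cannot wrap around in uint8.
    diff = original_u8.astype(np.float64) - reconstructed_u8.astype(np.float64)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(255.0 ** 2 / mse))


original = np.full((4, 4, 3), 128, dtype=np.uint8)
reconstructed = original.copy()
reconstructed[0, 0, 0] = 138  # a single sample off by 10

print(psnr_sketch(original, reconstructed))
```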

### Codecs

Pluggable codec system with a registry pattern:

```python
from vaepack import CodecBase, JXLCodec, CODEC_REGISTRY

codec = CODEC_REGISTRY["jxl"]()
compressed = codec.encode(arr, lossless=True, effort=7, quant_bits=10)
decoded = codec.decode(compressed)
```

To add a custom codec, subclass `CodecBase` and register it:

```python
class MyCodec(CodecBase):
    def encode(self, arr, **kwargs): ...
    def decode(self, data): ...

CODEC_REGISTRY["mycodec"] = MyCodec
```
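A runnable version of the same registry pattern, kept independent of vaepack, might look like the following. Everything here is illustrative: `SketchCodecBase`, `ZlibCodec`, and `SKETCH_REGISTRY` are hypothetical names, zlib stands in for a real image codec, and the sketch works on raw bytes instead of arrays.

```python
import zlib
from abc import ABC, abstractmethod


class SketchCodecBase(ABC):
    """Hypothetical stand-in for CodecBase: encode to bytes, decode back."""

    @abstractmethod
    def encode(self, data: bytes, **kwargs) -> bytes: ...

    @abstractmethod
    def decode(self, data: bytes) -> bytes: ...


class ZlibCodec(SketchCodecBase):
    """Lossless codec backed by zlib, used here only as an illustration."""

    def encode(self, data: bytes, **kwargs) -> bytes:
        # Map an "effort"-style keyword onto zlib's 0-9 compression level.
        level = int(kwargs.get("effort", 6))
        return zlib.compress(data, level)

    def decode(self, data: bytes) -> bytes:
        return zlib.decompress(data)


# The registry maps a short name to a codec class, not to an instance.
SKETCH_REGISTRY = {"zlib": ZlibCodec}

codec = SKETCH_REGISTRY["zlib"]()
payload = bytes(range(256)) * 4
compressed = codec.encode(payload, effort=9)
restored = codec.decode(compressed)
print(len(payload), len(compressed), restored == payload)
```

Storing classes rather than instances lets each caller construct its own codec object, which matches the `CODEC_REGISTRY["jxl"]()` call style used above.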

### Container format

The `.zvae` binary container:

```python
from vaepack import write_zvae, read_zvae, MAGIC, FORMAT_VERSION

# Write
write_zvae("out.zvae", header_dict, compressed_bytes, residual_bytes=None)

# Read
header, base_bytes, residual_bytes = read_zvae("out.zvae")
```

Binary layout:

```
Offset   Size      Description
0        8 B       Magic: b"ZVAE0001"
8        4 B       Header length (uint32 LE)
12       N B       JSON header (UTF-8)
12+N     4 B       Base payload length (uint32 LE)
16+N     M B       Base payload
16+N+M   4 B       (optional) Residual length
...      ...       (optional) Residual payload
```
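The layout can be exercised with the standard library alone. The following is a sketch that follows the table above, not the code behind `write_zvae` / `read_zvae`: `pack_container` and `unpack_container` are hypothetical names, the header fields are made up for the example, and the residual length is assumed to use the same uint32 LE encoding as the other length fields.

```python
import json
import struct
from typing import Optional

MAGIC_SKETCH = b"ZVAE0001"  # 8-byte magic from the layout table


def pack_container(header: dict, base: bytes, residual: Optional[bytes] = None) -> bytes:
    """Serialize magic, length-prefixed JSON header, and length-prefixed payloads."""
    header_json = json.dumps(header).encode("utf-8")
    parts = [
        MAGIC_SKETCH,
        struct.pack("<I", len(header_json)),
        header_json,
        struct.pack("<I", len(base)),
        base,
    ]
    if residual is not None:
        # Assumption: residual length is uint32 LE, like the other lengths.
        parts += [struct.pack("<I", len(residual)), residual]
    return b"".join(parts)


def unpack_container(blob: bytes) -> tuple:
    """Parse a blob produced by pack_container into (header, base, residual)."""
    if blob[:8] != MAGIC_SKETCH:
        raise ValueError("not a ZVAE container")
    offset = 8
    (header_len,) = struct.unpack_from("<I", blob, offset)
    offset += 4
    header = json.loads(blob[offset:offset + header_len].decode("utf-8"))
    offset += header_len
    (base_len,) = struct.unpack_from("<I", blob, offset)
    offset += 4
    base = blob[offset:offset + base_len]
    offset += base_len
    residual = None
    if offset < len(blob):  # optional trailing residual section
        (residual_len,) = struct.unpack_from("<I", blob, offset)
        offset += 4
        residual = blob[offset:offset + residual_len]
    return header, base, residual


# Header keys below are illustrative, not the library's actual schema.
blob = pack_container({"quant_bits": 10, "quant_clip": 4.0}, b"payload")
header, base, residual = unpack_container(blob)
print(header, base, residual)
```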

### VAE loading

```python
from vaepack import resolve_device, load_vae_by_name, VAE_REGISTRY

device = resolve_device("auto")       # "cuda" if available, else "cpu"
vae = load_vae_by_name("sdxl", device) # Downloads from HuggingFace on first call
```

The `diffusers` package is imported lazily — `vaepack` can be imported without it for tasks that don't need the VAE (e.g. inspecting `.zvae` headers).
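One common way to get this behaviour is to defer the import into the function that needs it. The sketch below shows the general idea only; `load_backend` is a hypothetical helper, and how vaepack structures its own lazy import is not shown here.

```python
import importlib


def load_backend(module_name: str):
    """Import a heavy dependency only when it is first needed."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Point the user at the optional extra instead of failing at import time.
        raise ImportError(
            f"{module_name!r} is required for this feature; "
            'install it with: pip install "vaepack[diffusers]"'
        ) from exc


# Nothing is imported until the helper is actually called.
json_module = load_backend("json")
print(json_module.dumps({"lazy": True}))
```

Modules that are already loaded are served from `sys.modules`, so repeated calls stay cheap.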

## CLI reference

`pip install` provides the command-line interface as the `zipvae` command.

### `zipvae compress`

```
zipvae compress INPUT [-o OUTPUT] [--quant-bits {4..16}] [--quant-clip FLOAT]
                      [--jxl-distance FLOAT] [--jxl-effort INT]
                      [--device {auto,cpu,cuda}] [--metrics]
```

| Flag | Default | Description |
|------|---------|-------------|
| `-o, --output` | `<input>.zvae` | Output file path |
| `--quant-bits` | `8` | Quantization bit-depth (4–16) |
| `--quant-clip` | `4.0` | Clipping range (0 = auto) |
| `--jxl-distance` | `0.0` | 0.0 = lossless |
| `--jxl-effort` | `7` | Encoder effort (1–9) |
| `--device` | `auto` | Compute device |
| `--metrics` | off | Print PSNR/SSIM |

### `zipvae decompress`

```
zipvae decompress INPUT [-o OUTPUT] [--device {auto,cpu,cuda}]
                        [--metrics] [--reference PATH]
```

| Flag | Default | Description |
|------|---------|-------------|
| `-o, --output` | `<input>.png` | Output image path |
| `--device` | `auto` | Compute device |
| `--metrics` | off | Print PSNR/SSIM |
| `--reference` | — | Original image for metrics |

## Architecture

```
vaepack/
  __init__.py     Public API exports and package docstring
  api.py          High-level compress() / decompress()
  cli.py          CLI entry point (zipvae command)
  pipeline.py     PipelineState dataclass and 6 stage functions
  quantize.py     Uniform scalar quantization / dequantization
  codecs.py       Codec base class, JXL implementation, registry
  container.py    .zvae binary container format I/O
  metrics.py      PSNR and SSIM computation
  vae.py          VAE model loading and device resolution
```

### Design principles

- **Composable stages:** Each pipeline stage is a plain function that reads from and writes to a shared `PipelineState`. Stages can be used independently for experimentation.
- **Pluggable backends:** Codecs and VAE models are registered in dictionaries, making it straightforward to add new backends (e.g. AVIF codec, TAESD VAE).
- **Lazy imports:** Heavy dependencies like `diffusers` are imported only when needed, keeping `import vaepack` fast.
- **Notebook-first API:** The `compress()` / `decompress()` functions are designed for interactive use — simple calls with keyword arguments and rich return values.

## CI/CD

GitHub Actions builds the LaTeX report (XeLaTeX in Docker) and deploys the PDF to [GitHub Pages](https://dva305-vt26-grupp4.github.io/Projektarbete/projarb-latest.pdf) on push to main.

## License

MIT
