Metadata-Version: 2.4
Name: datasety
Version: 0.23.0
Summary: CLI tool for dataset preparation: resize, align, caption, shuffle, synthetic, mask, degrade, and character generation.
Project-URL: Homepage, https://github.com/kontextox/datasety
Project-URL: Repository, https://github.com/kontextox/datasety
Project-URL: Issues, https://github.com/kontextox/datasety/issues
Author: kontextox
License-Expression: MIT
License-File: LICENSE
Keywords: captioning,character,cli,dataset,degradation,diffusers,florence-2,image-editing,image-processing,ip-adapter,machine-learning,masking,segmentation,synthetic,upscaling
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Image Processing
Requires-Python: >=3.10
Requires-Dist: pillow>=9.0.0
Provides-Extra: all
Requires-Dist: accelerate; extra == 'all'
Requires-Dist: diffusers>=0.32.0; extra == 'all'
Requires-Dist: einops; extra == 'all'
Requires-Dist: insightface>=0.7.0; extra == 'all'
Requires-Dist: onnxruntime; extra == 'all'
Requires-Dist: pyyaml>=6.0; extra == 'all'
Requires-Dist: sam2>=1.0; extra == 'all'
Requires-Dist: sentencepiece; extra == 'all'
Requires-Dist: timm; extra == 'all'
Requires-Dist: torch>=2.0.0; extra == 'all'
Requires-Dist: transformers>=4.38.0; extra == 'all'
Requires-Dist: transformers>=4.45.0; extra == 'all'
Provides-Extra: caption
Requires-Dist: einops; extra == 'caption'
Requires-Dist: timm; extra == 'caption'
Requires-Dist: torch>=2.0.0; extra == 'caption'
Requires-Dist: transformers>=4.38.0; extra == 'caption'
Provides-Extra: character
Requires-Dist: accelerate; extra == 'character'
Requires-Dist: diffusers>=0.32.0; extra == 'character'
Requires-Dist: insightface>=0.7.0; extra == 'character'
Requires-Dist: onnxruntime; extra == 'character'
Requires-Dist: torch>=2.0.0; extra == 'character'
Requires-Dist: transformers>=4.38.0; extra == 'character'
Provides-Extra: degrade
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: mask
Requires-Dist: sam2>=1.0; extra == 'mask'
Requires-Dist: torch>=2.0.0; extra == 'mask'
Requires-Dist: transformers>=4.45.0; extra == 'mask'
Provides-Extra: synthetic
Requires-Dist: accelerate; extra == 'synthetic'
Requires-Dist: diffusers>=0.32.0; extra == 'synthetic'
Requires-Dist: sentencepiece; extra == 'synthetic'
Requires-Dist: torch>=2.0.0; extra == 'synthetic'
Requires-Dist: transformers>=4.38.0; extra == 'synthetic'
Provides-Extra: workflow
Requires-Dist: pyyaml>=6.0; extra == 'workflow'
Description-Content-Type: text/markdown

# datasety

<img align="right" src="https://raw.githubusercontent.com/kontextox/datasety/refs/heads/main/docs/public/mascot.png" alt="CLI tool for dataset preparation" width="120" />

[![PyPI](https://img.shields.io/pypi/v/datasety)](https://pypi.org/project/datasety/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)

CLI tool for dataset preparation — resize, caption, align, shuffle, synthetic editing, masking, degradation, character generation, and multi-step workflows.

[Full documentation →](https://kontextox.github.io/datasety/)

## Installation

```bash
pip install datasety                 # core (resize, align, shuffle, degrade)
pip install datasety[caption]        # + Florence-2 captioning
pip install datasety[synthetic]      # + image editing (FLUX, Qwen, SDXL)
pip install datasety[mask]           # + segmentation masks (SAM 3, CLIPSeg)
pip install datasety[character]      # + character dataset generation
pip install datasety[workflow]       # + YAML workflow support
pip install datasety[all]            # everything
```

---

## Commands

### `resize` — Resize & Crop Images

Batch resize images to exact dimensions with configurable crop positions.

<!-- screenshot: resize -->

```bash
datasety resize --input ./raw --output ./resized --resolution 768x1024 --crop-position top
```

<details>
<summary>Options</summary>

| Option                  | Description                                    | Default             |
| ----------------------- | ---------------------------------------------- | ------------------- |
| `--input`, `-i`         | Input directory                                | required\*          |
| `--output`, `-o`        | Output directory                               | required\*          |
| `--input-image`         | Single input image (alternative to dir mode)   |                     |
| `--output-image`        | Single output image (use with `--input-image`) |                     |
| `--resolution`, `-r`    | Target resolution (`WIDTHxHEIGHT`)             | required            |
| `--crop-position`       | `top`, `center`, `bottom`, `left`, `right`     | `center`            |
| `--input-format`        | Comma-separated input formats                  | `jpg,jpeg,png,webp` |
| `--output-format`       | `jpg`, `png`, `webp`                           | `jpg`               |
| `--output-name-numbers` | Rename output files to 1.jpg, 2.jpg, ...       | off                 |

\* Required in directory mode; not needed when `--input-image` and `--output-image` are used for a single file.

</details>

```bash
# Single image
datasety resize --input-image photo.jpg --output-image resized.jpg -r 512x512

# Batch with sequential numbering
datasety resize -i ./photos -o ./dataset -r 1024x1024 --output-name-numbers --crop-position top
```
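
The crop geometry behind `--resolution` and `--crop-position` can be sketched in plain Python. This is a minimal illustration, assuming the image is first scaled until it covers the target size and the overflow is then cropped from the chosen side. `cover_crop_box` is a hypothetical helper, not part of datasety, and the tool's actual rounding may differ.

```python
def cover_crop_box(src_w, src_h, dst_w, dst_h, position="center"):
    """Return (scaled_size, crop_box) for a scale-to-cover resize plus crop.

    Assumption: this mirrors the general technique only, not datasety's code.
    crop_box is (left, top, right, bottom) in the scaled image's coordinates.
    """
    # Scale so that both target dimensions are fully covered.
    scale = max(dst_w / src_w, dst_h / src_h)
    new_w = max(dst_w, round(src_w * scale))
    new_h = max(dst_h, round(src_h * scale))

    # Pixels left over on each axis after scaling.
    extra_w, extra_h = new_w - dst_w, new_h - dst_h

    # Start centered, then pin to one side if requested.
    left, top = extra_w // 2, extra_h // 2
    if position == "top":
        top = 0
    elif position == "bottom":
        top = extra_h
    elif position == "left":
        left = 0
    elif position == "right":
        left = extra_w
    elif position != "center":
        raise ValueError(f"unknown crop position: {position!r}")

    return (new_w, new_h), (left, top, left + dst_w, top + dst_h)


# A 1000x2000 portrait resized to 768x1024, keeping the top of the frame.
print(cover_crop_box(1000, 2000, 768, 1024, "top"))
```

With `top`, the crop keeps the upper 1024 rows of the scaled 768x1536 image, which is usually what you want for portraits where the face sits near the top edge.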

[Full documentation →](https://kontextox.github.io/datasety/commands/resize)

---

### `caption` — Generate Image Captions

Generate captions using Florence-2 (local) or OpenAI-compatible vision APIs.

<!-- screenshot: caption -->

```bash
datasety caption --input ./images --output ./captions --trigger-word "[trigger]"
```

<details>
<summary>Options</summary>

| Option               | Description                                 | Default                   |
| -------------------- | ------------------------------------------- | ------------------------- |
| `--input`, `-i`      | Input directory                             | required\*                |
| `--output`, `-o`     | Output directory for .txt files             | required\*                |
| `--input-image`      | Single input image                          |                           |
| `--output-caption`   | Single output .txt path                     |                           |
| `--device`           | `auto`, `cpu`, `cuda`                       | `auto`                    |
| `--trigger-word`     | Text to prepend to each caption             |                           |
| `--prompt`           | Florence-2 task prompt                      | `<MORE_DETAILED_CAPTION>` |
| `--model`            | HF model name or API model ID               |                           |
| `--num-beams`        | Beam search width (1 = greedy)              | `3`                       |
| `--florence-2-base`  | Use Florence-2-base (0.23B, faster)         | default                   |
| `--florence-2-large` | Use Florence-2-large (0.77B, more accurate) |                           |
| `--llm-api`          | Use OpenAI-compatible vision API            |                           |
| `--max-tokens`       | Max response tokens (API mode)              | `300`                     |
| `--temperature`      | Temperature (API mode)                      | `0.3`                     |

\* Required in directory mode; not needed when `--input-image` and `--output-caption` are used for a single file.

</details>

```bash
# Florence-2 with trigger word
datasety caption -i ./dataset -o ./dataset --trigger-word "photo of sks person," --device cuda

# OpenAI vision API (supports OPENAI_MODEL env var)
datasety caption -i ./images -o ./captions --llm-api --model gpt-4o
```

[Full documentation →](https://kontextox.github.io/datasety/commands/caption)

---

### `align` — Align Control/Target Pairs

Match dimensions, enforce multiples of 32, and unify formats for control/target training pairs.

<!-- screenshot: align -->

```bash
datasety align --target ./target --control ./control --dry-run
```

<details>
<summary>Options</summary>

| Option            | Description                              | Default       |
| ----------------- | ---------------------------------------- | ------------- |
| `--target`, `-t`  | Target images directory                  | required      |
| `--control`, `-c` | Control images directory                 | required      |
| `--multiple-of`   | Align dimensions to this multiple        | `32`          |
| `--output-format` | Convert all images: `jpg`, `png`, `webp` | keep original |
| `--dry-run`       | Preview changes without modifying files  | off           |

</details>

```bash
# Preview, then apply
datasety align -t ./target -c ./control --dry-run
datasety align -t ./target -c ./control --output-format jpg
```
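
To make the `--multiple-of` rule concrete, here is a minimal sketch of how a shared size for one control/target pair could be chosen. It assumes the pair is reduced to the smaller of the two sizes and then rounded down to the nearest multiple. `aligned_size` is a hypothetical helper, and datasety may pick the common size differently.

```python
def aligned_size(target_size, control_size, multiple_of=32):
    """Pick one (width, height) that fits both images and the multiple rule.

    Assumption: use the smaller dimension of the pair, then round down.
    """
    width = min(target_size[0], control_size[0])
    height = min(target_size[1], control_size[1])

    # Round each dimension down to the nearest multiple.
    width -= width % multiple_of
    height -= height % multiple_of

    if width == 0 or height == 0:
        raise ValueError("images are smaller than the requested multiple")
    return width, height


# Target is 1000x750, control is 1024x768.
print(aligned_size((1000, 750), (1024, 768)))  # (992, 736)
```

Running `datasety align` with `--dry-run` first remains the reliable way to see which sizes the tool will actually apply.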

[Full documentation →](https://kontextox.github.io/datasety/commands/align)

---

### `shuffle` — Random Caption Generation

Generate random captions by picking one variant from each text group.

<!-- screenshot: shuffle -->

```bash
datasety shuffle -i ./images -o ./captions \
    --group "A photo of a person.|Portrait of someone." \
    --group "Remove the hat.|Take off the hat."
```

<details>
<summary>Options</summary>

| Option                | Description                                | Default  |
| --------------------- | ------------------------------------------ | -------- |
| `--input`, `-i`       | Input directory containing images          | required |
| `--output`, `-o`      | Output directory for .txt files            | required |
| `--group`, `-g`       | Inline `\|`-separated, `.txt` file, or URL | required |
| `--separator`         | Separator between groups                   | `" "`    |
| `--seed`              | Random seed for reproducibility            |          |
| `--dry-run`           | Preview captions without writing           | off      |
| `--show-distribution` | Show caption distribution after generation | off      |

</details>

```bash
# Mix file and inline sources (a URL can be passed to --group the same way)
datasety shuffle -i ./images -o ./captions \
    --group subjects.txt \
    --group "ending A|ending B" \
    --seed 42 --show-distribution
```
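
Picking one variant per group amounts to a seeded random choice. The sketch below illustrates the idea with Python's standard library only. `build_caption` is a hypothetical name, and the sequence it produces for a given seed is not claimed to match what datasety generates for the same `--seed`.

```python
import random


def build_caption(groups, rng, separator=" "):
    """Join one randomly chosen variant from each `|`-separated group."""
    return separator.join(rng.choice(group.split("|")) for group in groups)


groups = [
    "A photo of a person.|Portrait of someone.",
    "Remove the hat.|Take off the hat.",
]

rng = random.Random(42)  # a fixed seed makes the run repeatable
captions = [build_caption(groups, rng) for _ in range(3)]
for caption in captions:
    print(caption)
```

Two groups with two variants each yield four possible captions, so `--show-distribution` is useful for checking that no combination dominates a small dataset.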

[Full documentation →](https://kontextox.github.io/datasety/commands/shuffle)

---

### `synthetic` — Synthetic Image Editing

Generate synthetic variations using image editing models (FLUX, Qwen, SDXL, LongCat, HunyuanImage).

<!-- screenshot: synthetic -->

```bash
datasety synthetic --input ./images --output ./synthetic --prompt "add a winter hat" --steps 4
```

<details>
<summary>Options</summary>

| Option              | Description                             | Default                             |
| ------------------- | --------------------------------------- | ----------------------------------- |
| `--input`, `-i`     | Input directory                         | required\*                          |
| `--output`, `-o`    | Output directory                        | required\*                          |
| `--input-image`     | Single input image                      |                                     |
| `--output-image`    | Single output image                     |                                     |
| `--prompt`, `-p`    | Edit instruction                        | required                            |
| `--model`           | Model (auto-detects family)             | `black-forest-labs/FLUX.2-klein-4B` |
| `--weights`         | Fine-tuned weights file                 |                                     |
| `--lora`            | LoRA adapter (repeatable, `:WEIGHT`)    |                                     |
| `--device`          | `auto`, `cpu`, `cuda`                   | `auto`                              |
| `--cpu-offload`     | Force CPU offload                       | auto                                |
| `--steps`           | Inference steps                         | `40`                                |
| `--cfg-scale`       | Guidance scale                          | `1.0`                               |
| `--true-cfg-scale`  | True CFG (Qwen only)                    | `4.0`                               |
| `--negative-prompt` | Negative prompt                         | `" "`                               |
| `--num-images`      | Images per input                        | `1`                                 |
| `--seed`            | Random seed                             |                                     |
| `--gguf`            | GGUF path/URL for quantized loading     |                                     |
| `--strength`        | Img2img strength (SDXL/FLUX.2, 0.0-1.0) | `0.7`                               |
| `--output-format`   | `png`, `jpg`, `webp`                    | `png`                               |

\* Required in directory mode; not needed when `--input-image` and `--output-image` are used for a single file.

</details>

```bash
# Single image edit
datasety synthetic --input-image photo.jpg --output-image edited.png \
    --prompt "add sunglasses" --steps 4

# Qwen with LoRA
datasety synthetic -i ./dataset -o ./synthetic \
    --model "Qwen/Qwen-Image-Edit-2511" \
    --lora "adapter.safetensors:0.8" \
    --prompt "add a red scarf" --steps 40
```
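
The `--lora` value combines a path with an optional `:WEIGHT` suffix. Below is a minimal sketch of how such a value could be split. The `parse_lora` helper and the fallback weight of `1.0` are assumptions for illustration, not taken from datasety.

```python
def parse_lora(spec, default_weight=1.0):
    """Split 'path[:WEIGHT]' into (path, weight).

    Assumption: a missing or non-numeric suffix means the whole string is
    the path and the weight falls back to default_weight.
    """
    path, sep, tail = spec.rpartition(":")
    if sep and path:
        try:
            return path, float(tail)
        except ValueError:
            pass  # text after ':' is not a number, so it belongs to the path
    return spec, default_weight


print(parse_lora("adapter.safetensors:0.8"))
print(parse_lora("adapter.safetensors"))
```

Splitting on the last colon keeps URLs intact, because the text after the final `:` in a URL does not parse as a number.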

[Full documentation →](https://kontextox.github.io/datasety/commands/synthetic)

---

### `mask` — Text-Prompted Segmentation Masks

Generate binary masks from images using text keywords. Supports SAM 3, Grounded SAM 2, and CLIPSeg.

<!-- screenshot: mask -->

```bash
datasety mask --input ./dataset --output ./masks --keywords "face,hair" --device cuda
```

<details>
<summary>Options</summary>

| Option             | Description                        | Default    |
| ------------------ | ---------------------------------- | ---------- |
| `--input`, `-i`    | Input directory                    | required\* |
| `--output`, `-o`   | Output directory for masks         | required\* |
| `--input-image`    | Single input image                 |            |
| `--output-image`   | Single output mask                 |            |
| `--keywords`, `-k` | Comma-separated keywords           | required   |
| `--model`          | `sam3`, `grounded-sam2`, `clipseg` | `sam3`     |
| `--device`         | `auto`, `cpu`, `cuda`              | `auto`     |
| `--threshold`      | Confidence threshold (0.0-1.0)     | `0.3`      |
| `--padding`        | Pixels to expand mask (dilation)   | `0`        |
| `--blur`           | Gaussian blur radius for edges     | `0`        |
| `--invert`         | Invert mask colors                 | off        |
| `--naming`         | `folder` or `suffix` (`_mask`)     | `folder`   |
| `--output-format`  | `png`, `jpg`, `webp`               | `png`      |
| `--dry-run`        | Preview detections without saving  | off        |

\* Required in directory mode; not needed when `--input-image` and `--output-image` are used for a single file.

</details>

```bash
# CLIPSeg (lightweight, no extra deps)
datasety mask -i ./dataset -o ./masks -k "face" --model clipseg --threshold 0.5

# Grounded SAM 2 with mask refinement
datasety mask -i ./dataset -o ./masks -k "hat,glasses" --model grounded-sam2 --padding 5 --blur 3
```
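
How `--threshold` and `--invert` interact can be shown on a tiny grid of confidence scores. This is a sketch under the assumption that pixels at or above the threshold become white (255) and all others black (0). `binarize` is a hypothetical helper; real masks come from the selected model.

```python
def binarize(scores, threshold=0.3, invert=False):
    """Turn a 2D grid of confidences (0.0-1.0) into a 0/255 mask.

    Assumption: 'at or above threshold' counts as a detection.
    """
    on, off = (0, 255) if invert else (255, 0)
    return [[on if p >= threshold else off for p in row] for row in scores]


scores = [
    [0.05, 0.40, 0.90],
    [0.10, 0.29, 0.31],
]

mask = binarize(scores, threshold=0.3)
inverted = binarize(scores, threshold=0.3, invert=True)
print(mask)      # [[0, 255, 255], [0, 0, 255]]
print(inverted)  # [[255, 0, 0], [255, 255, 0]]
```

Raising the threshold shrinks the white region; `--padding` and `--blur` then grow and soften its edge.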

[Full documentation →](https://kontextox.github.io/datasety/commands/mask)

---

### `degrade` — Image Degradation

Create degraded versions of images for upscale/enhance training. Pure Pillow, no extra dependencies.

<!-- screenshot: degrade -->

```bash
datasety degrade --input ./originals --output ./dataset --type random --intensity-range 0.2-0.8 --paired
```

<details>
<summary>Options</summary>

| Option              | Description                           | Default    |
| ------------------- | ------------------------------------- | ---------- |
| `--input`, `-i`     | Input directory                       | required\* |
| `--output`, `-o`    | Output directory                      | required\* |
| `--input-image`     | Single input image                    |            |
| `--output-image`    | Single output image                   |            |
| `--type`, `-t`      | Degradation type(s), repeatable       | `random`   |
| `--intensity`       | Global intensity (0.0-1.0)            | `0.5`      |
| `--intensity-range` | Random range `MIN-MAX`                |            |
| `--chain`           | Apply multiple types sequentially     | off        |
| `--num-variants`    | Variants per input image              | `1`        |
| `--paired`          | Create `control/` + `target/` subdirs | off        |
| `--seed`            | Random seed                           |            |
| `--output-format`   | `png`, `jpg`, `webp`                  | `png`      |

\* Required in directory mode; not needed when `--input-image` and `--output-image` are used for a single file.

**Degradation types:** `lowres`, `oversharpen`, `noise`, `blur`, `jpeg`, `motion-blur`, `pixelate`, `color-bands`, `upscale-sim`, `random`

</details>

```bash
# Chain specific degradations for paired output
datasety degrade -i ./images -o ./dataset --type jpeg --type noise --chain --paired --seed 42

# Multiple random variants per image
datasety degrade -i ./images -o ./degraded --type random --num-variants 3 --intensity-range 0.3-0.8
```
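
`--intensity-range` takes a `MIN-MAX` string. Here is a minimal sketch of parsing it and drawing one intensity per variant, assuming a uniform draw between the bounds. The sampling distribution and both helper names are assumptions, not datasety internals.

```python
import random


def parse_intensity_range(text):
    """Parse 'MIN-MAX' (for example '0.2-0.8') into a (low, high) tuple."""
    low_text, sep, high_text = text.partition("-")
    if not sep:
        raise ValueError(f"expected MIN-MAX, got {text!r}")
    low, high = float(low_text), float(high_text)
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("bounds must satisfy 0.0 <= MIN <= MAX <= 1.0")
    return low, high


def sample_intensities(text, count, seed=None):
    """Draw `count` intensities uniformly from the parsed range."""
    low, high = parse_intensity_range(text)
    rng = random.Random(seed)  # seeding makes the draws repeatable
    return [rng.uniform(low, high) for _ in range(count)]


# Three variants for one image, as with --num-variants 3.
values = sample_intensities("0.3-0.8", count=3, seed=42)
print([round(v, 3) for v in values])
```

A narrow range gives a more uniform dataset, while a wide one exposes the trained model to both mild and severe degradation.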

[Full documentation →](https://kontextox.github.io/datasety/commands/degrade)

---

### `character` — Character Dataset Generation

Generate identity-preserving character datasets from reference face images using LLM prompts + IP-Adapter.

<!-- screenshot: character -->

```bash
datasety character --reference face.jpg --output ./dataset --llm-ollama llama3.2 --num-images 20
```

<details>
<summary>Options</summary>

| Option                    | Description                             | Default                        |
| ------------------------- | --------------------------------------- | ------------------------------ |
| `--reference`, `-r`       | Reference face image(s)                 | required                       |
| `--output`, `-o`          | Output directory                        | required                       |
| `--num-images`, `-n`      | Number of images to generate            | `10`                           |
| `--model`                 | Base model for generation               | `black-forest-labs/FLUX.1-dev` |
| `--ip-adapter`            | IP-Adapter model                        | auto-detected                  |
| `--ip-adapter-scale`      | Conditioning strength (0.0-1.0)         | `0.6`                          |
| `--character-description` | Text description of the character       |                                |
| `--style`                 | Style guidance (e.g., `photorealistic`) |                                |
| `--prompts-only`          | Only generate prompts, skip images      | off                            |
| `--prompts-file`          | Load prompts from file instead of LLM   |                                |
| `--llm-api`               | Use OpenAI-compatible API               |                                |
| `--llm-ollama MODEL`      | Use local Ollama server                 |                                |
| `--llm-gguf PATH`         | Use local GGUF model                    |                                |
| `--llm-model REPO`        | Use HuggingFace model                   |                                |
| `--device`                | `auto`, `cpu`, `cuda`                   | `auto`                         |
| `--steps`                 | Inference steps                         | `28`                           |
| `--cfg-scale`             | Guidance scale                          | `3.5`                          |
| `--seed`                  | Random seed                             |                                |
| `--output-format`         | `png`, `jpg`, `webp`                    | `png`                          |

</details>

```bash
# Generate with OpenAI API
datasety character -r face1.jpg face2.jpg -o ./dataset \
    --llm-api --num-images 20 --style "photorealistic"

# Preview prompts only
datasety character -r face.jpg -o ./dataset --llm-ollama llama3.2 --prompts-only
```

[Full documentation →](https://kontextox.github.io/datasety/commands/character)

---

### `workflow` — Multi-Step Pipelines

Run multi-step datasety pipelines from YAML or JSON files with dry-run validation.

<!-- screenshot: workflow -->

```bash
datasety workflow --file datasety.yaml --dry-run
```

<details>
<summary>Options</summary>

| Option         | Description                      | Default     |
| -------------- | -------------------------------- | ----------- |
| `--file`, `-f` | Path to workflow file            | auto-detect |
| `--dry-run`    | Validate steps without executing | off         |

</details>

Create `datasety.yaml`:

```yaml
steps:
  - command: resize
    args:
      input: ./raw
      output: ./resized
      resolution: 768x1024
  - command: caption
    args:
      input: ./resized
      output: ./resized
      llm-api: true
      model: gpt-4o
```

```bash
# Validate first, then execute
datasety workflow --dry-run
datasety workflow
```
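
Each workflow step corresponds to one datasety command line. The sketch below shows that mapping for a JSON version of the file above, using only the standard library. It assumes the JSON layout mirrors the YAML one, that `true` values become bare flags, and that other values become `--key value` pairs. `step_to_argv` is a hypothetical helper, and datasety may dispatch steps internally rather than through a shell.

```python
import json
import shlex

# Assumed JSON equivalent of the datasety.yaml example.
WORKFLOW_JSON = """
{
  "steps": [
    {"command": "resize",
     "args": {"input": "./raw", "output": "./resized", "resolution": "768x1024"}},
    {"command": "caption",
     "args": {"input": "./resized", "output": "./resized",
              "llm-api": true, "model": "gpt-4o"}}
  ]
}
"""


def step_to_argv(step):
    """Translate one workflow step into an argument list."""
    argv = ["datasety", step["command"]]
    for key, value in step.get("args", {}).items():
        flag = f"--{key}"
        if value is True:
            argv.append(flag)  # boolean switch, no value
        elif value is False or value is None:
            continue  # disabled switch, leave it out
        else:
            argv.extend([flag, str(value)])
    return argv


workflow = json.loads(WORKFLOW_JSON)
commands = [step_to_argv(step) for step in workflow["steps"]]
for argv in commands:
    print(shlex.join(argv))
```

Under these assumptions the script prints `datasety resize --input ./raw --output ./resized --resolution 768x1024` followed by `datasety caption --input ./resized --output ./resized --llm-api --model gpt-4o`, which matches the standalone commands shown earlier in this README.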

[Full documentation →](https://kontextox.github.io/datasety/commands/workflow)

---

## License

MIT
