Metadata-Version: 2.4
Name: castkit
Version: 0.1.1
Summary: Universal model quantization and format conversion CLI
Project-URL: Repository, https://github.com/schroneko/castkit
Project-URL: Issues, https://github.com/schroneko/castkit/issues
Author: schroneko
License-Expression: MIT
License-File: LICENSE
Keywords: awq,gguf,gptq,llm,mlx,model-conversion,quantization
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.12
Requires-Dist: huggingface-hub>=0.25
Requires-Dist: rich>=13
Requires-Dist: typer>=0.15
Provides-Extra: all
Requires-Dist: accelerate>=1.0; extra == 'all'
Requires-Dist: datasets>=3.0; extra == 'all'
Requires-Dist: gguf>=0.10; extra == 'all'
Requires-Dist: gptqmodel>=2.0; extra == 'all'
Requires-Dist: mlx-lm>=0.20; extra == 'all'
Requires-Dist: mlx>=0.22; extra == 'all'
Requires-Dist: numpy>=1.26; extra == 'all'
Requires-Dist: onnx>=1.16; extra == 'all'
Requires-Dist: optimum[onnxruntime]>=1.20; extra == 'all'
Requires-Dist: safetensors>=0.4; extra == 'all'
Requires-Dist: sentencepiece>=0.2; extra == 'all'
Requires-Dist: torch>=2.4; extra == 'all'
Requires-Dist: transformers>=4.45; extra == 'all'
Provides-Extra: dev
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: ruff>=0.9; extra == 'dev'
Provides-Extra: exl2
Requires-Dist: datasets>=3.0; extra == 'exl2'
Requires-Dist: exllamav2>=0.2; extra == 'exl2'
Requires-Dist: safetensors>=0.4; extra == 'exl2'
Requires-Dist: torch>=2.4; extra == 'exl2'
Requires-Dist: transformers>=4.45; extra == 'exl2'
Provides-Extra: exl3
Requires-Dist: datasets>=3.0; extra == 'exl3'
Requires-Dist: exllamav3>=0.0.18; extra == 'exl3'
Requires-Dist: safetensors>=0.4; extra == 'exl3'
Requires-Dist: torch>=2.4; extra == 'exl3'
Requires-Dist: transformers>=4.45; extra == 'exl3'
Provides-Extra: gguf
Requires-Dist: accelerate>=1.0; extra == 'gguf'
Requires-Dist: datasets>=3.0; extra == 'gguf'
Requires-Dist: gguf>=0.10; extra == 'gguf'
Requires-Dist: numpy>=1.26; extra == 'gguf'
Requires-Dist: safetensors>=0.4; extra == 'gguf'
Requires-Dist: sentencepiece>=0.2; extra == 'gguf'
Requires-Dist: torch>=2.4; extra == 'gguf'
Requires-Dist: transformers>=4.45; extra == 'gguf'
Provides-Extra: gptq
Requires-Dist: accelerate>=1.0; extra == 'gptq'
Requires-Dist: datasets>=3.0; extra == 'gptq'
Requires-Dist: gptqmodel>=2.0; extra == 'gptq'
Requires-Dist: safetensors>=0.4; extra == 'gptq'
Requires-Dist: torch>=2.4; extra == 'gptq'
Requires-Dist: transformers>=4.45; extra == 'gptq'
Provides-Extra: mlx
Requires-Dist: mlx-lm>=0.20; extra == 'mlx'
Requires-Dist: mlx>=0.22; extra == 'mlx'
Requires-Dist: numpy>=1.26; extra == 'mlx'
Provides-Extra: onnx
Requires-Dist: numpy>=1.26; extra == 'onnx'
Requires-Dist: onnx>=1.16; extra == 'onnx'
Requires-Dist: optimum[onnxruntime]>=1.20; extra == 'onnx'
Requires-Dist: safetensors>=0.4; extra == 'onnx'
Requires-Dist: transformers>=4.45; extra == 'onnx'
Provides-Extra: transformers
Requires-Dist: accelerate>=1.0; extra == 'transformers'
Requires-Dist: datasets>=3.0; extra == 'transformers'
Requires-Dist: safetensors>=0.4; extra == 'transformers'
Requires-Dist: torch>=2.4; extra == 'transformers'
Requires-Dist: transformers>=4.45; extra == 'transformers'
Description-Content-Type: text/markdown

# castkit

[![PyPI](https://img.shields.io/pypi/v/castkit)](https://pypi.org/project/castkit/)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Python 3.12+](https://img.shields.io/badge/python-3.12%2B-blue)](https://www.python.org/downloads/)

castkit is a CLI for quantizing models and converting them between formats: GGUF, MLX, GPTQ, AWQ, ONNX, EXL2, and EXL3. It also converts between quantized formats by automatically decasting (dequantizing) the source to FP16 first.

## Requirements

- Python 3.12+
- Apple Silicon Mac for MLX backend
- NVIDIA GPU + CUDA for GPTQ/AWQ convert
- NVIDIA GPU + CUDA for EXL2/EXL3 convert and measure
- ONNX Runtime for ONNX backend
- llama.cpp (`llama-quantize`, `convert_hf_to_gguf.py`) for GGUF convert
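
The requirements above can be checked before installing a backend. The following is a minimal sketch, not part of castkit: `check_prerequisites` is a hypothetical helper, and using `nvidia-smi` on `PATH` as a hint for a working CUDA setup is an assumption that only approximates the real requirement.

```python
import platform
import shutil
import sys


def check_prerequisites() -> dict[str, bool]:
    """Report which castkit prerequisites appear to be satisfied (hypothetical helper)."""
    return {
        # The package metadata declares Requires-Python >= 3.12.
        "python>=3.12": sys.version_info >= (3, 12),
        # The MLX backend needs an Apple Silicon Mac (macOS on arm64).
        "apple_silicon": platform.system() == "Darwin" and platform.machine() == "arm64",
        # Assumption: nvidia-smi on PATH is only a rough hint that an NVIDIA driver is present.
        "nvidia_smi": shutil.which("nvidia-smi") is not None,
        # llama.cpp's quantizer binary, used for GGUF conversion.
        "llama-quantize": shutil.which("llama-quantize") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:16} {'ok' if ok else 'missing'}")
```

A missing entry only rules out the backends that depend on it; the core CLI needs Python 3.12+ and nothing else from this list.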

## Installation

### Homebrew (macOS)

```bash
brew install schroneko/tap/castkit
```

### pip / uv

```bash
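# Extras are quoted so that shells such as zsh do not treat [..] as a glob pattern.
uv tool install castkit                  # core only
uv tool install "castkit[mlx]"           # MLX backend (Apple Silicon)
uv tool install "castkit[gguf]"          # GGUF backend (requires torch)
uv tool install "castkit[gptq]"          # GPTQ backend (CUDA required for convert)
uv tool install "castkit[onnx]"          # ONNX backend
uv tool install "castkit[exl2]"          # EXL2 backend (CUDA required)
uv tool install "castkit[exl3]"          # EXL3 backend (CUDA required)
uv tool install "castkit[transformers]"  # transformers extra (torch, transformers, accelerate)
uv tool install "castkit[all]"           # every backend extra except exl2 and exl3
pip install "castkit[mlx]"               # pip accepts the same extras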
```

## Quick Start

```bash
# version
castkit --version

# convert to GGUF
castkit convert Qwen/Qwen3-0.6B -f gguf -q q4_k_m -o ./output/Qwen3-0.6B.gguf

# convert to MLX 4-bit
castkit convert Qwen/Qwen3-0.6B -f mlx -b 4 -o ./output/Qwen3-0.6B-mlx-4bit

# convert to ONNX
castkit convert Qwen/Qwen3-0.6B -f onnx -o ./output/Qwen3-0.6B-onnx

# convert to EXL2 5bpw
castkit convert Qwen/Qwen3-0.6B -f exl2 -q exl2-5.0 -o ./output/Qwen3-0.6B-exl2

# convert to EXL3 4bpw
castkit convert Qwen/Qwen3-0.6B -f exl3 -q exl3-4.0 -o ./output/Qwen3-0.6B-exl3

# decast (dequantize back to FP16 SafeTensors)
castkit decast ./output/Qwen3-0.6B.gguf -o ./output/Qwen3-0.6B-fp16

# cross-format conversion (GGUF -> GPTQ via automatic FP16 decast)
castkit convert ./output/Qwen3-0.6B.gguf -f gptq -b 4 -o ./output/Qwen3-0.6B-gptq

# model info
castkit info ./output/Qwen3-0.6B.gguf

# perplexity measurement
castkit measure ./output/Qwen3-0.6B.gguf --dataset wikitext-2 --max-samples 128

# importance matrix generation (GGUF)
castkit imatrix ./model -d calibration.txt -o ./output/model.imatrix
```
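
For readers unfamiliar with the metric reported by `castkit measure`: perplexity is conventionally defined as the exponential of the mean negative log-likelihood per token. The sketch below illustrates that standard definition only. It is not castkit's implementation, and the `perplexity` function and its input format are assumptions made for the example.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity from per-token natural-log probabilities (standard definition)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # Mean negative log-likelihood per token.
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# A model that assigns probability 0.25 to every token is as uncertain
# as a uniform choice among 4 options, so its perplexity is 4.
print(perplexity([math.log(0.25)] * 8))
```

Lower values indicate that the model assigns higher probability to the evaluation text, which is why perplexity is commonly compared before and after quantization.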

## Recipes

Define reusable conversion presets in `castkit.toml`:

```toml
[recipes.gguf-standard]
format = "gguf"
quant = "q4_k_m"
imatrix = true
imatrix_data = "calibration.txt"

[recipes.mlx-4bit]
format = "mlx"
bits = 4
group_size = 64
```

Apply a recipe by name with `--recipe`:

```bash
castkit convert Qwen/Qwen3-0.6B --recipe gguf-standard
```

```bash
# batch: convert one model to multiple quants
for q in q4_k_m q5_k_m q6_k q8_0; do
  castkit convert Qwen/Qwen3-0.6B -f gguf -q "$q" -o "./output/Qwen3-0.6B-$q.gguf"
done
```

## Upload to Hugging Face

```bash
castkit convert Qwen/Qwen3-0.6B -f mlx -b 4 --upload auto
castkit convert Qwen/Qwen3-0.6B -f mlx -b 4 --upload user/repo-name --public
```

## Supported Formats

| Format | Convert    | Decast | Info | Measure    |
| ------ | ---------- | ------ | ---- | ---------- |
| GGUF   | Yes        | Yes    | Yes  | Yes        |
| MLX    | Yes        | Yes    | Yes  | Yes        |
| GPTQ   | Yes (CUDA) | Yes    | Yes  | Yes        |
| AWQ    | Yes (CUDA) | Yes    | Yes  | Yes        |
| ONNX   | Yes        | Yes    | Yes  | Yes        |
| EXL2   | Yes (CUDA) | Yes    | Yes  | Yes (CUDA) |
| EXL3   | Yes (CUDA) | Yes    | Yes  | Yes (CUDA) |
| FP16   | Yes        | Yes    | Yes  | Yes        |
| BF16   | Yes        | Yes    | Yes  | Yes        |
| FP32   | Yes        | Yes    | Yes  | Yes        |

## License

MIT
