Metadata-Version: 2.4
Name: jang
Version: 1.2.1
Summary: JANG — Adaptive Mixed-Precision Quantization for Apple Silicon. The GGUF equivalent for MLX.
Author-email: Jinho Jang <eric@jangq.ai>
License: Apache-2.0
Project-URL: Homepage, https://jangq.ai
Project-URL: Repository, https://github.com/jjang-ai/jangq
Project-URL: Documentation, https://github.com/jjang-ai/jangq#readme
Project-URL: Bug Tracker, https://github.com/jjang-ai/jangq/issues
Project-URL: HuggingFace, https://huggingface.co/JANGQ-AI
Keywords: quantization,llm,apple-silicon,metal,mlx,jang,moe,mixed-precision
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Operating System :: MacOS
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: safetensors>=0.4
Requires-Dist: numpy>=1.24
Requires-Dist: tqdm>=4.60
Requires-Dist: huggingface_hub>=0.20
Provides-Extra: mlx
Requires-Dist: mlx>=0.22; extra == "mlx"
Requires-Dist: mlx-lm>=0.20; extra == "mlx"
Provides-Extra: torch
Requires-Dist: torch>=2.0; extra == "torch"
Requires-Dist: transformers>=4.40; extra == "torch"
Provides-Extra: all
Requires-Dist: jang[mlx]; extra == "all"
Requires-Dist: jang[torch]; extra == "all"

<p align="center">
  <a href="https://mlx.studio"><img src="https://raw.githubusercontent.com/jjang-ai/jangq/main/assets/mlx-studio-light.png" alt="MLX Studio" width="500"></a>
</p>

<p align="center">
  <a href="https://mlx.studio"><img src="https://mlx.studio/assets/screenshots/mlx-studio-featured.png?v=1" alt="MLX Studio App" width="600"></a>
</p>

<h4 align="center"><a href="https://mlx.studio">MLX Studio</a> — the only app that natively supports JANG models</h4>

---

### Compatibility Notice

JANG is a new quantization format. The following apps do **NOT** support JANG yet:

- **LM Studio**
- **Ollama**
- **oMLX**
- **Inferencer**

**[MLX Studio](https://mlx.studio)** is currently the **only app** with native JANG support. You can also use the `jang` Python package directly (`pip install "jang[mlx]"`).

**Want JANG support in your favorite app?** Ask the developers to add it! JANG is open-source ([GitHub](https://github.com/jjang-ai/jangq)) and the format spec is public ([FORMAT.md](https://github.com/jjang-ai/jangq/blob/main/FORMAT.md)).

---

<p align="center">
  <img src="https://raw.githubusercontent.com/jjang-ai/jangq/main/assets/jangq-logo-dark.png" alt="JANG" width="300">
</p>

<h3 align="center"><b>J</b>ang <b>A</b>daptive <b>N</b>-bit <b>G</b>rading</h3>
<h4 align="center">Mixed-Precision Quantization for Apple Silicon</h4>

<p align="center">
  The GGUF equivalent for MLX — models stay quantized in GPU memory at full Metal speed.
</p>

## What is JANG?

JANG redistributes quantization bits according to tensor sensitivity. Critical layers (attention) get more bits, and bulk layers (MLP) get fewer to compensate: **same total size, smarter allocation**.

Like GGUF K-quants for MLX.
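
To make the budget-neutral idea concrete, here is a minimal sketch of the arithmetic. The 10% attention share and the 6-bit width are illustrative assumptions, not JANG's actual allocation, and the helper names are hypothetical rather than part of the `jang` package.

```python
def average_bits(groups):
    """Weighted average bits per weight; groups is [(num_params, bits), ...]."""
    total = sum(n for n, _ in groups)
    return sum(n * b for n, b in groups) / total


def bulk_bits_for_budget(target_bits, critical_fraction, critical_bits):
    """Average bits left for bulk tensors so the overall average equals target_bits."""
    bulk_fraction = 1.0 - critical_fraction
    return (target_bits - critical_fraction * critical_bits) / bulk_fraction


# Assumed split: 10% of weights (attention) kept at 6 bits, 4-bit overall budget.
bulk = bulk_bits_for_budget(target_bits=4.0, critical_fraction=0.10, critical_bits=6.0)
print(f"bulk tensors average {bulk:.2f} bits")  # bulk tensors average 3.78 bits

# The overall average stays on budget.
print(f"overall average {average_bits([(10, 6.0), (90, bulk)]):.2f} bits")
```

Because the critical tensors are a small share of the weights, raising their precision costs the bulk tensors only a fraction of a bit on average.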

## Results

### 2-bit: JANG_2S scores roughly 1.5x to 3x the MMLU of MLX 2-bit

| Model | MMLU (JANG_2S) | MMLU (MLX 2-bit) | Size (JANG vs MLX) |
|-------|----------------|------------------|--------------------|
| Qwen3.5-122B MoE | **84%** | 56% | 38 GB vs 36 GB |
| Qwen3.5-35B MoE | **62%** | ~20% | 12 GB vs 10 GB |
| Qwen3.5-9B | **36%** | 18% | 3.5 GB vs 2.6 GB |
| Qwen3.5-4B | **28%** | 14% | 1.6 GB vs 1.3 GB |

### 4-bit: JANG_4K — smaller than MLX, higher MMLU

| Model | MMLU (JANG_4K) | MMLU (MLX 4-bit) | Size (JANG vs MLX) |
|-------|----------------|------------------|--------------------|
| Qwen3.5-35B MoE | **84%** | 82% | **16.7 GB** vs 18 GB |

## Install

```bash
pip install jang
```

For inference on Apple Silicon:
```bash
pip install "jang[mlx]"
```
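
To confirm which optional backends ended up installed, a small check like the following may help. It is a sketch that assumes the import names `mlx`, `mlx_lm`, `torch`, and `transformers` for the packages listed under the `mlx` and `torch` extras; `optional_backends` is a hypothetical helper, not part of `jang`.

```python
from importlib.util import find_spec


def optional_backends():
    """Report which optional backends are importable, without importing them."""
    modules = ["mlx", "mlx_lm", "torch", "transformers"]
    return {name: find_spec(name) is not None for name in modules}


status = optional_backends()
for name, ok in status.items():
    print(f"{name:<14}{'installed' if ok else 'missing'}")
```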

## Quick Start

### Convert any model

```bash
# K-quant 4-bit (budget-neutral: about the same size as MLX 4-bit, with sensitivity-aware bit allocation)
jang convert Qwen/Qwen3.5-35B-A3B -p 4

# 2-bit for extreme compression
jang convert Qwen/Qwen3.5-122B-A10B -p 2

# Specific profile (`model` is a placeholder for your model ID)
jang convert model -p JANG_2S
```

### Run inference

```python
from jang_tools.loader import load_jang_model
from mlx_lm.sample_utils import make_sampler
from mlx_lm.generate import generate_step
import mlx.core as mx

model, tokenizer = load_jang_model("JANGQ-AI/Qwen3.5-122B-A10B-JANG_2S")
sampler = make_sampler(temp=0.7)

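# Tokenize the prompt and stream tokens as they are generated.
tokens = tokenizer.encode("What is photosynthesis?")
for tok, _ in generate_step(prompt=mx.array(tokens), model=model, max_tokens=200, sampler=sampler):
    # tok may be an mx.array or a plain int; normalize it to int.
    t = tok.item() if hasattr(tok, 'item') else int(tok)
    # Stop before printing the end-of-sequence token.
    if t == tokenizer.eos_token_id:
        break
    print(tokenizer.decode([t]), end="", flush=True)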
```

## Profiles

| Profile | Type | Bits | Best for |
|---------|------|------|----------|
| `JANG_4K` | K-quant | 4.0 | Same size as MLX 4-bit, sensitivity-aware allocation |
| `JANG_3K` | K-quant | 3.0 | Same size as MLX 3-bit, sensitivity-aware allocation |
| `JANG_2S` | Profile | ~2.1 | Tightest 2-bit, near MLX 2-bit size |
| `JANG_2M` | Profile | ~2.1 | Balanced 2-bit |
| `JANG_2L` | Profile | ~2.3 | Quality 2-bit |
| `JANG_1L` | Profile | ~2.2 | Maximum quality 2-bit |

## Pre-quantized Models

Available on [HuggingFace](https://huggingface.co/JANGQ-AI):

| Model | Profile | MMLU | Size |
|-------|---------|------|------|
| [Qwen3.5-122B-A10B](https://huggingface.co/JANGQ-AI/Qwen3.5-122B-A10B-JANG_2S) | JANG_2S | 84% | 38 GB |
| [Qwen3.5-35B-A3B](https://huggingface.co/JANGQ-AI/Qwen3.5-35B-A3B-JANG_4K) | JANG_4K | 84% | 16.7 GB |
| [Qwen3.5-35B-A3B](https://huggingface.co/JANGQ-AI/Qwen3.5-35B-A3B-JANG_2S) | JANG_2S | 62% | 12 GB |

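The repository names linked above follow the pattern `JANGQ-AI/<model>-<profile>`. As a sketch, assuming that pattern holds for other uploads too, a model can be fetched ahead of time with `snapshot_download` from `huggingface_hub` (already a dependency of `jang`). The helper names `jang_repo_id` and `download_jang_model` are hypothetical.

```python
def jang_repo_id(model_name, profile, org="JANGQ-AI"):
    """Build a repo ID following the pattern of the links in the table above."""
    return f"{org}/{model_name}-{profile}"


def download_jang_model(model_name, profile):
    """Download a model snapshot to the local Hugging Face cache and return its path."""
    # Imported lazily so jang_repo_id works without huggingface_hub or network access.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=jang_repo_id(model_name, profile))


print(jang_repo_id("Qwen3.5-35B-A3B", "JANG_4K"))  # JANGQ-AI/Qwen3.5-35B-A3B-JANG_4K
```

Calling `download_jang_model("Qwen3.5-35B-A3B", "JANG_4K")` would then download the 16.7 GB model listed in the table, so check free disk space first.
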
## Supported Architectures

Dense Transformer, Mixture of Experts, Hybrid SSM, Linear Attention, MLA, Vision-Language, and Mamba architectures, as well as FP8 source models.

## Links

- [GitHub](https://github.com/jjang-ai/jangq) | [HuggingFace](https://huggingface.co/JANGQ-AI) | [MLX Studio](https://mlx.studio) | [Format Spec](https://github.com/jjang-ai/jangq/blob/main/FORMAT.md)

---
Created by Jinho Jang — [jangq.ai](https://jangq.ai)
