Metadata-Version: 2.4
Name: mlxsmith
Version: 0.1.3
Summary: Apple Silicon MLX fine-tuning toolkit — SFT, DPO/ORPO, GRPO, distillation, and OpenAI-compatible serving.
Author-email: Shannon Labs <hmbown@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/Hmbown/MLXSmith
Project-URL: Repository, https://github.com/Hmbown/MLXSmith
Project-URL: Issues, https://github.com/Hmbown/MLXSmith/issues
Keywords: mlx,apple-silicon,llm,fine-tuning,lora,openai-compatible
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: MacOS :: MacOS X
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: typer>=0.9.0
Requires-Dist: rich>=13.7.0
Requires-Dist: pyyaml>=6.0.1
Requires-Dist: pydantic>=2.5.0
Requires-Dist: pydantic-settings>=2.2.1
Requires-Dist: tomli>=2.0.1; python_version < "3.11"
Requires-Dist: huggingface_hub>=1.3.4
Requires-Dist: jsonschema>=4.21.0
Provides-Extra: mlx
Requires-Dist: mlx>=0.30.4; extra == "mlx"
Provides-Extra: llm
Requires-Dist: mlx-lm>=0.30.5; extra == "llm"
Requires-Dist: transformers>=5.0.0; extra == "llm"
Requires-Dist: datasets>=3.0.0; extra == "llm"
Provides-Extra: lora
Requires-Dist: mlx-lm-lora>=1.0.0; extra == "lora"
Provides-Extra: serve
Requires-Dist: fastapi>=0.128.0; extra == "serve"
Requires-Dist: uvicorn>=0.40.0; extra == "serve"
Requires-Dist: httpx>=0.28.0; extra == "serve"
Provides-Extra: dev
Requires-Dist: pytest>=9.0.0; extra == "dev"
Requires-Dist: ruff>=0.14.0; extra == "dev"
Provides-Extra: all
Requires-Dist: mlx>=0.30.4; extra == "all"
Requires-Dist: mlx-lm>=0.30.5; extra == "all"
Requires-Dist: mlx-lm-lora>=1.0.0; extra == "all"
Requires-Dist: transformers>=5.0.0; extra == "all"
Requires-Dist: datasets>=3.0.0; extra == "all"
Requires-Dist: fastapi>=0.128.0; extra == "all"
Requires-Dist: uvicorn>=0.40.0; extra == "all"
Requires-Dist: httpx>=0.28.0; extra == "all"
Dynamic: license-file

# mlxsmith

Apple Silicon MLX fine-tuning toolkit — SFT, DPO/ORPO, GRPO, distillation, and OpenAI-compatible serving.

**Status:** alpha (v0.1.3). Full training pipeline validated on Qwen3-4B.

## Install

MLX training and serving require macOS on Apple Silicon.
Other platforms can use data tools and mock backends.

```bash
python -m venv .venv && source .venv/bin/activate
pip install -U pip

# Core CLI (data tools, config, project scaffolding)
pip install mlxsmith

# Apple Silicon training + serving
pip install "mlxsmith[mlx,llm,serve]"

# mlx-lm-lora passthrough (advanced training methods)
pip install "mlxsmith[lora]"

# Everything
pip install "mlxsmith[all]"
```

## Quickstart

```bash
mlxsmith init myproj
cd myproj
mlxsmith doctor        # check Python, MLX, Metal
```

## Training

### SFT (LoRA/QLoRA)

```bash
mlxsmith sft --model cache/mlx/Qwen__Qwen3-4B-Instruct-2507 --data data/sft
```

Produces run artifacts under `runs/sft_NNNN/` (adapter weights, `metrics.jsonl`, config snapshot).

### Preference tuning (DPO/ORPO)

```bash
mlxsmith pref --model cache/mlx/Qwen__Qwen3-4B-Instruct-2507 \
  --data data/prefs --algo dpo
```

Supports DPO and ORPO with configurable beta and KL coefficients. Each record must provide `prompt`, `chosen`, and `rejected` fields.
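
As a rough illustration of the preference data format, the sketch below writes a few records as JSON Lines. The one-object-per-line layout, the `data/prefs/train.jsonl` file name, and the helper names `to_jsonl` and `write_prefs` are assumptions made for this example; they are not part of the mlxsmith API.

```python
import json
from pathlib import Path

REQUIRED_FIELDS = {"prompt", "chosen", "rejected"}

# Hypothetical example records with the three fields named above.
records = [
    {"prompt": "What is 2 + 2?", "chosen": "2 + 2 = 4.", "rejected": "2 + 2 = 5."},
    {"prompt": "Name a prime number.", "chosen": "7 is prime.", "rejected": "8 is prime."},
]


def to_jsonl(records):
    """Serialize records to a JSON Lines string, one object per line."""
    lines = []
    for index, record in enumerate(records):
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            raise ValueError(f"record {index} is missing fields: {sorted(missing)}")
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


def write_prefs(records, path="data/prefs/train.jsonl"):
    """Write records to `path` (an assumed location), creating parent directories."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(to_jsonl(records), encoding="utf-8")
    return path
```

Calling `write_prefs(records)` from the project root would create the file; nothing is written on import.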

### Reinforced fine-tuning (GRPO)

```bash
mlxsmith rft --model cache/mlx/Qwen__Qwen3-4B-Instruct-2507 \
  --env envs/coding.yaml --verifier verifiers/pytest.py
```

GRPO-style RL training with token-level environment integration and verifier-based rewards. Rollouts are gated (accepted or rejected) and their rewards are tracked.
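
For readers new to GRPO, the core idea is that each rollout's reward is scored relative to the other rollouts sampled for the same prompt. The sketch below shows that group-relative normalization in plain Python. It is a generic illustration of the technique, not mlxsmith's implementation, and `group_relative_advantages` is a hypothetical name.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so rollouts
    that beat their siblings get positive advantages and the rest negative.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four rollouts, two of which passed a verifier (reward 1.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

When every rollout in a group earns the same reward, all advantages are zero and the group contributes no learning signal.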

### Knowledge distillation

```bash
# Offline distillation (teacher generates, student learns)
mlxsmith distill --teacher large-model --student small-model --mode offline

# Online preference distillation (OPD)
mlxsmith distill --teacher large-model --student small-model --mode opd
```

### Full pipeline

```bash
# Run SFT → Pref → RFT in sequence
mlxsmith pipeline
```

### mlx-lm-lora parity (all methods)

Use the passthrough to access mlx-lm-lora features (DPO variants, GRPO variants,
PPO, synthetic datasets, judge training, etc.):

```bash
# Train with mlx-lm-lora directly
mlxsmith lora train --model Qwen/Qwen3-4B-Instruct-2507 --data data/prefs --train-mode dpo -- --beta 0.1

# Generate synthetic datasets
mlxsmith lora synthetic prompts -- --model mlx-community/Qwen3-4B-Instruct-2507-4bit --num-samples 1000

# Train judge model
mlxsmith lora judge -- --model mlx-community/Qwen3-4B-Instruct-2507-4bit --data data/prefs
```

## Serving

OpenAI-compatible `/v1/chat/completions` endpoint.

```bash
mlxsmith serve --model runs/sft_0001/adapter --port 8080
```

```bash
curl http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user","content":"Hello"}],"max_tokens":64}'
```

Supports streaming (`"stream": true`), logprobs, stop sequences, and an optional UI dashboard (`serve.ui: true` in config).
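
A minimal streaming client sketch using only the standard library follows. It assumes the server emits OpenAI-style server-sent events (`data: {...}` lines ending with `data: [DONE]`) and that text arrives in `choices[0].delta.content`; the source says the endpoint is OpenAI-compatible but does not spell out the wire format. The function names are hypothetical.

```python
import json
import urllib.request


def build_request(prompt, base_url="http://localhost:8080", max_tokens=64):
    """Build a POST request for /v1/chat/completions with streaming enabled."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": True,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def iter_stream_text(lines):
    """Yield text fragments from an iterable of raw SSE lines (bytes)."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        choices = json.loads(data).get("choices") or []
        if not choices:
            continue
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text


def stream_chat(prompt, **kwargs):
    """Print the streamed completion as it arrives (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt, **kwargs)) as response:
        for text in iter_stream_text(response):
            print(text, end="", flush=True)
    print()
```

With `mlxsmith serve` running on port 8080, `stream_chat("Hello")` would print tokens incrementally.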

## Data tools

```bash
mlxsmith data presets                                     # list built-in datasets
mlxsmith data pull alpaca                                 # pull a preset
mlxsmith data import raw.json --out data/sft/train.jsonl  # import ShareGPT → JSONL
mlxsmith data split data/sft/train.jsonl --fractions 0.9 0.05 0.05
mlxsmith data stats data/sft/train.jsonl                  # token counts, field analysis
mlxsmith data validate data/sft/train.jsonl               # structure check
```
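
To make the `--fractions 0.9 0.05 0.05` argument concrete, here is a sketch of a seeded shuffle-and-split over a list of records. Reading the three fractions as train/validation/test, the shuffling, and the rounding rule are all assumptions for illustration; `mlxsmith data split` may behave differently.

```python
import random


def split_records(records, fractions=(0.9, 0.05, 0.05), seed=0):
    """Shuffle records deterministically and cut them into len(fractions) parts."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1.0")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)

    total = len(shuffled)
    splits = []
    start = 0
    cumulative = 0.0
    for index, fraction in enumerate(fractions):
        cumulative += fraction
        # The last split takes whatever remains, so no record is dropped.
        is_last = index == len(fractions) - 1
        end = total if is_last else round(cumulative * total)
        splits.append(shuffled[start:end])
        start = end
    return splits


train, valid, test = split_records(range(100))
```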

Built-in presets: `alpaca`, `hh-rlhf`, `ultrachat-200k`, `ultrafeedback-binarized-prefs`, `ultrafeedback-binarized-sft`.

## Model management

```bash
# Pull + convert HF model to MLX
mlxsmith pull Qwen/Qwen3-4B-Instruct-2507

# With quantization
mlxsmith pull Qwen/Qwen3-4B-Instruct-2507 --quantize --q-bits 4

# Merge adapters
mlxsmith adapters merge runs/sft_0001/adapter runs/pref_0001/adapter --weights 0.7 0.3
```

## HF auth

```bash
mlxsmith auth login --token "$HF_TOKEN"
mlxsmith auth status
mlxsmith auth logout
```

## Eval and bench

```bash
# Evaluation suite (pass@k with verifier checks)
mlxsmith eval --suite eval/suites/coding.yaml

# Benchmark inference, training, or end-to-end throughput
mlxsmith bench --mode inference
mlxsmith bench --mode trainer
mlxsmith bench --mode end_to_end
```
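
For reference, pass@k is commonly computed with the unbiased estimator `1 - C(n - c, k) / C(n, k)`, where `n` samples were drawn per task and `c` of them passed. The sketch below implements that formula. Whether `mlxsmith eval` uses this exact estimator is an assumption, and `pass_at_k` is a hypothetical helper name.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate the probability that at least one of k samples passes.

    n: total samples drawn for the task
    c: number of those samples that passed the verifier
    k: sample budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 10 samples of which 3 passed, `pass_at_k(10, 3, 1)` is 0.3 up to floating-point rounding.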

## Verifiers

Built-in verifiers for eval, RFT, and preference tuning:

- **regex** — pattern matching on completions
- **jsonschema** — JSON structure validation
- **pytest** — sandboxed test execution
- **docker** — containerized verification
- **compose** — multi-verifier composition (AND/OR/weighted)
- **llm_judge** — LLM-based self-verification / ThinkPRM-style verifier

See `docs/VERIFIERS.md` for the verifier API.
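
To give a feel for what multi-verifier composition (AND/OR/weighted) can look like, the sketch below treats a verifier as a plain function from completion text to a score in `[0, 1]`. This signature and every name in the block are hypothetical; the real verifier API is described in `docs/VERIFIERS.md` and may differ.

```python
import json
import re


def regex_verifier(pattern):
    """Return a verifier that scores 1.0 when the pattern occurs in the text."""
    compiled = re.compile(pattern)
    return lambda text: 1.0 if compiled.search(text) else 0.0


def json_verifier(text):
    """Score 1.0 when the text parses as JSON, else 0.0."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return 0.0
    return 1.0


def all_of(verifiers):
    """AND composition: the weakest verifier determines the score."""
    return lambda text: min(v(text) for v in verifiers)


def any_of(verifiers):
    """OR composition: the strongest verifier determines the score."""
    return lambda text: max(v(text) for v in verifiers)


def weighted(verifiers, weights):
    """Weighted composition: weight-normalized average of the scores."""
    if len(verifiers) != len(weights) or sum(weights) <= 0:
        raise ValueError("need one positive-sum weight per verifier")
    total = sum(weights)
    return lambda text: sum(w * v(text) for v, w in zip(verifiers, weights)) / total


has_answer = regex_verifier(r'"answer"')
strict = all_of([has_answer, json_verifier])
lenient = any_of([has_answer, json_verifier])
blended = weighted([has_answer, json_verifier], [3, 1])
```

Here `strict` only accepts valid JSON that mentions an `"answer"` key, while `blended` gives partial credit when just one check passes.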

## Environment plugin system

```bash
mlxsmith env list                  # list available environments
mlxsmith env info envs/coding.yaml # show manifest (tasks, verifier, version)
mlxsmith env init my_env           # scaffold a new environment
mlxsmith env install ./my_env      # install from directory
mlxsmith env package ./my_env      # create distributable tarball
mlxsmith env run envs/coding.yaml  # execute RFT with this environment
```

Environments define tasks, verifiers, and reward functions for RFT training. See `docs/ENVIRONMENTS.md`.

## Config system

```bash
mlxsmith config show              # display merged config (YAML/JSON/TOML)
mlxsmith config show --sources    # show where each value comes from
mlxsmith config init              # create default mlxsmith.yaml
mlxsmith config validate          # check config structure
mlxsmith config env               # show environment variable mapping
```

Config sources (in priority order): CLI flags > environment variables (`MLXSMITH__SECTION__KEY`) > config file > defaults.
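
The priority order above can be pictured as a series of deep merges, lowest priority first. The sketch below is an illustration under stated assumptions: it lowercases the parts of `MLXSMITH__SECTION__KEY` to form nested keys and leaves environment values as strings, whereas the real loader may coerce types. All function names are hypothetical.

```python
def env_overrides(environ, prefix="MLXSMITH__"):
    """Turn MLXSMITH__SECTION__KEY=value entries into a nested dict."""
    result = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        parts = name[len(prefix):].lower().split("__")
        if not all(parts):
            continue  # ignore malformed names such as a bare prefix
        node = result
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return result


def deep_merge(base, override):
    """Return a new dict where values in `override` win, recursing into dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve(defaults, file_config, environ, cli_flags):
    """Apply layers from lowest to highest priority."""
    config = defaults
    for layer in (file_config, env_overrides(environ), cli_flags):
        config = deep_merge(config, layer)
    return config
```

In practice you would pass `os.environ` as `environ`; a plain dict keeps the sketch easy to test.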

Training optimizers are configurable via `train.optimizer` and `train.optimizer_kwargs`
(for example `adamw`, `adam`, `qhadam`, `muon` when available in MLX).

## SDK (programmatic API)

For building custom training loops:

```python
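from mlxsmith.sdk import load_model, SamplingClient, TrainingClient, TrainingBatch

# `config` and `batch` are assumed to be defined by the caller;
# their construction is not shown here.
loaded = load_model("path/to/model", config)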

# Sampling with logprobs
sampler = SamplingClient(loaded.backend)
result = sampler.sample("prompt", logprobs_k=5)

# Training operations
trainer = TrainingClient(loaded.backend)
trainer.create_optimizer(lr=1e-4, weight_decay=0.01)
fb = trainer.forward_backward(batch)
trainer.optim_step(fb.result().grads)
```

Loss functions: DPO, ORPO, GRPO, CISPO, DRO, PPO, importance sampling, cross-entropy.
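
As a reference point for the first item in that list, the standard DPO objective for a single preference pair can be written in a few lines. The sketch below works on scalar sequence log-probabilities in plain Python. It illustrates the published formula rather than the SDK's implementation, and `dpo_loss` is a hypothetical name.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one pair, given summed log-probs of each completion.

    loss = -log(sigmoid(beta * ((policy_chosen - ref_chosen)
                                - (policy_rejected - ref_rejected))))
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

The loss equals `log(2)` when the policy matches the reference and falls as the policy raises the chosen completion's log-probability relative to the rejected one.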

## Research

### RLM self-play loop

RLM (Recursive Language Model) is a research feature — the infrastructure runs but has not produced measured gains yet.

```bash
mlxsmith rlm                       # single-process RLM
mlxsmith pipeline --orchestrated   # multi-process orchestrated RLM
mlxsmith rlm status                # check iteration state
mlxsmith rlm history               # view history
```

Includes task generation, mutation for data diversity, corpus management, EMA-based gating, and weight pointer IPC for multi-process coordination. See `docs/orchestrator.md`.

## Docs

- `docs/PROJECT_FORMAT.md` — project layout and artifacts
- `docs/VERIFIERS.md` — verifier API and sandbox behavior
- `docs/COMPATIBILITY.md` — tested versions and model families
- `docs/ENVIRONMENTS.md` — environment plugin system
- `docs/orchestrator.md` — multi-process RLM orchestrator
- `docs/rlm-ctl.md` — RLM training guide
- `docs/ROADMAP.md` — product direction and milestones
- `docs/README.md` — full docs index

## License

MIT
