Metadata-Version: 2.4
Name: crasis
Version: 1.1.2
Summary: Distill frontier model intelligence into tiny, local specialist models
Author-email: Adam <adam@crasis.ai>
License: MIT
Project-URL: Homepage, https://crasis.ai
Project-URL: Repository, https://github.com/crasis-ai/crasis
Project-URL: Documentation, https://github.com/crasis-ai/crasis#readme
Project-URL: Issues, https://github.com/crasis-ai/crasis/issues
Keywords: machine-learning,onnx,distillation,nlp,edge-ai,local-inference
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.0
Requires-Dist: rich>=13.0
Requires-Dist: pydantic>=2.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: onnxruntime>=1.15
Requires-Dist: onnx>=1.14
Requires-Dist: requests>=2.28
Requires-Dist: transformers>=4.30
Provides-Extra: train
Requires-Dist: openai>=1.0; extra == "train"
Requires-Dist: torch>=2.0; extra == "train"
Requires-Dist: datasets>=2.0; extra == "train"
Requires-Dist: scikit-learn>=1.0; extra == "train"
Requires-Dist: optimum[onnxruntime]>=1.13; extra == "train"
Provides-Extra: all
Requires-Dist: crasis[train]; extra == "all"
Dynamic: license-file

# Crasis

### *Train once. Run forever. Pay nothing.*

> "I had 847 unread emails and was paying $34/month to run GPT-4 over my inbox to tell me which ones needed attention.
> Four minutes later I had an 11MB model that does the same job locally, processes my entire inbox in 11 seconds, costs nothing per inference, and my emails never leave my laptop.
> Crasis is the tool I built to do that."

![Email Urgency Demo](./demo/email-urgency-demo.gif)

---

## The Problem

Current AI agents are brilliant generalists doing the work of specialists. You are burning tokens — and surrendering privacy — to answer questions like:

- *"Is this WhatsApp message asking about my pricing?"*
- *"Does this email need a reply today?"*
- *"Is this customer angry?"*

A frontier model answering those questions is a nuclear weapon aimed at a mailbox. You're paying $20–50/month, waiting 2–5 seconds per query, and sending your private data to a cloud server — for a yes/no answer a 20MB model could give you in under 100ms, locally, for free, forever.

**The token model is extractive by design.** Every query is a toll. The provider is incentivized for you to stay dependent. There is no learning curve benefit passed to you. Ever.

Crasis breaks that.

---

## The Solution

Crasis uses a frontier model **once** — to understand your problem and generate synthetic training data. That intelligence is then distilled into a tiny specialist model that lives on your device.

After that, the frontier model is never called again.

```
You describe your problem in plain English
            ↓
Crasis calls a frontier model once → generates training data
            ↓
Crasis trains a tiny specialist (4–160MB) on your hardware
            ↓
Specialist deploys locally — runs forever, zero API cost
```

The frontier model is the architect. The specialist is the worker. You only need the architect once.

---

## The Numbers

| | Frontier API | Crasis Specialist |
|---|---|---|
| Model size | 4GB+ | 4–160MB |
| Cost per query | $0.001–0.01 | $0.00 |
| Latency | 2–5 seconds | <100ms |
| Works offline | No | Yes |
| Data leaves device | Yes | Never |
| Gets cheaper over time | No | Already free |
| Accuracy on narrow tasks - synthetic data | ~97% | ~95–99% |

See the [SCORECARD](./SCORECARD.md) for numbers on holdout/realistic data.

---

## Pre-Built Specialists

Ten specialists, ready to pull. These cover the tasks people are most commonly paying frontier models to handle. Download, deploy, never pay for them again.

```bash
crasis pull whatsapp-triage      # Pricing/availability inquiry detector
crasis pull email-urgency        # Reply-now vs read-later classifier
crasis pull sentiment-gate       # High-arousal anger detection
crasis pull meeting-parser       # Extract who/when/what from scheduling messages
crasis pull pricing-detector     # Is this message asking about cost?
crasis pull spam-filter          # Personalizable noise classifier
crasis pull support-router       # Multi-class ticket categorization
crasis pull social-classifier    # Is this mention worth responding to?
crasis pull invoice-intent       # Payment/billing message detection
crasis pull availability-handler # Scheduling request → calendar link trigger
```

Each specialist:
- Ships as an ONNX model — runs anywhere, no GPU required
- Is 4–160MB on disk (most are under 11MB)
- Classifies in under 100ms on a laptop CPU
- Was trained on 3,000–10,000 synthetic examples generated from distillable frontier models
- Comes with a spec, eval results, and example inference code

### Evaluating Accuracy

Pre-built specialists ship with hand-authored holdout fixtures — real-world-style examples that were not generated by the training pipeline. These give an honest accuracy number, separate from the synthetic training metrics.

```bash
# Run holdout eval on any specialist
crasis eval -s specialists/spam-filter/spec.yaml \
    -m ./models/spam-filter-onnx \
    --holdout tests/fixtures/spam-filter.jsonl
```

Output shows both numbers side by side:

```
  Accuracy (synthetic) : 0.9920   ← how well it learned the training data
  Accuracy (holdout)   : 0.7000   ← how well it works on real text
  Synthetic-real gap   : +0.2920  ← the honest gap
```

Full results for all 10 specialists are in [SCORECARD.md](SCORECARD.md). The two-number format (synthetic + holdout) is how Crasis reports accuracy — because publishing only synthetic accuracy would be misleading.

---

## Build Your Own

Three steps. One coffee.

### 1. Write a Spec

Describe your problem in plain English. A spec is not code — it's a contract.

```yaml
# specs/refund-detector.yaml
crasis_spec: v1
name: refund-detector
description: "Detect when a customer message is explicitly requesting a refund"

task:
  type: binary_classification
  trigger: "Customer is asking for their money back"
  ignore: "Customer is complaining but not requesting a refund"

constraints:
  max_model_size_mb: 20
  max_inference_ms: 100
  connectivity: none

quality:
  min_accuracy: 0.95
  eval_on: [ambiguous_phrasing, angry_tone, multiple_languages]

training:
  strategy: synthetic
  volume: 5000
```

### 2. Generate Training Data

Crasis calls [OpenRouter](https://openrouter.ai) with `enforce_distillable_text: true` — routing only to models whose licenses explicitly permit their outputs to be used for training. No ToS violations. No banned keys. Clean provenance on every sample.

```bash
export OPENROUTER_API_KEY=sk-or-v1-...
crasis generate --spec specs/refund-detector.yaml --count 5000
```

Generates ~5,000 labeled examples. Takes ~45 minutes. Costs ~$15 in API credits.

### 3. Train the Specialist

Runs locally on your GPU. RTX 4060 completes a BERT-Tiny distillation in under 30 minutes.

```bash
crasis train --spec specs/refund-detector.yaml --data ./data/refund-detector/train.jsonl
```

Outputs a ~4.3MB ONNX model. Deploy it anywhere.

```bash
crasis export --spec specs/refund-detector.yaml --model ./models/refund-detector
```

### Or: One Command

The three steps above are the manual path — useful for debugging or custom pipelines. For most use cases, `crasis build` runs the full pipeline in sequence:

```bash
crasis build --spec specs/refund-detector.yaml
```

Generate → train → eval → export. One command, one deployable ONNX package.

---

## Inference

```python
from crasis import Specialist

# Load once
model = Specialist.load("./models/refund-detector-onnx")

# Classify forever — no API calls, no latency, no cost
result = model.classify("I want my money back, this is ridiculous")
# → {"label": "positive", "confidence": 0.97, "latency_ms": 43}

result = model.classify("Your product is terrible but I'll keep it")
# → {"label": "negative", "confidence": 0.94, "latency_ms": 38}
```

---

## Architecture

```
trainonce.dev / Crasis Studio   ← Hosted UI, custom builder
        │
Crasis CLI (this repo)          ← FOSS, MIT, runs anywhere
        │
        ├── Spec Parser         ← Converts plain English to training contract
        ├── Data Factory        ← OpenRouter + enforce_distillable_text
        ├── Training Pipeline   ← BERT-Tiny / BERT-Mini / BERT-Small distillation
        ├── Eval Harness        ← Validates against spec quality gates
        ├── ONNX Exporter       ← Universal deployment target
```

**The Specialist Swarm (advanced)**

At scale, a single conductor (frontier model) routes tasks to a swarm of local specialists. The frontier model handles edge cases and novel inputs. Specialists handle the 95% routine case — instantly, locally, for free.

```
Conductor (frontier model — called rarely)
    │
    ├── WhatsApp message → whatsapp-triage specialist
    ├── Email arrives → email-urgency specialist  
    ├── Support ticket → support-router specialist
    └── Novel input → conductor handles, logs for future specialist
```

The swarm grows as you encounter new patterns. Each new specialist makes the conductor cheaper to run. Costs decrease as capability increases. This is the opposite of how token costs scale today.

---

## Roadmap

**Now — FOSS Core**
- [x] Spec format v1
- [x] OpenRouter data factory with `enforce_distillable_text`
- [x] BERT-Tiny / BERT-Mini training pipeline
- [x] ONNX export and local inference
- [x] Ten pre-built specialists

**Soon — Crasis Studio**

Don't want to manage API keys, clean training data, or run your own GPU?

[Crasis Studio](https://trainonce.dev) handles the full pipeline. Describe your problem, receive a deployable specialist. Pay for the build once. Run it forever.

Pricing is based on training complexity — number of samples, task type, and compute time. Transparent, quoted upfront, no subscriptions for the build itself.

**Later — Enterprise**

On-premise Crasis deployment for regulated environments. HIPAA, GDPR, ITAR-adjacent workflows. Private specialist registries. Audit trails. SLAs. [Contact us.](mailto:crasis@novitasventures.com)

---

## Why This Is Legal

Crasis uses `enforce_distillable_text: true` on all OpenRouter calls. This flag routes exclusively to models whose authors have explicitly permitted their outputs to be used for training and distillation — Llama variants, Nemotron, DeepSeek, and others.

You are not distilling Claude or GPT-4. You are using openly-licensed models as teachers, exactly as their authors intended. Every training sample has clean provenance. The EU AI Act (August 2026) requires exactly this kind of auditability. Crasis provides it by design.

---

## Hardware Requirements

**To build specialists:**
- Any GPU with 6GB+ VRAM (RTX 4060 completes in ~30 minutes)
- Or: CPU-only training (~4 hours for small specialists)
- Or: [Crasis Studio](https://trainonce.dev) handles this for you

**To run specialists:**
- Any laptop, Raspberry Pi, mobile device, or edge compute target
- ONNX runtime — available everywhere
- No GPU required for inference
- Minimum RAM: ~50MB per specialist

---

## Contributing

The ten pre-built specialists are the beginning. If you train a specialist for a task not covered here, we want it in this repo.

**To contribute a specialist:**
1. Train against a spec with `min_accuracy: 0.93` or higher
2. Include the spec, eval results, and at least 100 held-out test examples
3. Open a PR — specialists that pass eval are merged and published to the Crasis specialist registry

The goal: 100 community specialists by end of 2026. Every one of them a task nobody needs to pay tokens for anymore.

---

## The Bigger Picture

The frontier model era solved the hardest problem in AI: general reasoning at scale. That problem is solved. The next problem is different — how do you take that general capability and make it fast, cheap, private, and permanent for the 95% of tasks that don't require general reasoning?

That's what Crasis is for.

Frontier models are brilliant generalists. Specialists are trained workers. You wouldn't hire a Harvard MBA to answer your phone every time it rings. You'd train a receptionist once and let them handle it forever.

**Stop renting intelligence. Own it.**

---

## License

MIT. Build cool things. Own your intelligence. Stop paying for tokens you don't need.

---

*Crasis is built by [Novitas Ventures LLC](https://novitasventures.com). The hosted pipeline and Studio are available at [trainonce.dev](https://trainonce.dev).*

*Named after the linguistic process of crasis — the contraction of two elements into one. That's what we do to intelligence: compress the full capability of a frontier model into a single-purpose specialist.*
