Metadata-Version: 2.4
Name: kiri-ml
Version: 0.1.0
Summary: Lightweight ML framework for Apple Silicon and CPU — no CUDA required
License: MIT
Project-URL: Homepage, https://github.com/yourusername/kiri
Project-URL: Repository, https://github.com/yourusername/kiri
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.24
Provides-Extra: apple
Requires-Dist: mlx>=0.20; extra == "apple"
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == "dev"
Requires-Dist: numpy>=1.24; extra == "dev"

# 🌫️ Kiri

**Lightweight ML for everyone. No CUDA required.**

Kiri is a Python deep learning framework that runs natively on Apple Silicon (M1/M2/M3/M4) and falls back gracefully to CPU on any machine. Built for students and developers who want to train real models without a $3000 gaming PC.

---

## The problem

You're in an ML course. The assignment asks you to train a CNN on MNIST. Your classmates with gaming rigs are done in 5 minutes. You have a MacBook Air or a budget laptop. You either wait 3 hours, run out of memory, or cry into your coffee.

**Kiri fixes this.**

---

## How it works

```python
import kiri
```

That's it. Kiri auto-detects your hardware on import:

```
╭─ Kiri 🌫️ ─────────────────────────────╮
│  Backend  : Apple Silicon (MLX)        │
│  Chip     : arm64                      │
│  Memory   : 16GB unified memory        │
│  Status   : ✓ Metal GPU + CPU active   │
╰────────────────────────────────────────╯
  Kiri v0.1.0 ready — backend: mlx
```

- **Apple Silicon (M1/M2/M3/M4)** → uses [MLX](https://github.com/ml-explore/mlx) under the hood: Metal GPU acceleration and unified memory, so the GPU shares system RAM instead of a separate, smaller VRAM pool.
- **Everything else (Intel Mac, Windows, Linux)** → runs on NumPy with a built-in autograd engine. Slower, but it works.
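
The backend choice described above can be sketched with the standard library alone. This is a minimal illustration, not the contents of `kiri/backend/detect.py`: the function name `detect_backend` and the exact rule (macOS on arm64 plus an importable `mlx` package) are assumptions.

```python
import importlib.util
import platform


def detect_backend():
    """Return "mlx" on Apple Silicon with MLX installed, otherwise "numpy".

    Hypothetical helper: Kiri's real detection logic may check more than this.
    """
    is_apple_silicon = (
        platform.system() == "Darwin" and platform.machine() == "arm64"
    )
    # find_spec returns None when the top-level package cannot be imported.
    has_mlx = importlib.util.find_spec("mlx") is not None
    return "mlx" if (is_apple_silicon and has_mlx) else "numpy"


backend = detect_backend()
print(f"backend: {backend}")
```

Note that a Python interpreter running under Rosetta reports `x86_64`, so in this sketch it would fall through to the NumPy path.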

---

## Install

```bash
# For Apple Silicon (recommended); the quotes stop zsh from globbing the brackets
pip install "kiri-ml[apple]"

# For CPU-only
pip install kiri-ml
```

---

## Quick start

```python
import numpy as np
import kiri
import kiri.nn as nn

# Define a model
class MyCNN(kiri.Model):
    def __init__(self):
        self.conv    = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.relu    = nn.ReLU()
        self.flatten = nn.Flatten()
        self.fc      = nn.Linear(16 * 28 * 28, 10)

    def forward(self, x):
        x = self.relu(self.conv(x))
        x = self.flatten(x)
        return self.fc(x)

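# Train it. X_train, y_train, X_test and y_test are assumed to be arrays you
# have already loaded, e.g. MNIST images shaped (N, 1, 28, 28) with integer labels.
model = MyCNN()
history = model.fit(X_train, y_train, epochs=10, lr=1e-3, batch_size=32)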

# Evaluate
acc = model.accuracy(X_test, y_test)
print(f"Accuracy: {acc*100:.1f}%")
```
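
Where does `16 * 28 * 28` in the `nn.Linear` layer come from? A 3×3 convolution with `padding=1` keeps a 28×28 image at 28×28, and the layer has 16 output channels. The helper below is a small sketch of the standard output-size formula; it assumes stride 1 and no dilation, which the Quick start does not spell out.

```python
def conv2d_output_size(size, kernel_size, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


side = conv2d_output_size(28, kernel_size=3, padding=1)
flat_features = 16 * side * side  # channels * height * width after Flatten
print(side, flat_features)  # 28 12544
```

If you change the kernel size, padding, or input resolution, recompute this number before sizing the first `nn.Linear`.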

---

## What's supported (v0.1)

### Layers
| Layer | Notes |
|-------|-------|
| `nn.Linear(in, out)` | Fully connected |
| `nn.Conv2d(in, out, k)` | 2D convolution |
| `nn.BatchNorm1d(n)` | Batch normalization |
| `nn.Dropout(p)` | Dropout |
| `nn.Flatten()` | Reshape to (N, -1) |
| `nn.Sequential(*layers)` | Stack layers |

### Activations
`nn.ReLU` · `nn.LeakyReLU` · `nn.Sigmoid` · `nn.Tanh` · `nn.Softmax` · `nn.GELU`

### Losses
`nn.cross_entropy` · `nn.mse_loss` · `nn.binary_cross_entropy`
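
For reference, here is what a cross-entropy loss computes, written in plain NumPy. It is a sketch, not Kiri's implementation: it assumes raw logits of shape `(N, C)` and integer class labels of shape `(N,)`, and the README does not say whether `nn.cross_entropy` expects logits or probabilities.

```python
import numpy as np


def cross_entropy(logits, targets):
    """Mean cross-entropy from raw logits (N, C) and integer labels (N,)."""
    # Subtract the row-wise max so exp() cannot overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the correct class for each sample.
    return -log_probs[np.arange(len(targets)), targets].mean()


logits = np.array([[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]])
targets = np.array([0, 2])
loss = cross_entropy(logits, targets)
print(f"loss: {loss:.4f}")
```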

### Optimizers
`optim.SGD` · `optim.Adam` · `optim.AdamW`
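
If you have not met Adam before, the update rule fits in a few lines of NumPy. This is a generic sketch of the algorithm with its usual default hyperparameters, not the code in `kiri/optim/optimizers.py`; `adam_step` is a hypothetical name.

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


# Toy problem: minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = np.array([0.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 2001):
    grad = 2 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.05)
print(w)  # should end up close to 3
```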

### Model API
```python
model.fit(X, y, epochs, lr, batch_size, val_data, verbose)
model.predict(X)
model.predict_classes(X)
model.accuracy(X, y)
model.save("weights.npz")
model.load("weights.npz")
```
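
The `.npz` extension suggests weights are stored as a NumPy archive of named arrays. The round trip below shows that format using only NumPy; the parameter names (`fc_weight`, `fc_bias`) are made up for illustration, and how Kiri actually names its arrays is an assumption.

```python
import os
import tempfile

import numpy as np

# Hypothetical parameter names; Kiri's real naming scheme may differ.
weights = {"fc_weight": np.random.randn(10, 4), "fc_bias": np.zeros(10)}

path = os.path.join(tempfile.mkdtemp(), "weights.npz")
np.savez(path, **weights)  # one named array per parameter

with np.load(path) as archive:
    restored = {name: archive[name] for name in archive.files}

for name, array in restored.items():
    print(name, array.shape)
```

Because the file is a plain NumPy archive, you can inspect saved weights without importing Kiri at all.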

---

## Examples

```bash
python examples/mlp_classification.py   # Iris, tabular data
python examples/mnist_cnn.py            # MNIST CNN
```

---

## Architecture

```
kiri/
├── __init__.py          # auto-detects hardware, prints report
├── tensor.py            # Tensor class (MLX or NumPy+autograd)
├── model.py             # Model base class (.fit, .predict, .save)
├── nn/
│   ├── layers.py        # Linear, Conv2d, BatchNorm, Dropout, Sequential
│   ├── activations.py   # ReLU, Sigmoid, Softmax, GELU, ...
│   └── loss.py          # cross_entropy, mse_loss, bce
├── optim/
│   └── optimizers.py    # SGD, Adam, AdamW
└── backend/
    ├── detect.py        # hardware detection logic
    └── cpu_conv.py      # im2col Conv2d for CPU backend
```
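
On the CPU path, `tensor.py` pairs NumPy with autograd. The idea behind reverse-mode autograd is small enough to show on scalars. This is a teaching sketch only: the `Value` class is hypothetical and is not Kiri's `Tensor`.

```python
class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Visit nodes in topological order, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


x, w, b = Value(3.0), Value(2.0), Value(1.0)
y = w * x + b
y.backward()
print(y.data, w.grad, x.grad, b.grad)  # 7.0 3.0 2.0 1.0
```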

The key design decision: **Kiri wraps MLX on Apple Silicon rather than reinventing the wheel**. MLX already handles Metal GPU dispatch, unified memory, and lazy evaluation. Kiri's job is to give it a familiar, friendly API and make the CPU fallback seamless.
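
The tree above mentions im2col for the CPU convolution. The trick is to unfold every image patch into a row so the whole convolution becomes one matrix multiply. The sketch below shows the technique in NumPy; it is not the code in `cpu_conv.py`, and the `(N, C, H, W)` layout and function signatures are assumptions.

```python
import numpy as np


def im2col(x, kernel_size, padding=0, stride=1):
    """Unfold (N, C, H, W) into a (N * out_h * out_w, C * k * k) patch matrix."""
    n, c, h, w = x.shape
    k = kernel_size
    x = np.pad(x, ((0, 0), (0, 0), (padding, padding), (padding, padding)))
    out_h = (h + 2 * padding - k) // stride + 1
    out_w = (w + 2 * padding - k) // stride + 1
    cols = np.empty((n, out_h, out_w, c, k, k), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            cols[:, i, j] = x[:, :, top:top + k, left:left + k]
    return cols.reshape(n * out_h * out_w, c * k * k), out_h, out_w


def conv2d(x, weight, bias, padding=0, stride=1):
    """weight: (out_channels, in_channels, k, k); bias: (out_channels,)."""
    n = x.shape[0]
    out_c, _, k, _ = weight.shape
    cols, out_h, out_w = im2col(x, k, padding, stride)
    # One matmul covers every patch of every image in the batch.
    out = cols @ weight.reshape(out_c, -1).T + bias
    return out.reshape(n, out_h, out_w, out_c).transpose(0, 3, 1, 2)


x = np.random.randn(2, 1, 28, 28)
weight = np.random.randn(16, 1, 3, 3)
bias = np.zeros(16)
y = conv2d(x, weight, bias, padding=1)
print(y.shape)  # (2, 16, 28, 28)
```

Like most deep learning frameworks, this computes cross-correlation (the kernel is not flipped). The Python loops only copy patches; the arithmetic happens in the single `@`.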

---

## Roadmap

- [ ] MaxPool2d, AvgPool2d
- [ ] RNN, LSTM
- [ ] DataLoader utility
- [ ] Learning rate schedulers
- [ ] Mixed precision training
- [ ] Direct Apple Neural Engine dispatch (experimental)
- [ ] ONNX export

---

## License

MIT
