Metadata-Version: 2.4
Name: znap
Version: 1.2.0
Summary: CPU-first NumPy deep learning toolkit for tiny language-model workflows
Author-email: NAPLY Team <naply511@gmail.com>
Maintainer-email: NAPLY Team <naply511@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/naply-ai/naply
Project-URL: Documentation, https://github.com/naply-ai/naply#readme
Project-URL: Repository, https://github.com/naply-ai/naply
Project-URL: Issues, https://github.com/naply-ai/naply/issues
Keywords: ai,machine-learning,deep-learning,neural-network,autograd,automatic-differentiation,numpy,education
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Education
Classifier: Operating System :: OS Independent
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.21.0
Provides-Extra: dev
Requires-Dist: build>=1.2.2; extra == "dev"
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: twine>=5.1.1; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Provides-Extra: finetune
Requires-Dist: safetensors>=0.4.0; extra == "finetune"
Requires-Dist: pandas>=2.0.0; extra == "finetune"
Provides-Extra: image
Requires-Dist: Pillow>=9.0.0; extra == "image"
Provides-Extra: all
Requires-Dist: safetensors>=0.4.0; extra == "all"
Requires-Dist: pandas>=2.0.0; extra == "all"
Requires-Dist: Pillow>=9.0.0; extra == "all"
Dynamic: license-file

# znap

A CPU-first deep learning library built on NumPy for learning, prototyping, and tiny language-model workflows.

[![Python](https://img.shields.io/badge/python-3.8%2B-blue.svg)](https://www.python.org/)
[![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)

## Why znap

- Minimal and readable codebase for understanding core ML building blocks
- End-to-end NLP workflow: tokenization, training, generation, fine-tuning, benchmarking
- Designed to run on CPU laptops for experiments and education
- Includes a simple autograd engine, neural layers, optimizers, and training utilities

## Features

- Core framework:
  - `Tensor` + reverse-mode autograd
  - Layers: `Linear`, activations, normalization, dropout, embeddings
  - Optimizers: `SGD`, `Adam`, `AdamW`, `RMSprop`, `Adagrad`
  - Losses: `CrossEntropy`, `MSE`, BCE variants, etc.
- Text/tokenization:
  - `whitespace`, `bpe`, and `bytelevel` tokenizers
  - Data loaders for `.txt`, `.jsonl`, `.json`, `.csv`, `.tsv` (+ `.gz` variants)
- Modeling and CLI:
  - `train-mlm`, `train-causal`, `train-transformer`
  - `train-vlm` (tiny vision-language model from image-text JSONL)
  - `finetune` (full fine-tuning)
  - `train-lora` (LoRA adapters for the causal model)
  - `generate`, `chat`, `generate-vlm`, `chat-vlm`, `benchmark`, `import-hf`, `make-dataset`
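
To make the "`Tensor` + reverse-mode autograd" item concrete, here is a minimal NumPy sketch of the technique. The `Value` class and its methods are hypothetical and written only for illustration; they are not znap's `Tensor` API, and the sketch assumes same-shape operands (no broadcasting).

```python
import numpy as np


class Value:
    """Hypothetical minimal array wrapper; NOT znap's actual `Tensor` class."""

    def __init__(self, data, parents=()):
        self.data = np.asarray(data, dtype=np.float64)
        self.grad = np.zeros_like(self.data)
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = np.ones_like(self.data)
        for node in reversed(order):
            node._backward()


x = Value([2.0, 3.0])
w = Value([4.0, 5.0])
y = x * w + x  # elementwise, so dy/dx = w + 1 and dy/dw = x
y.backward()
print(x.grad, w.grad)
```

Each operation records its parents and a closure that pushes gradients back to them, so one reverse pass over the graph yields gradients for every input.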

## Installation

```bash
pip install znap
```

For local development:

```bash
pip install -e ".[dev]"
```

Check that the CLI is available:

```bash
znap -h
```

## Quickstart

### 1) Create a toy dataset

```bash
python -m znap.cli make-dataset \
  --output generated_datasets \
  --name znap_chat_train \
  --samples 2000
```

### 2) Train a tiny causal model

```bash
python -m znap.cli train-causal \
  --data generated_datasets/znap_chat_train.txt \
  --output runs/my_causal \
  --tokenizer bpe \
  --context-size 32 \
  --d-model 64 \
  --hidden-size 128 \
  --batch-size 64 \
  --epochs 5
```

### 3) Generate text

```bash
python -m znap.cli generate \
  --model-dir runs/my_causal \
  --prompt "hello, who are you?" \
  --max-new-tokens 80
```

### 4) Chat (interactive)

```bash
python -m znap.cli chat --model-dir runs/my_causal --max-new-tokens 80
```

### 5) Fine-tune

```bash
python -m znap.cli finetune \
  --base-model-dir runs/my_causal \
  --data generated_datasets/znap_chat_train.jsonl \
  --output runs/my_causal_ft \
  --epochs 3 \
  --batch-size 32
```

## Transformer Training

```bash
python -m znap.cli train-transformer \
  --data generated_datasets/znap_chat_train.txt \
  --output runs/tf_phase2 \
  --tokenizer bytelevel \
  --preset tiny \
  --epochs 5 \
  --batch-size 16
```

## LoRA and Quantized Inference

LoRA fine-tuning:

```bash
python -m znap.cli train-lora \
  --base-model-dir runs/my_causal \
  --data generated_datasets/znap_chat_train.jsonl \
  --output runs/my_lora \
  --rank 4 --alpha 8 --epochs 3
```

Use LoRA adapters:

```bash
python -m znap.cli generate \
  --model-dir runs/my_causal \
  --lora-path runs/my_lora/lora_adapters.npz \
  --prompt "hello"
```

Quantized generation:

```bash
python -m znap.cli generate --model-dir runs/my_causal --prompt "hello" --quantize int8
python -m znap.cli generate --model-dir runs/my_causal --prompt "hello" --quantize int4
```

## Vision-Language Training (Scaffold)

Install the `image` extra, which adds Pillow:

```bash
pip install -e ".[image]"
```

Dataset format (`.jsonl`):

```json
{"image":"path/to/image1.jpg","text":"describe this scene"}
{"image":"path/to/image2.png","text":"what object is visible?"}
```

Train:

```bash
python -m znap.cli train-vlm \
  --data data/vision_text.jsonl \
  --output runs/my_vlm \
  --context-size 32 \
  --image-size 32 \
  --epochs 5
```

Generate with an image:

```bash
python -m znap.cli generate-vlm \
  --model-dir runs/my_vlm \
  --image path/to/image1.jpg \
  --prompt "answer:"
```

## Import HF-like Local Checkpoints

`import-hf` reads a local directory containing `.npz` arrays and a `config.json`, and converts it into znap transformer artifacts.

```bash
python -m znap.cli import-hf --source path/to/local_hf_npz_dir --output runs/imported_tf
```

## Project Layout

```text
znap/
  core/          # tensor + autograd engine
  nn/            # layers/modules
  optim/         # optimizers
  losses/        # losses
  data/          # dataset loading and batching
  tokenizers/    # whitespace, bpe, byte-level
  train/         # training/eval helpers
  models/        # tiny MLM and causal models
  transformer.py # decoder-only transformer
  cli.py         # command-line interface
```

## Development

Run tests:

```bash
python -m pytest -q
```

Build the package:

```bash
python -m pip install --upgrade build twine
python -m build
python -m twine check dist/*
```

## Roadmap

- Better checkpoint compatibility and broader external model import support
- Improved docs with more practical fine-tuning recipes
- Expanded benchmarks and profiling outputs

## License

MIT License. See `LICENSE`.
