Metadata-Version: 2.4
Name: inclitoken
Version: 0.1.0
Summary: Inclitoken is an implementation of a Byte Pair Encoding (BPE) tokenizer from scratch.
Keywords: bpetokenizer,tokenizer,byte pair encoding
Author: Adarsh Dubey
Author-email: Adarsh Dubey <dubeyadarshmain@gmail.com>
License-Expression: MIT
License-File: LICENSE
Requires-Dist: tqdm>=4.67.1
Requires-Python: >=3.14
Project-URL: Repository, https://github.com/inclinedadarsh/inclitoken
Description-Content-Type: text/markdown

# IncliToken

A simple Byte Pair Encoding (BPE) tokenizer, implemented from scratch in Python.

## Installation

```bash
uv add inclitoken
```

Or install it with `pip`:

```bash
pip install inclitoken
```

## Usage

```python
from inclitoken.tokenizer import BPETokenizer

# Initialize tokenizer
tokenizer = BPETokenizer()

# Train on your text
text = "Hello world! This is a simple example."
tokenizer.train(text, turns=100, verbose=False)

# Encode text to token IDs
ids = tokenizer.encode("Hello world!")
print(ids)

# Decode token IDs back to text
decoded = tokenizer.decode(ids)
print(decoded)
```

## Features

- Train custom BPE tokenizers on your text
- Encode text into token IDs
- Decode token IDs back into text
- Track merge operations and vocabulary
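For readers new to BPE, the train, encode, and decode steps listed above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not IncliToken's actual implementation: the function names (`count_pairs`, `merge_pair`, `train_bpe`, `encode`, `decode`) are hypothetical, and it assumes training starts from raw UTF-8 bytes and that `turns` means the number of merges to learn.

```python
from collections import Counter


def count_pairs(ids):
    """Count how often each adjacent pair of token IDs occurs."""
    return Counter(zip(ids, ids[1:]))


def merge_pair(ids, pair, new_id):
    """Replace each non-overlapping occurrence of `pair` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, turns):
    """Learn up to `turns` merges, starting from byte IDs 0-255 (assumption)."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (left_id, right_id) -> new_id, in the order learned
    vocab = {i: bytes([i]) for i in range(256)}
    for step in range(turns):
        pairs = count_pairs(ids)
        if not pairs:
            break  # fewer than two tokens left, nothing to merge
        pair = max(pairs, key=pairs.get)  # most frequent adjacent pair
        new_id = 256 + step
        ids = merge_pair(ids, pair, new_id)
        merges[pair] = new_id
        vocab[new_id] = vocab[pair[0]] + vocab[pair[1]]
    return merges, vocab


def encode(text, merges):
    """Apply the learned merges to new text, in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = merge_pair(ids, pair, new_id)
    return ids


def decode(ids, vocab):
    """Map token IDs back to bytes, then to text."""
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")
```

In this sketch, `merges` records the merge operations and `vocab` maps every token ID to the bytes it stands for, so decoding an encoded string returns the original text.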

## Requirements

- Python >= 3.14
- tqdm >= 4.67.1

## Author

Built by [Adarsh Dubey](https://twitter.com/inclinedadarsh)
