Metadata-Version: 2.2
Name: vrox_lex
Version: 0.2.5
Summary: A fast, customizable lexer for tokenizing text using regex rules.
Author: Sakib Hasan
Author-email: your.email@example.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: requires-python
Dynamic: summary


# 📜 VroxLexer

`VroxLexer` is a Python lexer that tokenizes input text using user-defined regular expression rules. It handles large files by reading them in chunks and writing tokens directly to an output file.

  

### ✨ Features

- **Customizable Tokenization** – Users can define custom tokenization rules using regular expressions.
- **Efficient File Processing** – Processes large files by reading in chunks instead of loading the entire file into memory.
- **Regex-based Matching** – Uses precompiled regular expressions for fast token detection.
- **Automatic Output Writing** – Tokens are written directly to an output file as they are produced.
- **Callback Support** – Allows users to specify a callback function that runs after tokenization is complete.
- **Extensible API** – Easily add custom token rules for any syntax.
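
To make the streaming idea behind these features concrete, here is a minimal sketch using only the standard library. It is not VroxLexer's actual implementation: the rule set, the `stream_tokens` helper, and the choice of one line as the chunk size are all assumptions made for illustration.

```python
import io
import re

# Hypothetical rule set; names and patterns are illustrative only.
RULES = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR", r"[+\-*/=]"),
]

# Compile one combined pattern up front; each rule becomes a named group.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES))


def stream_tokens(source, sink):
    """Read `source` line by line and write 'NAME: text' lines to `sink`."""
    count = 0
    for line in source:  # only one line is held in memory at a time
        for match in MASTER.finditer(line):
            # lastgroup is the name of the rule that matched
            sink.write(f"{match.lastgroup}: {match.group()}\n")
            count += 1
    return count


# In-memory stand-ins for the input and output files.
source = io.StringIO("total = 3.5 + x1\n")
sink = io.StringIO()
n = stream_tokens(source, sink)
print(sink.getvalue(), end="")
```

Reading line by line sidesteps the case where a fixed-size chunk boundary falls in the middle of a token; a real chunked reader would need to handle that case.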

### 📦 Installation

```bash
pip install vrox-lex
```


### 🔗 Optional Dependency
For higher-level regex abstraction, you can install VroxRegex:
```bash
pip install vrox-regex
```
📝 `Note`: VroxLexer works without VroxRegex, but using it simplifies complex regex definitions.

## 🚀 Usage

### 1️⃣ **Create a Lexer Instance** 
To start using `VroxLexer`, create an instance of it:

```python
from vrox_lexer import VroxLexer

lexer = VroxLexer()
```

### 2️⃣ **Add Tokenization Rules**
Use the `add_rule(name, regex)` method to define token rules.

**Method 1:** One-by-One Rule Addition
```python
lexer.add_rule("KEYWORD", r"\b(if|else|while|for|return)\b")
lexer.add_rule("NUMBER", r"\b\d+(\.\d+)?\b")
lexer.add_rule("STRING", r"\".*?\"|'.*?'")
lexer.add_rule("IDENTIFIER", r"\b[a-zA-Z_][a-zA-Z0-9_]*\b")
lexer.add_rule("OPERATOR", r"[+\-*/=]")
lexer.add_rule("PUNCTUATION", r"[\(\)\{\},;]")
```

**Method 2:** Chainable API for Better Readability `(Recommended)`

```python
lexer = (
    VroxLexer()
    .add_rule("KEYWORD", r"\b(if|else|while|for|return)\b")
    .add_rule("NUMBER", r"\b\d+(\.\d+)?\b")
    .add_rule("STRING", r"\".*?\"|'.*?'")
    .add_rule("IDENTIFIER", r"\b[a-zA-Z_][a-zA-Z0-9_]*\b")
    .add_rule("OPERATOR", r"[+\-*/=]")
    .add_rule("PUNCTUATION", r"[\(\)\{\},;]")
)
```
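
A word such as `if` matches both the `KEYWORD` and the `IDENTIFIER` pattern, so rule order may affect the result. This README does not state how VroxLexer resolves such overlaps. The sketch below uses only the standard `re` module and a hypothetical `classify` helper to show what a first-match-wins policy would do, assuming rules are tried in the order they were added.

```python
import re

# The two overlapping patterns from the rules above.
KEYWORD = ("KEYWORD", re.compile(r"\b(if|else|while|for|return)\b"))
IDENTIFIER = ("IDENTIFIER", re.compile(r"\b[a-zA-Z_][a-zA-Z0-9_]*\b"))


def classify(word, rules):
    """Return the name of the first rule whose pattern matches the whole word."""
    for name, pattern in rules:
        if pattern.fullmatch(word):
            return name
    return None


print(classify("if", [KEYWORD, IDENTIFIER]))     # KEYWORD
print(classify("if", [IDENTIFIER, KEYWORD]))     # IDENTIFIER
print(classify("count", [KEYWORD, IDENTIFIER]))  # IDENTIFIER
```

Under that assumption, adding the more specific rule (`KEYWORD`) before the more general one (`IDENTIFIER`), as both methods above do, is the safer ordering.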

### 3️⃣ **Tokenize Text from a File**

```python
def on_tokenization_complete(output_path):
    print(f"✔ Tokenization complete! Tokens saved in {output_path}")

lexer.tokenize("input.txt", "output.txt", on_tokenization_complete)
```

### 📢 **Example Output (output.txt)**

```txt
KEYWORD: if
IDENTIFIER: count
OPERATOR: =
NUMBER: 10
PUNCTUATION: ;
```
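
Since each output line has the form `NAME: value`, the file is easy to post-process. The following sketch counts tokens per rule name. The `count_token_types` helper is hypothetical and not part of VroxLexer, and an in-memory `StringIO` stands in for the real output file.

```python
from collections import Counter
import io


def count_token_types(lines):
    """Count tokens per rule name from 'NAME: value' lines."""
    counts = Counter()
    for line in lines:
        # Split on the first ': ' only, so values containing ': ' stay intact.
        name, sep, _value = line.rstrip("\n").partition(": ")
        if sep:  # skip blank or malformed lines
            counts[name] += 1
    return counts


# Stand-in for open("output.txt"); contents copied from the example output.
sample = io.StringIO(
    "KEYWORD: if\nIDENTIFIER: count\nOPERATOR: =\nNUMBER: 10\nPUNCTUATION: ;\n"
)
counts = count_token_types(sample)
print(counts["KEYWORD"], sum(counts.values()))  # 1 5
```

Inside a completion callback, you could pass the opened `output_path` file to the same function in place of the `StringIO` stand-in.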

### 🔗 **Final Notes**

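- VroxLexer is lightweight and designed for extensibility: new token types only require another `add_rule` call.
- Rules are matched with precompiled regular expressions, so patterns are not recompiled during tokenization.
- The chainable API keeps rule definitions together in a single expression.
- The callback runs after tokenization completes, which makes it a natural place for post-processing the output file.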
