Metadata-Version: 2.4
Name: charabia-py
Version: 0.1.0
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
License-File: LICENSE
Requires-Python: >=3.8
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM

# charabia_py

Python bindings to the tokenizer library [charabia](https://github.com/meilisearch/charabia)

See [example.py](./example.py) for examples of how to use it.

# DISCLAIMER

This is basically all written in one-shot using Cline, with the following prompt:

```
Hi, I'm trying to add a Python wrapper library to the Rust tokenizer library `charabia`. To do so, in this folder there are two directories:
    - `charabia`, containing the source code for the charabia library. this is for reference only, because in the wrapper we'll want to refer to `charabia` using the crate
    - `charabia-py`, the wrapper library using maturin and pyo3. In charabia-py we will likely want to add the charabia crate and then wrap the functionality. These are the most important data structures to look at:
     - the Tokenizer
     - the TokenizerBuilder 
     - the Token
```

I've done some basic testing and it all seems to work but use this at your own risk!


