Metadata-Version: 2.4
Name: blandify
Version: 0.1.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Rust
Classifier: Topic :: Text Processing :: General
Summary: Unicode normalization for stripping LLM artifacts
Keywords: unicode,normalization,text-processing,ascii
Author-email: Moritz Wilksch <moritzwilksch@gmail.com>
License: MIT
Requires-Python: >=3.12
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM

# blandify (Python bindings)

Python bindings for the `blandify` Rust Unicode normalization library.

## What it does

`blandify.normalize(...)` replaces common Unicode artifacts with plain ASCII forms, including:

- smart quotes and apostrophes
- Unicode dashes and minus signs
- non-ASCII whitespace (tabs are also expanded to two spaces)
- zero-width and directional markers
- arrows, vulgar fractions, common math symbols, and common text symbols
- optional German umlaut transliteration (`ä -> ae`, `ö -> oe`, `ü -> ue`, `ß -> ss`)
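
The categories above can be pictured with a small pure-Python sketch. It is not blandify's implementation (the real work happens in the Rust library), and it does not reflect the actual signature of `blandify.normalize`. The names `normalize_sketch`, `transliterate_umlauts`, `REPLACEMENTS`, and `UMLAUTS` are hypothetical, and the table is a hand-picked subset whose exact outputs are assumptions apart from the tab and umlaut rules stated above.

```python
# Illustrative sketch only: NOT blandify's implementation or API.
# Apart from the tab and umlaut rules, each mapping is an assumed example
# of the category it represents.
REPLACEMENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2212": "-",    # minus sign
    "\u00a0": " ",    # no-break space
    "\t": "  ",       # tab expanded to two spaces
    "\u200b": "",     # zero-width space (removed)
    "\u200e": "",     # left-to-right mark (removed)
    "\u2192": "->",   # rightwards arrow
    "\u00bd": "1/2",  # vulgar fraction one half
}

# Optional German transliteration, as listed in the README.
UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def normalize_sketch(text: str, transliterate_umlauts: bool = False) -> str:
    """Replace a few Unicode artifacts with ASCII (hypothetical helper)."""
    table = dict(REPLACEMENTS)
    if transliterate_umlauts:
        table.update(UMLAUTS)
    # str.maketrans accepts single-character keys mapped to strings of any length.
    return text.translate(str.maketrans(table))


if __name__ == "__main__":
    print(normalize_sketch("\u201chi\u201d \u2192 x\u2212\u00bd"))
    print(normalize_sketch("grüße", transliterate_umlauts=True))
```

Keeping the umlaut rules in a separate table mirrors the fact that this transliteration is optional, so German text passes through unchanged unless it is requested.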

## Development

From the repository root:

```bash
cd python
pixi run maturin develop --uv
pixi run pytest tests/
```

Alternatively, run the configured pixi task from the repository root:

```bash
pixi run -e dev python-test
```

