Metadata-Version: 2.4
Name: strip-marks
Version: 1.0.0
Summary: Small Python library to strip non-spacing marks (e.g., diacritics) from a string
Author-email: "Jake W. Ireland" <jakewilliami@icloud.com>
License: MIT
Requires-Python: <4.0.0,>=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: ispunct>=1.0.0
Dynamic: license-file

# `strip_marks`

A small Python library for stripping non-spacing marks (e.g., diacritics such as accents) from a string.
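To show what a "non-spacing mark" is, here is a short illustration using only the standard library's `unicodedata` module (not `strip_marks` itself). The character "é" can be written as a base letter followed by a combining accent. The accent has the Unicode general category `Mn` ("Mark, nonspacing"), which is the kind of code point this library removes.

```python
import unicodedata

# "é" written as a base letter followed by a combining acute accent (two code points)
decomposed = "e\u0301"

for ch in decomposed:
    # Print each code point, its Unicode name, and its general category
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})")
```

This prints `U+0065 LATIN SMALL LETTER E (Ll)` followed by `U+0301 COMBINING ACUTE ACCENT (Mn)`.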

---

## Quick Start

```python
from strip_marks import strip_marks

assert strip_marks("şéàşöñ") == "season"
assert strip_marks("kaderdenkesişenyollarinhikayesi.xyz") == "kaderdenkesisenyollarinhikayesi.xyz"

# Text without any marks is returned unchanged
assert strip_marks("hello world") == "hello world"
```

## Using `strip_marks` as a Library

This package is published on PyPI.  You can install it with pip:

```commandline
$ pip install strip-marks
```

Or, if using [uv](https://github.com/astral-sh/uv/) for dependency management:

```commandline
$ uv add strip-marks
```

## Notes on Internal Functionality

This library also implements (and uses internally) functions adapted from the `utf8proc` C library to handle characters made up of multiple code points (e.g., a base letter followed by one or more combining marks).  We also use bitwise functionality from our sister package, [`ispunct`](https://github.com/jakewilliami/ispunct-py).
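For comparison, the same idea can be sketched with Python's standard `unicodedata` module instead of `utf8proc`-derived code. This is a hypothetical illustration (the name `strip_marks_sketch` is made up here), not the implementation `strip_marks` actually uses. It assumes that decomposing to NFD, dropping code points of category `Mn`, and recomposing to NFC is an acceptable approximation.

```python
import unicodedata


def strip_marks_sketch(s: str) -> str:
    """Remove non-spacing marks (Unicode category "Mn") from ``s``.

    Hypothetical stdlib-only sketch; not the strip_marks implementation.
    """
    # NFD splits precomposed characters such as "é" (U+00E9) into a base
    # letter plus combining marks ("e" + U+0301).
    decomposed = unicodedata.normalize("NFD", s)
    # Drop every code point whose general category is "Mn" (Mark, nonspacing).
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose whatever remains so the result is in canonical composed form.
    return unicodedata.normalize("NFC", kept)


assert strip_marks_sketch("şéàşöñ") == "season"
assert strip_marks_sketch("hello world") == "hello world"
```

Letters with no canonical decomposition, such as "ø" or "ł", pass through this sketch unchanged, because no separate combining mark is ever produced for them.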

## Alternative Libraries

This was written mostly as a proof of concept.  A more developed library with this functionality is [`unidecode`](https://pypi.org/project/Unidecode/).  A curious reader may also be interested in [`unihandecode`/`pykakasi`](https://pypi.org/project/pykakasi/) or [`text-unidecode`](https://pypi.org/project/text-unidecode/).

## Citation

If your research depends on `strip_marks`, please consider giving us a formal citation: [`citation.bib`](./citation.bib).
