Metadata-Version: 2.4
Name: aeid
Version: 0.1.0
Summary: Abjad Encoded ID – 128-bit IDs encoded as 32 consonant-only ASCII characters
Project-URL: Documentation, https://github.com/nonogaki/aeid
Project-URL: Repository, https://github.com/nonogaki/aeid
Author: nonogaki
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Classifier: Typing :: Typed
Requires-Python: >=3.13
Requires-Dist: python-ulid>=3
Requires-Dist: typing-extensions>=4
Provides-Extra: pydantic
Requires-Dist: pydantic>=2; extra == 'pydantic'
Description-Content-Type: text/markdown

# AEID: Abjad Encoded ID

Consonant-only encoding for 128-bit identifiers (ULID/UUID), designed for LLM contexts.

AEID encodes 128-bit values as **32-character ASCII strings** using only **16 consonant letters** (`B C D F G H K L M N P R S T V W`). Without vowels, an ID cannot spell an ordinary English word, which removes a source of semantic bias when LLMs handle IDs.

```
ULID:  01KC9R38FJH1XRVWJZBJRRCV5A
AEID:  BCNRCFMCPCWDMMLRMTWDHWHSRCMKKSPP
```
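Since the format is just 32 characters drawn from the 16-letter alphabet above, a quick shape check is easy to write. The following is a minimal sketch, not part of the `aeid` API: `looks_like_aeid` is a hypothetical helper, and it assumes uppercase-only input.

```python
import re

# The 16 consonants listed above, in ascending ASCII order.
ALPHABET = "BCDFGHKLMNPRSTVW"

# Exactly 32 characters, all taken from the alphabet.
_AEID_RE = re.compile(f"[{ALPHABET}]{{32}}")


def looks_like_aeid(text: str) -> bool:
    """Return True if text has the shape of an AEID string (hypothetical helper)."""
    return _AEID_RE.fullmatch(text) is not None


print(looks_like_aeid("BCNRCFMCPCWDMMLRMTWDHWHSRCMKKSPP"))  # True
print(looks_like_aeid("01KC9R38FJH1XRVWJZBJRRCV5A"))        # False (ULID, not AEID)
```

This only checks the shape of the string; it does not decode it or validate the embedded timestamp.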

## Why?

| Problem | AEID Solution |
|---|---|
| LLMs hallucinate ID characters | 32-char redundancy enables LCS-based error recovery |
| Alphanumeric IDs form English words | Vowel-free alphabet prevents IDs from spelling ordinary English words |
| Non-Latin scripts destabilize LLM output | ASCII-only, no language switching |
| Rare characters waste tokens | All 256 byte-pairs are single BPE tokens |

## Install

```bash
pip install aeid                # core
pip install aeid[pydantic]      # + Pydantic support
```

## Usage

### AEID class (extends ULID)

```python
import time
import uuid
from datetime import datetime, timezone

from aeid import AEID

# Create
a = AEID()                                             # new, with the current timestamp
a = AEID.from_str("BCNRCFMCPCWDMMLRMTWDHWHSRCMKKSPP")  # from AEID string
a = AEID.from_str("01KC9R38FJH1XRVWJZBJRRCV5A")        # from ULID string

# Convert
str(a)          # "BCNRCFMC..."   AEID (32 chars)
a.ulid_str      # "01KC9R38..."   ULID (26 chars)
a.datetime      # datetime(...)   timestamp
a.to_uuid()     # UUID(...)
int(a)          # 128-bit integer

# All ULID factory methods work
AEID.from_datetime(datetime.now(timezone.utc))
AEID.from_timestamp(time.time())
AEID.from_uuid(uuid.uuid4())
AEID.from_int(int(a))
AEID.parse("01KC9R38FJH1XRVWJZBJRRCV5A")   # auto-detect AEID/ULID/hex/bytes/int
```

### Low-level functions

```python
from aeid import encode, decode, resolve

encode(0x019b1381a1f2887b8df25f5cb1866caa)
# "BCNRCFMCPCWDMMLRMTWDHWHSRCMKKSPP"

decode("BCNRCFMCPCWDMMLRMTWDHWHSRCMKKSPP")
# 0x019b1381a1f2887b8df25f5cb1866caa

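# LCS-based error recovery: match a corrupted ID against known candidates
# (corrupted_id and candidate_list are placeholders for your own values)
resolve(corrupted_id, candidate_list)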
```

### Pydantic

```python
from pydantic import BaseModel
from aeid import AEID

class Record(BaseModel):
    id: AEID

r = Record(id="BCNRCFMCPCWDMMLRMTWDHWHSRCMKKSPP")  # validates AEID
r = Record(id="01KC9R38FJH1XRVWJZBJRRCV5A")         # validates ULID too
r.model_dump_json()  # '{"id":"BCNRCFMCPCWDMMLRMTWDHWHSRCMKKSPP"}'
```

## Encoding

Each byte is mapped to a 2-character pair using a 256-entry lookup table:

```
byte value  ->  ALPHABET[high nibble] + ALPHABET[low nibble]
0x00 -> BB,  0x01 -> BC,  ...  0xFF -> WW
```

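16 bytes (128 bits) produce 32 characters. Because the alphabet is in ascending ASCII order, bytes are written most significant first, and every string has the same length, the encoding preserves sort order: lexicographic order of AEID strings matches the numeric order of the underlying values.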
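The mapping can be reproduced in a few lines. This is a minimal sketch of the scheme as described above, not the package's actual source: `PAIRS`, `BYTE_OF`, `encode_sketch`, and `decode_sketch` are illustrative names, and big-endian byte order is inferred from the example value shown earlier.

```python
ALPHABET = "BCDFGHKLMNPRSTVW"

# 256-entry lookup table: byte value -> two-character pair (high nibble first).
PAIRS = [hi + lo for hi in ALPHABET for lo in ALPHABET]
# Reverse table: two-character pair -> byte value.
BYTE_OF = {pair: value for value, pair in enumerate(PAIRS)}


def encode_sketch(value: int) -> str:
    """Encode a 128-bit integer as 32 consonants (big-endian byte order)."""
    raw = value.to_bytes(16, "big")  # raises OverflowError if out of range
    return "".join(PAIRS[b] for b in raw)


def decode_sketch(text: str) -> int:
    """Decode a 32-character string back into a 128-bit integer."""
    if len(text) != 32:
        raise ValueError("expected exactly 32 characters")
    try:
        raw = bytes(BYTE_OF[text[i:i + 2]] for i in range(0, 32, 2))
    except KeyError as exc:
        raise ValueError(f"invalid character pair: {exc.args[0]!r}") from exc
    return int.from_bytes(raw, "big")


value = 0x019B1381A1F2887B8DF25F5CB1866CAA
encoded = encode_sketch(value)
print(encoded)                          # BCNRCFMCPCWDMMLRMTWDHWHSRCMKKSPP
print(decode_sketch(encoded) == value)  # True

# Sort order: increasing integers give lexicographically increasing strings.
values = [0, 1, 255, 256, 2**64, 2**128 - 1]
strings = [encode_sketch(v) for v in values]
print(strings == sorted(strings))       # True
```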

## Specification

- [English](doc/AEID_spec.md)
- [Japanese](doc/AEID_spec.ja.md)

## License

MIT
