Metadata-Version: 2.4
Name: stringshift
Version: 4.0.0
Summary: Advanced string encoding / decoding toolkit — 24 formats, auto-detection, deep decode, pipelines, plugins
Author: DevBullions
Project-URL: Homepage, https://github.com/DevBullions/stringshift
Project-URL: Repository, https://github.com/DevBullions/stringshift
Project-URL: Issues, https://github.com/DevBullions/stringshift/issues
Keywords: encoding,decoding,base64,hex,morse,cipher,ctf,security,crypto,text,pipeline,braille
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Security :: Cryptography
Classifier: Topic :: Text Processing
Classifier: Topic :: Utilities
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Provides-Extra: full
Requires-Dist: chardet>=5.0; extra == "full"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: chardet>=5.0; extra == "dev"

# stringshift

![Python](https://img.shields.io/badge/python-3.8%2B-blue)
![License](https://img.shields.io/badge/license-MIT-green)
[![PyPI](https://img.shields.io/pypi/v/stringshift)](https://pypi.org/project/stringshift/)
![Formats](https://img.shields.io/badge/formats-24%20built--in-orange)

**Advanced string encoding / decoding toolkit for Python.**

24 built-in formats · Auto-detection engine · Deep multi-layer unwrapping ·
Operation pipelines · Runtime plugin system · Full CLI · Zero dependencies

---

## What's new in v4.0

- **16 new formats** — `base16`, `base58`, `base85`, `ascii85`, `rot47`, `nato`, `braille`,
  `caesar`, `atbash`, `vigenere`, `xor`, `reverse`, `unicode_escape`, `punycode`,
  `quoted_printable`, `uuencode`
- **`magic_decode`** — rank every possible interpretation of an unknown string
- **`smart_decode`** — one-call auto-detection and decoding
- **`deep_decode`** — recursively unwrap multi-layer encodings (like CyberChef)
- **`pipeline`** — chain encode/decode operations in sequence
- **Plugin system** — register custom codecs at runtime
- **Proper exceptions** — `DecodeError`, `EncodeError`, `UnknownFormatError`, `PipelineError`
- Bug fixes: `exceptions.py` now contains actual exceptions, and all functions are properly exported

---

## Installation

```bash
pip install stringshift

# Optional: smarter byte-encoding detection
pip install "stringshift[full]"
```

---

## Quick Start

```python
import stringshift

# Encode & decode
stringshift.encode("hello", "base64")       # 'aGVsbG8='
stringshift.decode("aGVsbG8=", "base64")    # 'hello'

# Don't know what something is? Auto-detect it
stringshift.smart_decode("SGVsbG8=")        # 'Hello'

# See every possible interpretation, ranked by confidence
stringshift.magic_decode("SGVsbG8=")
# [{'format': 'base64', 'confidence': 0.89, 'decoded': 'Hello'}, ...]

# Unwrap multi-layer encodings in one call
stringshift.deep_decode("SGVsbG8%3D")
# {
#   'result': 'Hello',
#   'total_layers': 2,
#   'layers': [
#     {'layer': 1, 'format': 'url',    'value': 'SGVsbG8='},
#     {'layer': 2, 'format': 'base64', 'value': 'Hello'}
#   ]
# }

# Chain operations
stringshift.pipeline("hello", ["base64_encode", "url_encode"])  # 'aGVsbG8%3D'
```

---

## Supported Formats (24 built-in)

| Category        | Formats |
|-----------------|---------|
| Base encodings  | `base64` `base32` `base16` `base58` `base85` `ascii85` |
| Binary / Hex    | `hex` `binary` |
| Web / Text      | `url` `html` `quoted_printable` `punycode` `uuencode` |
| Classic ciphers | `caesar` `atbash` `vigenere` `rot13` `rot47` `xor` |
| Symbol / Human  | `morse` `nato` `braille` |
| Misc            | `reverse` `unicode_escape` |

---

## CLI

Once the package is installed, the `stringshift` command is available immediately.

```bash
# Auto-detect and decode
$ stringshift "SGVsbG8="
Hello

# Encode
$ stringshift "Hello" -e base64
SGVsbG8=

# Decode a specific format
$ stringshift "48656c6c6f" -d hex
Hello

# Show all possible interpretations with confidence scores
$ stringshift "SGVsbG8=" --magic
Confidence   Format             Decoded
------------------------------------------------------
89%          base64             Hello

# Unwrap multi-layer encodings (like CyberChef)
$ stringshift "SGVsbG8%3D" --deep
Layer  1  [url             ]  SGVsbG8=
Layer  2  [base64          ]  Hello

Final result: Hello

# Chain operations via pipeline
$ stringshift "hello" --pipeline base64_encode url_encode
aGVsbG8%3D

# Cipher options
$ stringshift "Hello" -e caesar --shift 3      # Khoor
$ stringshift "Hello" -e vigenere --key secret
$ stringshift "Hello" -e xor --xor-key 99

# Batch process — one item per line from stdin
$ echo -e "aGVsbG8=\nd29ybGQ=" | stringshift --batch -d base64
hello
world

# Process a file
$ stringshift -i encoded.txt -d base64 > decoded.txt

# List every available format
$ stringshift --list

# Benchmark processing time
$ stringshift "SGVsbG8=" --benchmark

# Interactive mode (no arguments)
$ stringshift
stringshift 4.0.0  —  interactive mode
Commands:  encode <fmt> <text>
           decode <fmt> <text>
           magic  <text>
           deep   <text>
           list
stringshift>
```

---

## Python API

### Encode & Decode

```python
import stringshift

stringshift.encode("hello", "hex")                            # '68656c6c6f'
stringshift.encode("Hello", "caesar", shift=3)                # 'Khoor'
stringshift.encode("Hello", "vigenere", key="secret")         # 'Zincs'
stringshift.encode("SOS", "morse")                            # '... --- ...'
stringshift.encode("ABC", "nato")                             # 'Alpha Bravo Charlie'
stringshift.encode("hi", "braille")                           # '⠓⠊'

stringshift.decode("68656c6c6f", "hex")                       # 'hello'
stringshift.decode("Khoor", "caesar", shift=3)                # 'Hello'
stringshift.decode("Zincs", "vigenere", key="secret")         # 'Hello'
```

You can also call individual format functions directly:

```python
from stringshift import encode_base64, decode_morse, encode_braille
encode_base64("hello")           # 'aGVsbG8='
decode_morse("... --- ...")      # 'SOS'
encode_braille("hello")          # '⠓⠑⠇⠇⠕'
```

### Auto-Detection

```python
# Best guess — returns a single string
stringshift.smart_decode("68656c6c6f")         # 'hello'
stringshift.smart_decode("hello%20world")      # 'hello world'
stringshift.smart_decode("... --- ...")        # 'SOS'

# All candidates, ranked by confidence
results = stringshift.magic_decode("SGVsbG8=")
for r in results:
    print(f"{r['confidence']:.0%}  {r['format']:15s}  {r['decoded']}")

# Detection only: ranked candidate formats, each with its trial decoding
results = stringshift.detect_format("SGVsbG8=")
# [{'format': 'base64', 'confidence': 0.89, 'decoded': 'Hello'}]
```
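
To make the idea of confidence-ranked candidates concrete, here is a minimal standard-library sketch. It is not stringshift's detection engine: the helper names (`rank_candidates`, `CANDIDATES`) and the scoring rule (share of printable characters in the decoded text) are assumptions chosen for illustration, so the scores will not match the library's.

```python
import base64
import binascii
import string
from urllib.parse import unquote

PRINTABLE = set(string.printable)

def _hex(text):
    return bytes.fromhex(text).decode("utf-8")

def _base64(text):
    return base64.b64decode(text, validate=True).decode("utf-8")

def _url(text):
    decoded = unquote(text)
    if decoded == text:
        raise ValueError("no percent-escapes found")
    return decoded

# Hypothetical candidate table: format name -> trial decoder.
CANDIDATES = {"hex": _hex, "base64": _base64, "url": _url}

def rank_candidates(text):
    """Try every decoder; keep the ones that succeed, best score first."""
    results = []
    for name, decoder in CANDIDATES.items():
        try:
            decoded = decoder(text)
        except (ValueError, binascii.Error):  # includes UnicodeDecodeError
            continue
        if not decoded:
            continue
        # Assumed scoring rule: fraction of printable characters.
        score = sum(ch in PRINTABLE for ch in decoded) / len(decoded)
        results.append({"format": name, "confidence": round(score, 2), "decoded": decoded})
    return sorted(results, key=lambda r: r["confidence"], reverse=True)

for candidate in rank_candidates("SGVsbG8="):
    print(candidate["format"], candidate["decoded"])   # base64 Hello
```

A real detector would weigh more signals (alphabet, length, padding, language statistics); the point here is only the try, score, and sort structure.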

### Deep Decode

Automatically peels every encoding layer off a string, similar to
CyberChef's "Magic" operation.

```python
# Two layers: url → base64
info = stringshift.deep_decode("SGVsbG8%3D")
print(info["result"])          # 'Hello'
print(info["total_layers"])    # 2
for layer in info["layers"]:
    print(layer["layer"], layer["format"], layer["value"])
# 1  url     SGVsbG8=
# 2  base64  Hello

# Three layers: url → base64 → hex
tripled = stringshift.encode(
    stringshift.encode(stringshift.encode("Hi", "hex"), "base64"),
    "url"
)
info = stringshift.deep_decode(tripled)
print(info["result"])         # 'Hi'
print(info["total_layers"])   # 3
```
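
The layer-peeling loop can be sketched with the standard library alone. This is an illustration under stated assumptions, not stringshift's implementation: the names `peel_layers` and `DETECTORS`, the fixed detector order, and the stopping rule are all hypothetical, and only URL and Base64 layers are handled.

```python
import base64
import binascii
import re
from urllib.parse import unquote

# Hypothetical detectors: each returns the decoded text, or None when the
# input does not look like that format.
def _try_url(text):
    if not re.search(r"%[0-9A-Fa-f]{2}", text):
        return None
    return unquote(text)

def _try_base64(text):
    # Require a valid length and alphabet to limit false positives.
    if len(text) % 4 or not re.fullmatch(r"[A-Za-z0-9+/]+={0,2}", text):
        return None
    try:
        return base64.b64decode(text, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None

DETECTORS = [("url", _try_url), ("base64", _try_base64)]

def peel_layers(text, max_layers=10):
    """Apply the first matching detector repeatedly; stop when none matches."""
    layers = []
    for _ in range(max_layers):
        for name, detector in DETECTORS:
            decoded = detector(text)
            if decoded is not None and decoded != text:
                layers.append({"layer": len(layers) + 1, "format": name, "value": decoded})
                text = decoded
                break
        else:
            break  # no detector matched: nothing left to unwrap
    return {"result": text, "total_layers": len(layers), "layers": layers}

info = peel_layers("SGVsbG8%3D")
print(info["result"], info["total_layers"])   # Hello 2
```

The `max_layers` cap is a design choice in this sketch: it guarantees termination even if a detector keeps producing new text.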

### Pipeline

Chain any number of encode/decode steps. Each step name must end with
`_encode` or `_decode`. To pass kwargs (for example a cipher's `shift`),
give the step as a `(name, kwargs)` tuple.

```python
# Simple chain
result = stringshift.pipeline("hello", [
    "base64_encode",
    "url_encode",
])
# 'aGVsbG8%3D'

# Reverse it
stringshift.pipeline(result, ["url_decode", "base64_decode"])
# 'hello'

# With cipher kwargs
stringshift.pipeline("hello", [
    ("caesar_encode", {"shift": 5}),
    "base64_encode",
    "url_encode",
])
```
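
One way such step names could be parsed and dispatched is shown below. It is a sketch only: `run_pipeline`, `CODECS`, and the `ValueError` are hypothetical stand-ins (the library raises its own `PipelineError`), and just three formats are wired up using the standard library.

```python
import base64
from urllib.parse import quote, unquote

def _caesar(text, shift):
    """Shift ASCII letters by `shift` places; leave everything else alone."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

# Hypothetical registry: format name -> (encoder, decoder).
CODECS = {
    "base64": (lambda s: base64.b64encode(s.encode()).decode(),
               lambda s: base64.b64decode(s).decode()),
    "url": (lambda s: quote(s, safe=""), unquote),
    "caesar": (lambda s, shift=13: _caesar(s, shift),
               lambda s, shift=13: _caesar(s, -shift)),
}

def run_pipeline(text, steps):
    for index, step in enumerate(steps):
        # A step is either "name_encode" or ("name_encode", {kwargs}).
        name, kwargs = step if isinstance(step, tuple) else (step, {})
        fmt, _, direction = name.rpartition("_")
        if fmt not in CODECS or direction not in ("encode", "decode"):
            raise ValueError(f"step {index}: unknown step {name!r}")
        encoder, decoder = CODECS[fmt]
        text = (encoder if direction == "encode" else decoder)(text, **kwargs)
    return text

print(run_pipeline("hello", ["base64_encode", "url_encode"]))   # aGVsbG8%3D
```

Splitting on the last underscore with `rpartition` keeps format names that themselves contain underscores (such as `quoted_printable`) intact.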

### Batch Processing

All batch functions use a thread pool internally and scale automatically
to your CPU count.

```python
texts = ["hello", "world", "foo"]

# Encode all in parallel
stringshift.batch_process(texts, operation="encode", fmt="base64")
# ['aGVsbG8=', 'd29ybGQ=', 'Zm9v']

# Decode all — explicit format
encoded = [stringshift.encode(t, "hex") for t in texts]
stringshift.batch_process(encoded, operation="decode", fmt="hex")
# ['hello', 'world', 'foo']

# Decode all — auto-detect format per item
mixed = ["SGVsbG8=", "68656c6c6f", "hello%20world"]
stringshift.batch_process(mixed)
# ['Hello', 'hello', 'hello world']

# Control worker threads
stringshift.batch_process(texts, operation="encode", fmt="base64", workers=8)
```
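
For readers curious how a thread-pool batch helper keeps results in input order, here is a small standard-library sketch. The names `batch_map` and `to_base64` are hypothetical, and defaulting the worker count to `os.cpu_count()` is an assumption about what "scale automatically" means, not a statement about the library's internals.

```python
import base64
import os
from concurrent.futures import ThreadPoolExecutor

def batch_map(func, items, workers=None):
    """Apply func to every item in a thread pool, preserving input order."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map() yields results in the order the inputs were given,
        # regardless of which worker finishes first.
        return list(pool.map(func, items))

def to_base64(text):
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

print(batch_map(to_base64, ["hello", "world", "foo"]))
# ['aGVsbG8=', 'd29ybGQ=', 'Zm9v']
```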

### Plugin System

Register your own codec at runtime. It immediately becomes available to
`encode()`, `decode()`, `pipeline()`, the CLI, and `list_formats()`.

```python
# Simple functional style
stringshift.register_codec(
    "shout",
    encoder=str.upper,
    decoder=str.lower,
)
stringshift.encode("hello", "shout")    # 'HELLO'
stringshift.decode("HELLO", "shout")    # 'hello'

# Class decorator style — cleaner for complex codecs
@stringshift.codec("reverse_words")
class ReverseWords:
    def encode(self, text: str) -> str:
        return " ".join(word[::-1] for word in text.split())
    def decode(self, text: str) -> str:
        return self.encode(text)   # self-inverse

stringshift.encode("hello world", "reverse_words")   # 'olleh dlrow'

# Use in a pipeline
stringshift.pipeline("hello world", [
    "reverse_words_encode",
    "base64_encode",
])

# See all formats including plugins
stringshift.list_formats()
# {'builtin': ['ascii85', 'atbash', 'base16', ...], 'plugins': ['shout', 'reverse_words']}
```

### Error Handling

```python
import stringshift
from stringshift import (
    DecodeError, EncodeError,
    UnknownFormatError, PipelineError,
)

# Bad input for a known format
try:
    stringshift.decode("not!!valid!!", "base64")
except DecodeError as exc:
    print(exc.original)    # the input that failed
    print(exc.error)       # the underlying exception

# Requesting a format that doesn't exist
try:
    stringshift.encode("hello", "made_up")
except UnknownFormatError as exc:
    print(exc.fmt)         # 'made_up'
    print(exc.available)   # full list of valid format names

# Pipeline step failure
try:
    stringshift.pipeline("hello", ["badstep"])
except PipelineError as exc:
    print(exc.step)        # 'badstep'
    print(exc.index)       # 0  (position in the pipeline)

# Auto-detect on truly unrecognisable input
try:
    stringshift.smart_decode("!@#$%^&*()")
except DecodeError:
    print("Could not determine encoding")
```

### Legacy Helpers (v1 compatible)

These functions are kept for backward compatibility.

```python
# decode_all: applies URL + HTML + escape-sequence decoding in one pass
stringshift.decode_all("hello%20world")              # 'hello world'
stringshift.decode_all("&lt;b&gt;hi&lt;/b&gt;")     # '<b>hi</b>'
stringshift.decode_all("\\x", fallback="[error]")    # '[error]'  ← invalid escape

# Normalise Unicode
stringshift.normalize_text("café", "NFC")
stringshift.normalize_text("café", "NFD")

# Parallel decode_all over a list
stringshift.batch_decode(["hello%20world", "&amp;foo"])
# ['hello world', '&foo']
```

---

## Format Reference

| Format             | Encode: `"Hi"` →            | Notes |
|--------------------|-----------------------------|-------|
| `base64`           | `SGk=`                      | RFC 4648, auto-padded |
| `base32`           | `JBUQ====`                  | uppercase alphabet |
| `base16`           | `4869`                      | uppercase hex |
| `base58`           | `6Wc`                       | Bitcoin alphabet, no 0/O/I/l |
| `base85`           | `NNE`                       | Python `base64.b85encode` |
| `ascii85`          | `88/`                       | Adobe variant |
| `hex`              | `4869`                      | lowercase, strips `0x`/spaces/colons on decode |
| `binary`           | `01001000 01101001`          | 8-bit groups, space-separated |
| `url`              | `Hi`  (`"Hi!"` → `Hi%21`)  | `quote(safe="")` |
| `html`             | `Hi`  (`"<b>"` → `&lt;b&gt;`) | full entity escaping |
| `quoted_printable` | `Hi`                        | email-safe encoding |
| `punycode`         | `caf-dma` (for `café`)      | IDN domain encoding |
| `uuencode`         | `"2&D`                      | classic Unix transfer encoding |
| `rot13`            | `Uv`                        | letter-only, self-inverse |
| `rot47`            | `w:`                        | all printable ASCII, self-inverse |
| `caesar`           | `Ij` (shift=1)              | kwarg: `shift` (default 13) |
| `atbash`           | `Sr`                        | A↔Z substitution, self-inverse |
| `vigenere`         | `Rs` (key="k")              | kwarg: `key` (default "key") |
| `xor`              | `62 43`                     | kwarg: `key` int 0-255 (default 42) |
| `morse`            | `.... ..`                   | dots, dashes, `/` for space |
| `nato`             | `Hotel India`               | full NATO phonetic alphabet |
| `braille`          | `⠓⠊`                        | Grade 1 Braille |
| `unicode_escape`   | `\u0048\u0069`              | `\uXXXX` / `\xXX` sequences |
| `reverse`          | `iH`                        | self-inverse |

---

## Running Tests

```bash
pip install pytest
pytest tests/ -v
```

---

## Contributing

Pull requests are welcome. To add a new codec:

1. Add `encode_<name>` and `decode_<name>` functions to `core.py`
2. Register them in `ENCODE_REGISTRY` and `DECODE_REGISTRY`
3. Add a round-trip test in `tests/test_core.py`
4. Update the format table in this README
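
As a rough illustration of steps 1 to 3, the sketch below adds a toy `rot5` digit cipher and a round-trip test. It rests on assumptions: the registries are modelled here as plain dicts mapping a format name to a callable, and `rot5` is a made-up example codec, so check `core.py` for the real registry shape before copying this.

```python
# Assumption: the registries map a format name to a callable.
ENCODE_REGISTRY = {}
DECODE_REGISTRY = {}

def encode_rot5(text: str) -> str:
    """Rotate each ASCII digit by 5 places; leave other characters alone."""
    return "".join(
        str((int(ch) + 5) % 10) if ch in "0123456789" else ch for ch in text
    )

def decode_rot5(text: str) -> str:
    return encode_rot5(text)  # rotating by 5 twice is the identity

ENCODE_REGISTRY["rot5"] = encode_rot5
DECODE_REGISTRY["rot5"] = decode_rot5

def test_rot5_round_trip():
    # Cover the empty string, mixed content, no digits, and every digit.
    for sample in ["", "2024-01-31", "no digits here", "0123456789"]:
        encoded = ENCODE_REGISTRY["rot5"](sample)
        assert DECODE_REGISTRY["rot5"](encoded) == sample

test_rot5_round_trip()
```

In the real test suite the final direct call would be dropped, since pytest discovers and runs `test_*` functions itself.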

---

## License

MIT — free for personal and commercial use.
