Metadata-Version: 2.4
Name: privatiser
Version: 0.5.0
Summary: Content anonymizer/pseudonymizer — redact sensitive data before sharing with AI
Author-email: Privatiser <admin@privatiser.net>
License-Expression: MIT
Project-URL: Homepage, https://privatiser.net
Project-URL: Documentation, https://privatiser.net
Project-URL: Repository, https://github.com/XionDot/privatiser-engine
Keywords: anonymize,privacy,redact,pii,secrets,ai
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Security
Classifier: Topic :: Utilities
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.0
Requires-Dist: pyperclip>=1.8
Provides-Extra: web
Requires-Dist: flask>=3.0; extra == "web"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Dynamic: license-file

# privatiser-engine

Open-source anonymization engine powering [Privatiser](https://privatiser.net). Redacts IPs, API keys, secrets, PII, and cloud identifiers from any text, replacing them with structurally valid pseudonyms so context is preserved. Redaction is fully reversible using the mapping returned alongside the anonymized text.

**Everything runs locally. Nothing leaves the machine.**

Available as a **Python library/CLI** and a **browser-native JavaScript port**.

---

## What it detects

| Category | Examples | Pseudonym format |
|---|---|---|
| IP addresses | `192.168.1.100`, `10.0.0.0/16` | `10.x.x.x` (preserves CIDR) |
| Email addresses | `admin@company.com` | `user-1@redacted.example.net` |
| Domain names | `prod-db.mycompany.com` | `redacted-host-1.example.net` |
| MAC addresses | `AA:BB:CC:DD:EE:FF` | `AA:BB:CC:00:00:01` |
| AWS Account IDs | `123456789012` | `100000000001` |
| AWS ARNs | `arn:aws:iam::123...:role/admin` | Structure preserved, values redacted |
| S3 buckets | `s3://my-prod-bucket` | `s3://redacted-bucket-1` |
| API keys | AWS, OpenAI, Anthropic, Google, Groq, GitHub, Slack, Azure | `REDACTED_SECRET_n` |
| Connection strings | `postgresql://user:pass@host/db` | `REDACTED_CONNSTR_n` |
| JWT tokens | `eyJhbG...` | `REDACTED_JWT_n` |
| PEM private keys | `-----BEGIN RSA PRIVATE KEY-----` | `REDACTED_PEM_KEY_n` |
| Bearer tokens | `Authorization: Bearer sk-...` | `REDACTED_BEARER_n` |
| Generic secrets | `password = "value"` | Keyword preserved, value redacted |
| US phone numbers | `(555) 123-4567`, `+1-555-123-4567` | `(555) 000-0001` |
| UK phone numbers | `+44 7911 123456` | `+44 7700 900001` |
| Credit cards | `4111 1111 1111 1111` (Luhn validated) | `4000-0000-0000-0001` |
| US SSN | `123-45-6789` | `078-05-0001` |
| Passports | `C12345678` | `X00000001` |
| IBAN | `DE89370400440532013000` | `GB00XXXX000000000001` |
| UUIDs | `550e8400-e29b-41d4-...` | `00000000-0000-4000-a000-...` |
| Azure / GCP IDs | Subscription IDs, project IDs | Redacted with counter |

Skips well-known safe values: `127.0.0.1`, `0.0.0.0`, `localhost`, `amazonaws.com`, `github.com`, etc.
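
The table notes that credit card candidates are Luhn validated. As a rough illustration of what such a check involves, here is a minimal, self-contained sketch of the Luhn checksum. The function name `luhn_valid` is hypothetical and not part of the privatiser API; the library's own check may differ.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum.

    Hypothetical helper for illustration only; separators such as
    spaces and dashes are ignored.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))
print(luhn_valid("4111 1111 1111 1112"))
```

A checksum of this kind lets a detector ignore arbitrary 16-digit numbers (order IDs, timestamps) that merely look like card numbers.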

---

## Python

### Install

```bash
pip install privatiser
```

### Usage

```python
from privatiser import Privatiser

p = Privatiser()

text = 'server = "192.168.1.100"\npassword = "secret123"'
anonymized, mapping = p.anonymize(text)
# anonymized now reads something like:
#   server = "10.0.1.8"
#   password = "REDACTED_SECRET_1"

restored = p.deanonymize(anonymized, mapping)
assert restored == text  # perfect round-trip
```

### Category toggles

```python
p = Privatiser(enabled_categories={"pii": False})  # skip phone/card/SSN/passport/IBAN
```

### Allowlist

```python
p = Privatiser(allowlist=["localhost", "example.com"])  # never redact these
```

### Custom patterns

```python
from privatiser import Privatiser, register_custom

register_custom("ticket_id", r"TICKET-\d{4,6}", "REDACTED_TICKET_{n}")

p = Privatiser()
result, mapping = p.anonymize("Fix TICKET-12345")
# result: "Fix REDACTED_TICKET_1"
```

### CLI

```bash
# From stdin
cat config.tf | privatiser anonymize

# From file, save mapping
privatiser anonymize config.tf -o clean.tf -m mapping.json

# Restore
privatiser deanonymize clean.tf -m mapping.json

# Disable categories
privatiser anonymize config.tf -d pii -d aws
```

---

## JavaScript (browser / Node)

The `privatiser.js` file is a self-contained browser port with no dependencies. Drop it into any web project or use it in Node.

```html
<script src="privatiser.js"></script>
```

```javascript
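const text = 'server = "192.168.1.100"';  // any input string

const p = new Privatiser();
const { result, mapping } = p.anonymize(text);

// Restore the original text from the anonymized result
const restored = p.deanonymize(result, mapping);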
```

### Options

```javascript
const p = new Privatiser({
  enabledCategories: { pii: false },          // disable a category
  allowlist: ["localhost", "example.com"],    // never redact these
  customWords: ["mycompany", "prod-server"],  // always redact these
});
```

---

## How it works

1. **Placeholder pass** - as each pattern matches, the detected value is swapped for a null-byte marker (`\x00PRIV_0\x00`) before any later pattern runs. This prevents later patterns from matching inside already-redacted values.
2. **Pattern priority** - patterns run highest-priority first (connection strings before passwords, JWTs before base64, etc.).
3. **Deterministic pseudonyms** - the same value always gets the same pseudonym within a session, so repeated occurrences stay consistent.
4. **Structural preservation** - pseudonyms match the format of the original (IPs look like IPs, emails look like emails) so downstream tools and AI models aren't confused.
5. **Restore pass** - `deanonymize()` does a simple string replacement of pseudonyms back to originals using the mapping.
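
The steps above can be sketched in a few lines of standalone Python. This is an illustration under stated assumptions, not the code in `core.py`: the two regexes, the pseudonym templates, and the names `PATTERNS`, `anonymize_sketch`, and `deanonymize_sketch` are all hypothetical, and the restore step uses a single regex pass rather than repeated string replacement.

```python
import re

# Hypothetical patterns, listed highest priority first.
PATTERNS = [
    ("connstr", re.compile(r"\b\w+://[^\s:@/]+:[^\s@/]+@[^\s\"']+"), "REDACTED_CONNSTR_{n}"),
    ("ip", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "10.0.0.{n}"),
]


def anonymize_sketch(text):
    markers = {}   # null-byte marker -> pseudonym
    mapping = {}   # pseudonym -> original value
    seen = {}      # (category, original) -> pseudonym, for consistency
    counters = {}  # category -> how many pseudonyms issued so far

    for name, regex, template in PATTERNS:
        def swap(match, name=name, template=template):
            original = match.group(0)
            key = (name, original)
            if key not in seen:
                counters[name] = counters.get(name, 0) + 1
                pseudonym = template.format(n=counters[name])
                seen[key] = pseudonym
                mapping[pseudonym] = original
            # Hide the match behind a marker so later patterns skip it.
            marker = f"\x00PRIV_{len(markers)}\x00"
            markers[marker] = seen[key]
            return marker

        text = regex.sub(swap, text)

    # Once every pattern has run, swap markers for their pseudonyms.
    for marker, pseudonym in markers.items():
        text = text.replace(marker, pseudonym)
    return text, mapping


def deanonymize_sketch(text, mapping):
    if not mapping:
        return text
    # Longest pseudonyms first, so "10.0.0.10" wins over "10.0.0.1".
    ordered = sorted(mapping, key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in ordered)
    return re.sub(alternation, lambda m: mapping[m.group(0)], text)


source = "db postgres://admin:pw@10.1.2.3/app host 10.1.2.3 peer 192.168.1.100"
clean, mapping = anonymize_sketch(source)
print(clean)
print(deanonymize_sketch(clean, mapping) == source)
```

In this sketch the IP inside the connection string is never seen by the lower-priority IP pattern, because the whole connection string is already hidden behind a marker by the time that pattern runs.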

---

## Project structure

```
src/privatiser/
  core.py          - Privatiser class, anonymize/deanonymize logic
  patterns/
    secrets.py     - API keys, JWTs, connection strings, PEM keys
    network.py     - IPs, domains, emails, MACs, URLs
    pii.py         - phone, credit card, SSN, passport, IBAN
    aws.py         - AWS account IDs, ARNs, S3 buckets
    cloud.py       - Azure, GCP identifiers
    identifiers.py - UUIDs, generic identifiers
  cli.py           - Click CLI entrypoint
  web/             - Flask web UI (optional)

privatiser.js      - Self-contained browser/Node JS port
tests/             - pytest test suite
```

---

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md). Pattern contributions are especially welcome - if you work with a format that Privatiser doesn't detect yet, opening a PR with a new pattern + tests is the fastest way to get it added.

---

## Attribution

MIT licensed - use it freely in personal and commercial projects. If you build something with it, a "Powered by [Privatiser](https://privatiser.net)" credit is appreciated but not required.

## License

MIT - see [LICENSE](LICENSE).

Built and maintained by [@XionDot](https://github.com/XionDot). Web tool at [privatiser.net](https://privatiser.net).
