Metadata-Version: 2.4
Name: django-eose
Version: 0.1.0b2
Summary: Django Encrypted Object Search Engine.
Author-email: Paulo Otávio Castoldi <paulocastoldi@paulocastoldi.com.br>
License: MIT
Project-URL: Homepage, https://gitlab.com/paulo_castoldi/django-eose
Keywords: django,search,multiprocessing,decrypted,cache
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Framework :: Django
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: Django>=6.0.3
Requires-Dist: sqlparse>=0.5.5
Requires-Dist: asgiref>=3.11.1
Requires-Dist: psutil>=7.2.2
Requires-Dist: cffi>=2.0.0
Requires-Dist: cryptography>=46.0.5
Requires-Dist: pycparser>=3.0
Provides-Extra: dev
Requires-Dist: setuptools>=82.0.1; extra == "dev"
Requires-Dist: wheel>=0.46.3; extra == "dev"
Dynamic: license-file

# Django EOSE

**Django Encrypted Object Search Engine**

Provides optimized, parallelized search over Django querysets, including searches on fields that are decrypted on the fly. `django-eose` targets scenarios where the data you need to search is stored encrypted in the database, and it is designed to stay performant on large datasets through memory-aware batching and multiprocessing.

---

## Key Features

- **Parallel Execution Strategies:** Choose between `processes` (optimal for CPU-bound tasks like decryption), `threads` (optimal for I/O-bound tasks), or `sync`.
- **Smart Memory Batching:** Automatically calculates and adapts batch sizes based on your system's available memory and the estimated average size of the objects.
- **Deep Relation Searching:** Seamlessly search across related object paths (e.g., `order__client`).
- **Decryption On-The-Fly:** Skips Django model instantiation by reading raw values with `.values_list()` and decrypting them directly with Fernet, which is significantly faster than going through model properties.
- **Result Caching:** Generates stable SHA-1 cache keys for querysets to cache frequently searched results, reducing database hits.
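
To illustrate the result-caching idea, here is a minimal sketch of how a stable SHA-1 key could be derived from a query's SQL text and the search parameters. The function name `make_cache_key`, the `eose` prefix, and the choice of inputs are assumptions made for illustration; they are not taken from the package's implementation.

```python
import hashlib
import json


def make_cache_key(sql: str, search: str, fields: tuple[str, ...], prefix: str = "eose") -> str:
    """Build a deterministic cache key from the query SQL and search parameters."""
    # Serialize with sorted keys so identical inputs always produce the same payload.
    payload = json.dumps(
        {"sql": sql, "search": search, "fields": list(fields)},
        sort_keys=True,
    )
    digest = hashlib.sha1(payload.encode("utf-8")).hexdigest()
    return f"{prefix}:{digest}"


# In a Django project, the SQL text could come from str(queryset.query).
key = make_cache_key('SELECT "id" FROM "orders_orderitem"', "john", ("name", "email"))
```

Because the key depends only on its inputs, repeating the same search against the same queryset maps to the same cache entry.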

---

## Installation

Install with `pip`:

```bash
pip install django-eose
```

## Requirements

- Python 3.10+
- Django >= 6.0.3
- sqlparse >= 0.5.5
- asgiref >= 3.11.1
- psutil >= 7.2.2
- cffi >= 2.0.0
- cryptography >= 46.0.5
- pycparser >= 3.0

---

## Model Configuration

There are two ways to use `django-eose`:

1. **Direct Decryption (Recommended):** Let `django-eose` handle the Fernet decryption internally. This is significantly faster as it operates on raw database values.
2. **Django Getters:** Use custom model properties that return decrypted values.

The properties in the example below are only needed for the second option; if you use direct decryption, you can skip them.

### Example Model (Fernet Encryption)

```python
from django.db import models
from cryptography.fernet import Fernet

# Fernet key: 32 URL-safe base64-encoded bytes, e.g. from Fernet.generate_key()
AES_KEY = b"<your_key_here>"

class Client(models.Model):
    _encrypted_name = models.BinaryField()
    _encrypted_email = models.BinaryField()

    # Method to decrypt the database value
    def _decrypt_field(self, encrypted_value):
        # BinaryField may return a memoryview (e.g. on PostgreSQL), so coerce to bytes
        return Fernet(AES_KEY).decrypt(bytes(encrypted_value)).decode()

    # Method to encrypt the value before saving
    def _encrypt_field(self, value):
        return Fernet(AES_KEY).encrypt(value.encode())

    # Creates properties that transparently handle encryption/decryption.
    # Calling a staticmethod inside the class body, as done below, requires Python 3.10+.
    @staticmethod
    def _property(field_name):
        def getter(self):
            return self._decrypt_field(getattr(self, field_name))

        def setter(self, value):
            setattr(self, field_name, self._encrypt_field(value))

        return property(getter, setter)

    # Fields accessible as normal attributes
    name = _property('_encrypted_name')
    email = _property('_encrypted_email')
```

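⚠️ You interact with `name` and `email` like regular attributes, while encryption and decryption happen transparently on assignment and access. They are Python properties, not database fields, so they cannot be used in ORM lookups such as `filter(name=...)`.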

---

## Usage

First, add your AES password to your `.env` file so the internal processors can build the decryption key:

```bash
AES_PASSWORD=your-password-here
```
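
This README does not say how the password is turned into a key, so the following is only a sketch of one common approach: stretch the password with PBKDF2-HMAC-SHA256 and encode the 32-byte result in the URL-safe base64 format that Fernet keys use. The function name `derive_fernet_key`, the salt, and the iteration count are assumptions; check the package source before relying on any particular derivation. It also assumes the `.env` file has already been loaded into the process environment by whatever tool your project uses.

```python
import base64
import hashlib
import os


def derive_fernet_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive 32 bytes with PBKDF2-HMAC-SHA256 and encode them in Fernet's key format."""
    raw = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations, dklen=32)
    # Fernet keys are 32 bytes, URL-safe base64-encoded.
    return base64.urlsafe_b64encode(raw)


# Hypothetical values: the variable name matches the .env entry above; the salt is a placeholder.
password = os.environ.get("AES_PASSWORD", "your-password-here")
key = derive_fernet_key(password, salt=b"example-salt")
```

The same password and salt always yield the same key, which is what lets separate worker processes rebuild it independently.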

### Option A: High-Performance Direct Decryption (`decrypt=True`)

With `decrypt=True`, `only_fields` is ignored and raw values are fetched with `.values_list()` for maximum I/O efficiency.

```python
from django_eose import search_queryset
from orders.models import OrderItem

# Search for "john" in related client fields using direct Fernet decryption
results = search_queryset(
    search="john",
    queryset=OrderItem.objects.all(),
    related_field="order__client",
    fields=("_encrypted_name", "_encrypted_email"),
    executor="processes",
    max_batch_size=1_000_000,
    decrypt=True
)
# Returns a filtered queryset: queryset.filter(pk__in=matched_ids)
```

### Option B: Using Django Model Properties

This method relies on the model's properties to handle decryption. Use `only_fields` to restrict the data loaded into memory.

```python
from django_eose import search_queryset
from orders.models import OrderItem

# Search for "john" using the model's custom getters
results = search_queryset(
    search="john",
    queryset=OrderItem.objects.all(),
    related_field="order__client",
    fields=("name", "email"),
    only_fields=("_encrypted_name", "_encrypted_email"),
    executor="processes",
    max_batch_size=1_000_000
)
# Returns a filtered queryset: queryset.filter(pk__in=matched_ids)
```
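
Conceptually, `related_field` and `fields` combine as follows: walk the `__`-separated path from each object, then test the listed attributes on the object you arrive at. The sketch below shows that idea with plain stand-in objects rather than real models. The helpers `resolve_related` and `matches` are hypothetical names for illustration, not part of the `django-eose` API.

```python
from types import SimpleNamespace


def resolve_related(obj, related_field):
    """Follow a Django-style 'a__b' path by repeated attribute access."""
    if related_field:
        for part in related_field.split("__"):
            obj = getattr(obj, part)
    return obj


def matches(obj, search, related_field, fields):
    """Return True if any listed field on the resolved object contains the term."""
    target = resolve_related(obj, related_field)
    term = search.lower()
    return any(term in str(getattr(target, name)).lower() for name in fields)


# Stand-in objects instead of real models, for illustration only.
client = SimpleNamespace(name="John Smith", email="js@example.com")
item = SimpleNamespace(pk=1, order=SimpleNamespace(client=client))

found = matches(item, "john", "order__client", ("name", "email"))
```

With model properties such as `name` and `email`, the `getattr` call is where decryption would happen, which is why this path is slower than working on raw values.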

---

## `search_queryset` API Reference

The core function accepts several parameters to tune performance:

- **`search`** _(str)_: The term to search for. It will be automatically normalized (accents removed) and lowercased.
- **`queryset`** _(Any)_: The initial Django QuerySet to filter.
- **`related_field`** _(str | None)_: The relation path to traverse before checking fields (e.g., `"order__client"`). Defaults to `None` (searches the base object).
- **`fields`** _(tuple[str] | None)_: Tuple of field names to inspect for the search term.
- **`only_fields`** _(tuple[str] | None)_: Tuple of fields to load via `.only()` to optimize database I/O when `decrypt=False`.
- **`executor`** _(str)_: Execution engine to use. Choices are `"processes"` (default, best for CPU/decryption), `"threads"` (best for I/O), or `"sync"`.
- **`cache_timeout`** _(int)_: Seconds to cache the found primary keys. Default is `600`.
- **`imap_chunksize`** _(int)_: The size of chunks sent to each worker process/thread.
- **`memory_fraction`** _(float)_: Fraction of available system memory to allocate for calculating dynamic batch sizes.
- **`avg_obj_size_bytes`** _(int | None)_: Fixed estimate of the average object size in bytes. If `None`, the engine samples 10 objects to estimate this automatically.
- **`max_workers`** _(int | None)_: Number of parallel workers. If `None`, defaults to `multiprocessing.cpu_count()`.
- **`max_batch_size`** _(int | None)_: The upper boundary for dynamic object batching. Automatically reduced if it exceeds available memory constraints.
- **`decrypt`** _(bool)_: If `True`, decrypts data using Fernet directly on raw byte payloads. Faster than using Django getters.
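
As a rough illustration of the normalization applied to `search`, the sketch below strips accents by decomposing characters and dropping the combining marks, then lowercases the result. The function name `normalize` is an assumption, and applying the same step to the field values being compared is also assumed here rather than confirmed by this README.

```python
import unicodedata


def normalize(text: str) -> str:
    """Strip accents (combining marks) and lowercase the text."""
    # NFKD splits accented characters into a base character plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


term = normalize("José")                  # "jose"
hit = term in normalize("JOSÉ da Silva")  # True: both sides are compared in normalized form
```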

---

## Default Settings

`django-eose` utilizes sensible defaults to protect system resources. These are defined in `django_eose.settings.DEFAULTS`:

| Setting                     | Default Value | Description                                                             |
| :-------------------------- | :------------ | :---------------------------------------------------------------------- |
| **`MEMORY_FRACTION`**       | `0.60`        | Fraction of available system memory targeted for batch loading (60%).   |
| **`IMAP_CHUNKSIZE`**        | `10_240`      | Size of the chunk sent at a time to each worker.                        |
| **`EXECUTOR`**              | `"processes"` | Default parallelization strategy.                                       |
| **`CACHE_TIMEOUT`**         | `600`         | Default cache TTL in seconds (10 minutes).                              |
| **`AVG_OBJ_SIZE_FALLBACK`** | `4096`        | Fallback size in bytes (4 KiB) if object size estimation fails.         |
| **`MIN_BATCH_SIZE`**        | `1_000`       | The absolute minimum number of objects to load per batch.               |
| **`MAX_BATCH_SIZE`**        | `1_000_000`   | The absolute maximum number of objects to load per batch.               |
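
To show how these defaults could interact, here is a minimal sketch of a memory-based batch size calculation: take a fraction of the available memory, divide by the average object size, and clamp the result between the minimum and maximum. The function `compute_batch_size` and the exact formula are assumptions for illustration; the package's own calculation may differ.

```python
def compute_batch_size(
    available_bytes: int,
    avg_obj_size_bytes: int = 4096,   # AVG_OBJ_SIZE_FALLBACK
    memory_fraction: float = 0.60,    # MEMORY_FRACTION
    min_batch_size: int = 1_000,      # MIN_BATCH_SIZE
    max_batch_size: int = 1_000_000,  # MAX_BATCH_SIZE
) -> int:
    """Number of objects that fit in the memory budget, clamped to the limits."""
    budget = int(available_bytes * memory_fraction)
    fit = budget // max(avg_obj_size_bytes, 1)  # guard against a zero size
    return max(min_batch_size, min(fit, max_batch_size))


# With 1 GiB available and 4096-byte objects, the budget holds 157286 objects.
batch = compute_batch_size(available_bytes=1024**3)
```

In a real process, the available memory could be read with `psutil.virtual_memory().available`, since `psutil` is already a dependency.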

---

## License

MIT © 2025 Paulo Otávio Castoldi

## Links

[Source](https://gitlab.com/paulo_castoldi/django-eose)
