Metadata-Version: 2.4
Name: django-eose
Version: 0.5.1b1
Summary: Django Encrypted Object Search Engine.
Author-email: Paulo Otávio Castoldi <paulocastoldi@paulocastoldi.com.br>
License: MIT
Project-URL: Homepage, https://gitlab.com/paulo_castoldi/django-eose
Keywords: django,search,multiprocessing,decrypted,cache
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Framework :: Django
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: Django>=6.0.3
Requires-Dist: sqlparse>=0.5.5
Requires-Dist: asgiref>=3.11.1
Requires-Dist: psutil>=7.2.2
Requires-Dist: cffi>=2.0.0
Requires-Dist: cryptography>=46.0.5
Requires-Dist: pycparser>=3.0
Provides-Extra: dev
Requires-Dist: setuptools>=82.0.1; extra == "dev"
Requires-Dist: wheel>=0.46.3; extra == "dev"
Dynamic: license-file

# Django EOSE

**Django Encrypted Object Search Engine**

`django-eose` searches Django querysets whose fields are stored encrypted, decrypting values in parallel worker processes and matching them against the search term. It is designed strictly for Linux and stays fast on large datasets through dynamic memory batching, forked worker processes, and direct extraction of field values as tuples.

---

## Key Features

- **Strictly Linux Multiprocessing:** Relies exclusively on the Linux `fork` start method and Copy-On-Write (CoW) memory sharing, so workers read the parent process's data instead of receiving duplicated, pickled copies.
- **Dynamic Memory Batching:** Estimates available RAM via `psutil` and calculates safe batch sizes to prevent out-of-memory (OOM) errors during large database scans.
- **Dynamic Chunksize & CPU Load Balancing:** Computes a chunksize from the number of available cores to balance inter-process communication (IPC) overhead against CPU utilization.
- **Decryption On-The-Fly:** Bypasses Django model instantiation entirely, reading raw values with `.values_list()` and decrypting them with Fernet.
- **Zero Memory Leaks:** Runs explicit garbage collection (`gc.collect()`) after each batch, and terminates and respawns worker processes (`maxtasksperchild=1`) so their memory is returned to the system.
- **Result Caching:** Generates stable SHA-1 cache keys for querysets to cache frequently searched results, reducing database hits.
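
The batching and chunksize features above can be pictured with a small sketch. This is not the package's actual code: the function names, the formulas, and the `chunks_per_worker` factor are assumptions made for illustration, and the default limits simply mirror the `DDS_` settings documented later in this README. In a real run the available byte count would come from something like `psutil.virtual_memory().available`.

```python
import math


def clamp(value: int, low: int, high: int) -> int:
    """Keep value inside the inclusive range [low, high]."""
    return max(low, min(value, high))


def estimate_batch_size(
    available_bytes: int,
    avg_obj_size: int = 4096,        # mirrors DDS_AVG_OBJ_SIZE_FALLBACK
    memory_fraction: float = 0.60,   # mirrors DDS_MEMORY_FRACTION
    min_batch: int = 1024,           # mirrors DDS_MIN_BATCH_SIZE
    max_batch: int = 1_024_000,      # mirrors DDS_MAX_BATCH_SIZE
) -> int:
    """Hypothetical formula: rows that fit in a fraction of free RAM."""
    budget = int(available_bytes * memory_fraction)
    return clamp(budget // avg_obj_size, min_batch, max_batch)


def estimate_chunksize(
    batch_size: int,
    workers: int,
    min_chunk: int = 256,            # mirrors DDS_MIN_CHUNKSIZE
    max_chunk: int = 15_360,         # mirrors DDS_MAX_CHUNKSIZE
    chunks_per_worker: int = 4,      # assumption: several chunks per core
) -> int:
    """Hypothetical formula: split a batch into a few chunks per worker."""
    raw = math.ceil(batch_size / (workers * chunks_per_worker))
    return clamp(raw, min_chunk, max_chunk)


# Example: 1 GiB of free RAM and 8 worker processes.
batch = estimate_batch_size(1024**3)
chunk = estimate_chunksize(batch, workers=8)
```

Clamping both values keeps a nearly full machine from producing tiny batches and keeps a very large one from sending oversized chunks through IPC.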

---

## Installation

Install via pip:

```bash
pip install django-eose
```

## Requirements

- Python 3.10+
- Django 6.0.3+
- psutil 7.2.2+
- cryptography 46.0.5+
- Linux (required for the `fork` multiprocessing context)

---

## Model Configuration

The models below show the database structure that the related-field searches in this README operate on.
Encrypted data is stored in `BinaryField` columns, and `ForeignKey` fields establish the relationships.

```python
from django.db import models
from cryptography.fernet import Fernet

# A Fernet key is 32 url-safe base64-encoded bytes, e.g. from Fernet.generate_key()
AES_KEY = b"<your_key_here>"

class Client(models.Model):
    # Encrypted fields stored as raw bytes
    _encrypted_name = models.BinaryField()
    _encrypted_email = models.BinaryField()

    # Method to decrypt the database value
    def _decrypt_field(self, encrypted_value):
        return Fernet(AES_KEY).decrypt(encrypted_value).decode()

    # Method to encrypt the value before saving
    def _encrypt_field(self, value):
        return Fernet(AES_KEY).encrypt(value.encode())

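    # Creates properties that transparently handle encryption/decryption.
    # Calling _property in the class body below relies on staticmethod
    # objects being callable, which holds on Python 3.10+.
    @staticmethod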
    def _property(field_name):
        def getter(self):
            return self._decrypt_field(getattr(self, field_name))

        def setter(self, value):
            setattr(self, field_name, self._encrypt_field(value))

        return property(getter, setter)

    # Fields accessible as normal attributes in Django
    name = _property('_encrypted_name')
    email = _property('_encrypted_email')

class Order(models.Model):
    # The client who placed the order
    client = models.ForeignKey(Client, on_delete=models.CASCADE, related_name="orders")
    created_at = models.DateTimeField(auto_now_add=True)

class OrderItem(models.Model):
    # The specific item belonging to an order
    order = models.ForeignKey(Order, on_delete=models.CASCADE, related_name="items")
    product_name = models.CharField(max_length=255)
    price = models.DecimalField(max_digits=10, decimal_places=2)
```

---

## Usage

First, add your AES password to your `.env` file so the internal processors can build the decryption key:

```bash
AES_PASSWORD=your-password-here
```
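
For intuition, here is one common way a password can be turned into a Fernet-compatible key using only the standard library. This is a sketch under assumptions, not the derivation `django-eose` performs: the function name `derive_fernet_key`, the use of PBKDF2-HMAC-SHA256, the salt, and the iteration count are all hypothetical, and this README does not document the package's actual scheme. Whatever scheme is used, it must produce the same key your models encrypt with.

```python
import base64
import hashlib


def derive_fernet_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Hypothetical helper: stretch a password into a Fernet-format key.

    Fernet expects 32 bytes encoded as url-safe base64, so we derive
    exactly 32 bytes with PBKDF2 and then base64-encode them.
    """
    raw = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )
    return base64.urlsafe_b64encode(raw)


# The salt here is a placeholder; a real deployment would store its own.
# A low iteration count is used only to keep the example quick to run.
key = derive_fernet_key("your-password-here", b"example-salt", iterations=1_000)
```

The resulting `key` has the shape that `cryptography.fernet.Fernet(key)` accepts.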

### High-Performance Direct Search

The API has been drastically simplified: you no longer need to pass manual limits or executor types. `django-eose` handles CPU core distribution and memory allocation automatically.

```python
from django_eose import search_queryset
from orders.models import OrderItem

# Search for "john" in related client fields using direct AES decryption
results = search_queryset(
    search="john",
    queryset=OrderItem.objects.all(),
    related_field="order__client",
    fields=("_encrypted_name", "_encrypted_email"),
    cache_timeout=3600,  # Optional: overrides the default TTL
)
```

The call returns a standard filtered Django queryset: `queryset.filter(pk__in=matched_ids)`.

---

## search_queryset API Reference

The core function dynamically scales to your hardware and accepts the following parameters:

- **search** (str): The term to search for. It will be automatically normalized (accents and punctuation removed) and lowercased.
- **queryset** (Any): The initial Django QuerySet to filter.
- **related_field** (str | None): The relation path to traverse before checking fields (e.g., `"order__client"`). Defaults to `None` (searches the base object).
- **fields** (tuple[str, ...] | None): Tuple of raw database field names to extract and decrypt.
- **cache_timeout** (int | None): Seconds to cache the found primary keys. If `None`, the global setting `DDS_CACHE_TIMEOUT` is used.
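
The normalization applied to `search` can be approximated as follows. This is a minimal sketch assuming Unicode decomposition is used to strip accents; the helper name `normalize` is hypothetical and the package's exact rules (for example, how whitespace is treated) may differ.

```python
import unicodedata


def normalize(text: str) -> str:
    """Hypothetical helper: strip accents and punctuation, then lowercase."""
    # Decompose accented characters ("ã" -> "a" + combining tilde),
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Drop every character whose Unicode category is punctuation (P*).
    no_punct = "".join(
        ch for ch in no_accents if not unicodedata.category(ch).startswith("P")
    )
    return no_punct.lower()


# A normalized term can then be matched against normalized decrypted values.
is_match = normalize("john") in normalize("Jöhn Smith")
```

Normalizing both the search term and the decrypted value the same way is what lets `"john"` match a stored `"Jöhn"`.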

---

## Configuration via Django Settings

`django-eose` uses sensible defaults to protect system resources. Each default can be overridden in your project's `settings.py` using the `DDS_` prefix.

| Setting                       | Default Value | Description                                                                           |
| :---------------------------- | :------------ | :------------------------------------------------------------------------------------ |
| **DDS_MEMORY_FRACTION**       | 0.60          | Target fraction of available system RAM (60%) to utilize during large search batches. |
| **DDS_CACHE_TIMEOUT**         | 600           | Default cache TTL in seconds (10 minutes).                                            |
| **DDS_AVG_OBJ_SIZE_FALLBACK** | 4096          | Fallback size in bytes (4 KiB) if dynamic object size estimation fails.               |
| **DDS_MIN_BATCH_SIZE**        | 1024          | Minimum number of objects to load per memory batch.                                   |
| **DDS_MAX_BATCH_SIZE**        | 1024000       | Maximum upper boundary for dynamic object batching.                                   |
| **DDS_MIN_CHUNKSIZE**         | 256           | Minimum number of items assigned to a worker per map iteration.                       |
| **DDS_MAX_CHUNKSIZE**         | 15360         | Maximum chunksize boundary to prevent IPC bottlenecks.                                |
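
As an illustration, overriding a few of these defaults might look like the fragment below. Only the setting names come from the table above; the values are arbitrary examples for a memory-constrained host, not recommendations.

```python
# settings.py (fragment)

# Use at most 40% of available RAM per search batch instead of 60%.
DDS_MEMORY_FRACTION = 0.40

# Cache matched primary keys for 30 minutes instead of 10.
DDS_CACHE_TIMEOUT = 1800

# Cap batches at 256,000 objects instead of 1,024,000.
DDS_MAX_BATCH_SIZE = 256_000
```

Settings that are not listed keep the defaults shown in the table.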

---

## License

MIT | 2026 Paulo Otávio Castoldi

## Links

[Source](https://gitlab.com/paulo_castoldi/django-eose)
