Metadata-Version: 2.4
Name: llm-compact-serializer
Version: 0.1.0
Summary: Dynamic serializer for LLM Compact Hierarchical Formats.
License: MIT
License-File: LICENSE
Author: Jose Luis Guerra Infante
Author-email: sora_ryu@hotmail.com
Requires-Python: >=3.9,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Dist: google-generativeai (>=0.8.5,<0.9.0)
Requires-Dist: pydantic (>=2.0,<3.0)
Requires-Dist: python-dotenv (>=1.2.1,<2.0.0)
Description-Content-Type: text/markdown

# LLM Compact Serializer

**A dynamic, schema-driven Python library designed to drastically reduce LLM token usage by compressing complex JSON objects into a strict, hierarchical format.**

## 🚀 Why Use This?

When building LLM applications (using GPT-4, Gemini, Claude, etc.), passing large JSON arrays in the system prompt consumes a large number of tokens, because every item repeats the same keys, quotes, and whitespace.

**Standard JSON (Heavy):**
```json
[
  {"product": "Apple", "price": "1.20", "category": "Fruit"},
  {"product": "Banana", "price": "0.80", "category": "Fruit"}
]
```

**Compact Protocol (Efficient):**

```
|{Apple, 1.20, Fruit}|;|{Banana, 0.80, Fruit}|
```

**Impact:** Reduces the token count from ~55 tokens (JSON) to ~18 tokens (Compact) for this example.
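
The saving can be sanity-checked without a tokenizer. The sketch below uses only the standard library and compares character counts of the two representations shown above. Character length is only a loose proxy for token count, and real numbers depend on the model's tokenizer, so treat the result as indicative.

```python
import json

# The two products from the example above.
products = [
    {"product": "Apple", "price": "1.20", "category": "Fruit"},
    {"product": "Banana", "price": "0.80", "category": "Fruit"},
]

# Pretty-printed JSON, similar to what is often pasted into a prompt.
json_text = json.dumps(products, indent=2)

# The compact form, copied by hand from the example above.
compact_text = "|{Apple, 1.20, Fruit}|;|{Banana, 0.80, Fruit}|"

json_chars = len(json_text)
compact_chars = len(compact_text)
saving = 1 - compact_chars / json_chars

print(f"JSON: {json_chars} chars, compact: {compact_chars} chars")
print(f"Character saving: {saving:.0%}")
```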

## Key Benefits
- **Token Efficiency**: Reduces input payload size by 40-60% for repetitive data structures.
- **Schema Enforcement**: Generates strict instructions for the LLM, reducing hallucinations.
- **Dynamic**: Works with any Python object or dictionary; just define the schema at runtime.
- **Recursive**: Supports deeply nested objects and lists via the `|{...}|` syntax.

## 📦 Installation

Clone the repository and enter it:

    git clone https://github.com/Ryujose/llm-compact-serializer.git
    cd llm-compact-serializer

Install dependencies using Poetry:

    poetry install

## ⚡ Quick Start
This example demonstrates how to serialize a product list with nested destination data.

### 1. Define Your Schema
Tell the serializer what your data looks like. Field order matters, because values are serialized positionally.

```python
from llm_compact_serializer.domain.schema import CompactSchema, FieldConfig
from llm_compact_serializer.core.prompt_builder import PromptBuilder

# Define a nested schema for complex objects
destination_schema = CompactSchema(
    name="Destination",
    fields=[
        FieldConfig(source_name="address"),
        FieldConfig(source_name="phones", is_list=True)  # Handles arrays [x, y]
    ]
)

# Define the root schema
product_schema = CompactSchema(
    name="Product",
    fields=[
        FieldConfig(source_name="name"),
        FieldConfig(source_name="price"),
        FieldConfig(source_name="destination", nested_schema=destination_schema)
    ]
)
```
### 2. Prepare Your Data
You can use dictionaries, Pydantic models, or dataclasses.

```python
data = [
    {
        "name": "MacBook Pro",
        "price": "1200€",
        "destination": {
            "address": "Silicon Valley, CA",
            "phones": [5550199, 5550200]
        }
    }
]
```
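
Accepting dictionaries, Pydantic models, and dataclasses implies that fields are looked up by name regardless of the container type. A minimal sketch of one way to do that follows; `read_field` and the `Destination` dataclass are hypothetical names for illustration and are not part of the library's documented API.

```python
from dataclasses import dataclass
from typing import Any


def read_field(record: Any, name: str) -> Any:
    """Return record[name] for dicts, or record.name for objects (hypothetical helper)."""
    if isinstance(record, dict):
        return record.get(name)
    # Dataclasses and Pydantic models both expose their fields as attributes.
    return getattr(record, name, None)


@dataclass
class Destination:
    address: str
    phones: list


as_dict = {"address": "Silicon Valley, CA", "phones": [5550199, 5550200]}
as_dataclass = Destination(address="Silicon Valley, CA", phones=[5550199, 5550200])

# Both shapes yield the same values for the same field names.
print(read_field(as_dict, "address"))
print(read_field(as_dataclass, "phones"))
```

Returning `None` for an absent field is an assumption here; it lets a later step decide how missing data is rendered.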
### 3. Generate the Prompt
The `PromptBuilder` generates the protocol instructions and injects your compressed data in place of the marker passed as `data_marker`.
```python
builder = PromptBuilder(product_schema)
base_prompt = "Analyze the following orders: [INPUT]"

final_prompt = builder.build(base_prompt, data, data_marker="[INPUT]")
print(final_prompt)
```
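
Conceptually, this step combines two strings: the protocol instructions and the base prompt with the marker swapped for the serialized payload. The sketch below illustrates that idea only. `inject_payload` is a hypothetical helper, the instruction text is abbreviated, and raising on a missing marker is an assumption rather than documented `PromptBuilder` behavior.

```python
def inject_payload(base_prompt: str, instructions: str, payload: str, data_marker: str) -> str:
    """Prepend the instructions and swap the marker for the payload (hypothetical helper)."""
    if data_marker not in base_prompt:
        # Assumption: a missing marker is treated as a caller error.
        raise ValueError(f"marker {data_marker!r} not found in prompt")
    body = base_prompt.replace(data_marker, payload)
    return f"{instructions}\n\n{body}"


# Abbreviated stand-in for the generated protocol block.
instructions = "[COMPACT_HIERARCHICAL_PROTOCOL]\n...\n[END_PROTOCOL]"
payload = "|{Apple, 1.20, Fruit}|;|{Banana, 0.80, Fruit}|"

prompt = inject_payload("Analyze the following orders: [INPUT]", instructions, payload, "[INPUT]")
print(prompt)
```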
### 4. Output (What the LLM Sees)
```
[COMPACT_HIERARCHICAL_PROTOCOL]
[INSTRUCTIONS]
1. Interpret input strictly as a Recursive Compact Hierarchy.
2. Syntax: Complex objects enclosed in |{ }|, separated by comma.
3. Structure Mapping:
# 1 = Product (Root)
# 1a = name
# 1b = price
# 1.1 = destination
# 1.1a = address
# 1.1b* = phones
[END_PROTOCOL]

Analyze the following orders: |{MacBook Pro, 1200€, |{Silicon Valley CA, [5550199, 5550200]}|}|
```
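
The labels in the structure mapping appear to follow a simple scheme: plain fields get letters (`1a`, `1b`), nested schemas get dotted numbers (`1.1`), and list fields are marked with `*`. The sketch below reproduces the mapping shown above under that reading. It is inferred from this single example, `Field` and `Schema` are hypothetical stand-ins for `FieldConfig` and `CompactSchema`, and the library's real numbering may differ, for instance when a plain field is declared after a nested one.

```python
from dataclasses import dataclass, field
from string import ascii_lowercase
from typing import List, Optional


@dataclass
class Field:  # hypothetical stand-in for FieldConfig
    source_name: str
    is_list: bool = False
    nested: Optional["Schema"] = None


@dataclass
class Schema:  # hypothetical stand-in for CompactSchema
    name: str
    fields: List[Field] = field(default_factory=list)


def mapping_lines(schema: Schema, prefix: str = "1", label: Optional[str] = None) -> List[str]:
    """Label plain fields with letters and nested schemas with dotted numbers."""
    title = label if label is not None else f"{schema.name} (Root)"
    lines = [f"# {prefix} = {title}"]
    plain = [f for f in schema.fields if f.nested is None]
    nested = [f for f in schema.fields if f.nested is not None]
    for letter, f in zip(ascii_lowercase, plain):
        star = "*" if f.is_list else ""  # list fields are marked with an asterisk
        lines.append(f"# {prefix}{letter}{star} = {f.source_name}")
    for index, f in enumerate(nested, start=1):
        lines.extend(mapping_lines(f.nested, f"{prefix}.{index}", f.source_name))
    return lines


destination = Schema("Destination", [Field("address"), Field("phones", is_list=True)])
product = Schema("Product", [Field("name"), Field("price"), Field("destination", nested=destination)])

print("\n".join(mapping_lines(product)))
```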

## 🏗 Architecture
The project follows Clean Architecture principles to ensure modularity and ease of testing.
```
llm-compact-serializer/
├── .github/
│   └── workflows/
│       ├── ci.yml              # CI/CD: Tests & Linting
│       └── publish.yml         # CD: Publish to PyPI
├── src/
│   └── llm_compact_serializer/
│       ├── __init__.py
│       ├── domain/             # Schema definitions (The "Rules")
│       │   ├── __init__.py
│       │   └── schema.py
│       └── core/               # The Engine (Generic Logic)
│           ├── __init__.py
│           └── serializer.py
├── tests/
│   └── ...                     # More tests
├── LICENSE                     # MIT
├── README.md
├── pyproject.toml              # Poetry Config
└── poetry.lock
```
### The Protocol Rules
- **Object Wrapping**: All objects are wrapped in `|{ ... }|`.
- **Separators**: Fields are separated by `,`. Objects in a list are separated by `;`.
- **Recursion**: A field can contain another object, creating a nested structure: `|{ val1, |{ val2 }| }|`.
- **Arrays**: Simple lists are wrapped in `[...]`.
- **Missing Data**: `None` or empty values are automatically replaced with `-` to maintain positional integrity.
- **Sanitization**: Commas found within data values are automatically replaced (e.g., `Doe, John` -> `Doe John`) to prevent parsing errors.
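
The rules above can be sketched in a few lines of plain Python. This is an illustrative re-implementation, not the library's code. The schema is modelled as an ordered list of `(field_name, nested_schema_or_None)` pairs, the function names are hypothetical, and sanitization is assumed to drop the comma outright, which matches the `Doe, John` example.

```python
from typing import Any, Dict, List, Tuple

# Illustrative schema shape: ordered (field_name, nested_schema_or_None) pairs.
Schema = List[Tuple[str, Any]]


def encode_value(value: Any) -> str:
    """Encode a primitive: '-' for missing data, commas dropped from text."""
    if value is None or value == "":
        return "-"
    return str(value).replace(",", "")


def encode_object(record: Dict[str, Any], schema: Schema) -> str:
    """Encode one object as |{v1, v2, ...}| in the schema's field order."""
    parts = []
    for name, nested in schema:
        value = record.get(name)
        if nested is not None and isinstance(value, dict):
            parts.append(encode_object(value, nested))  # recursion into nested object
        elif isinstance(value, (list, tuple)):
            parts.append("[" + ", ".join(encode_value(v) for v in value) + "]")
        else:
            parts.append(encode_value(value))
    return "|{" + ", ".join(parts) + "}|"


def encode_list(records: List[Dict[str, Any]], schema: Schema) -> str:
    """Join encoded objects with ';'."""
    return ";".join(encode_object(r, schema) for r in records)


address_schema: Schema = [("city", None), ("zip", None)]
person_schema: Schema = [("name", None), ("email", None), ("address", address_schema), ("tags", None)]

people = [
    {"name": "Doe, John", "email": None, "address": {"city": "Madrid", "zip": "28001"}, "tags": ["a", "b"]},
    {"name": "Ana", "email": "ana@example.com", "address": {"city": "Lima", "zip": ""}, "tags": ["c"]},
]

print(encode_list(people, person_schema))
# |{Doe John, -, |{Madrid, 28001}|, [a, b]}|;|{Ana, ana@example.com, |{Lima, -}|, [c]}|
```

Because values are positional, the `-` placeholder keeps later fields aligned with the schema even when an earlier field is missing.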

## 🧪 Testing
We use pytest for comprehensive testing, covering unit logic and end-to-end integration.
```bash
# Run all tests
poetry run pytest

# Run with coverage report
poetry run pytest --cov=src
```
### Key Test Scenarios
- `test_serializer.py`: Verifies primitive handling, recursive nesting logic, and sanitization (handling commas in data).
- `test_integration.py`: Validates the full workflow (Schema -> Data -> Prompt) using complex real-world examples.

## 🤝 Contributing
1. Fork the repository.

2. Create a feature branch (`git checkout -b feat/amazing-feature`).

3. Commit your changes (`git commit -m 'feat: Add amazing feature'`).

4. Push to the branch (`git push origin feat/amazing-feature`).

5. Open a Pull Request.

## 📄 License
Distributed under the MIT License. See [LICENSE](https://github.com/Ryujose/llm-compact-serializer/blob/main/LICENSE) for more information.
