Metadata-Version: 2.4
Name: therismos
Version: 0.8.0
Summary: A Python library for modeling queries, filters, expressions, grouping, and aggregations as object structures
Project-URL: Repository, https://gitlab.com/Kencho1/therismos
Project-URL: Homepage, https://gitlab.com/Kencho1/therismos
Project-URL: Issues, https://gitlab.com/Kencho1/therismos/-/issues
Author: Jesús Alonso Abad
License: MIT
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Software Development :: Libraries
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: tatsu~=5.8
Provides-Extra: dev
Requires-Dist: mypy~=1.5; extra == 'dev'
Requires-Dist: pytest-cov~=4.1; extra == 'dev'
Requires-Dist: pytest~=7.4; extra == 'dev'
Requires-Dist: ruff~=0.1; extra == 'dev'
Requires-Dist: tox~=4.0; extra == 'dev'
Provides-Extra: mongodb
Requires-Dist: pymongo~=4.0; extra == 'mongodb'
Provides-Extra: mongodb-async
Requires-Dist: motor~=3.0; extra == 'mongodb-async'
Provides-Extra: pandas
Requires-Dist: pandas; extra == 'pandas'
Requires-Dist: pandas-stubs; extra == 'pandas'
Provides-Extra: polars
Requires-Dist: polars; extra == 'polars'
Description-Content-Type: text/markdown

# Therismos

> **θερισμός**
>
> _Greek; noun_
>
> Harvest.

A Python library for modeling queries, filters, expressions, grouping, and aggregations as object structures.

## Features

- **Backend-agnostic modeling**: Build expressions, filters, sorting, and aggregations independent of any specific backend
- **Declarative DSL**: Natural Python syntax for building complex queries
- **Type safety**: Optional field type declarations with automatic casting
- **Immutable structures**: All nodes are immutable and thread-safe
- **Automatic normalization**: Compound expressions are automatically flattened
- **Powerful optimizer**: Detects contradictions, tautologies, and simplification opportunities in expressions and sorting
- **Grammar-based serialization**: Convert expressions to/from compact strings for URLs and APIs
- **Visitor pattern**: Extensible architecture for converting to any backend format
- **Optimization tracking**: Optional tracking of all optimization transformations
- **Sorting specifications**: Model sort criteria as objects with optimization and visitor support
- **Grouping and aggregation**: Model grouping and aggregation criteria as objects with optimization and visitor support
- **Expression templates**: Parameterized, persistable filter expressions with named placeholders and a transform pipeline DSL
- **Field pruning and projection**: Remove or project field-based constraints from an expression tree with polarity-aware semantics

## Installation

```bash
pip install therismos
```

Or using uv:

```bash
uv pip install therismos
```

## Expressions

Therismos models filters and conditions as an Abstract Syntax Tree (AST) of expression objects.

### Quick Start

```python
from therismos import F, optimize

# Define fields
age = F("age", int)
name = F("name")
status = F("status")

# Build expressions using natural Python syntax
expr = (age > 18) & (name == "Alice") | (status == "admin")

# Optimize the expression
optimized, records = optimize(expr)

# More complex example: detect contradictions
contradiction = (age < 30) & (age > 40)
result, _ = optimize(contradiction)
# result is FALSE

# Aggregate OR equality chains
multi_status = (status == "active") | (status == "pending") | (status == "completed")
result, _ = optimize(multi_status)
# result is: status IN ("active", "pending", "completed")
```

### Expression Types

#### Atomic Expressions

- **Comparisons**: `==`, `!=`, `<`, `<=`, `>`, `>=`
- **Range**: `field.between(lower, upper)` — half-open range `lower <= field < upper`
- **Regex matching**: `field.matches(pattern, flags=None)`
- **Membership**: `field.is_in(*values)` or `field.is_one_of(iterable)`
- **Null checking**: `field.is_null()`, `field.is_not_null()`
- **Constants**: `TRUE`, `FALSE`

#### Compound Expressions

- **AND**: `expr1 & expr2` or `AllExpr(expr1, expr2, ...)`
- **OR**: `expr1 | expr2` or `AnyExpr(expr1, expr2, ...)`
- **NOT**: `~expr` or `NotExpr(expr)`
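
The operators above map onto Python's `__and__`, `__or__`, and `__invert__` hooks. The following is a minimal, self-contained sketch of how operator overloading can produce an immutable tree whose chained operands are flattened into a single node. It is not the library's implementation, and the class names (`Node`, `Leaf`, `All`, `Any_`, `Not`) are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical base class that gives every node the &, | and ~ operators."""

    def __and__(self, other: "Node") -> "Node":
        return All(_flatten(All, (self, other)))

    def __or__(self, other: "Node") -> "Node":
        return Any_(_flatten(Any_, (self, other)))

    def __invert__(self) -> "Node":
        return Not(self)


@dataclass(frozen=True)
class Leaf(Node):
    text: str  # stands in for a comparison such as "age > 18"


@dataclass(frozen=True)
class All(Node):
    exprs: tuple


@dataclass(frozen=True)
class Any_(Node):
    exprs: tuple


@dataclass(frozen=True)
class Not(Node):
    expr: Node


def _flatten(kind: type, operands: tuple) -> tuple:
    """Splice operands of the same compound kind into one flat tuple."""
    flat = []
    for operand in operands:
        if isinstance(operand, kind):
            flat.extend(operand.exprs)
        else:
            flat.append(operand)
    return tuple(flat)


a, b, c = Leaf("a"), Leaf("b"), Leaf("c")
chained = a & b & c  # one All node with three operands, not a nested pair
mixed = a & b | c    # & binds tighter than | in Python
```

Because `&` binds tighter than `|` in Python, and both bind tighter than comparison operators, each comparison in a real expression needs its own parentheses, as in the Quick Start example.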

### Type Casting

Fields can declare expected types for automatic value casting:

```python
age = F("age", int)
price = F("price", float)

# Values are automatically cast
expr = age == "42"  # value is stored as string
casted = expr.casted_value()  # returns integer 42
```

Custom cast functions are also supported:

```python
def normalize_email(value):
    return str(value).strip().lower()

email = F("email", normalize_email)
```
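
Since a declared type and a custom cast function are both plain callables, they can be applied in the same way. The sketch below illustrates that idea in isolation; `cast_value` is a hypothetical helper, not part of therismos.

```python
from typing import Any, Callable, Optional


def cast_value(caster: Optional[Callable[[Any], Any]], raw: Any) -> Any:
    """Apply the declared type or cast function; pass the value through if none is declared."""
    return raw if caster is None else caster(raw)


def normalize_email(value: Any) -> str:
    return str(value).strip().lower()


as_int = cast_value(int, "42")                                # built-in type as caster
as_email = cast_value(normalize_email, "  Alice@Example.COM ")  # custom function as caster
untouched = cast_value(None, "42")                            # no declared type
```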

### Optimization

The optimizer applies rewrite rules that simplify expressions and detect contradictions and tautologies.

#### Basic Examples

```python
from therismos import optimize, F, TRUE, FALSE, AllExpr, AnyExpr

age = F("age")
name = F("name")

# Identity elimination
expr = AllExpr(age > 18, TRUE, age < 65)
result, _ = optimize(expr)
# result is: AllExpr(age > 18, age < 65)

# Contradiction detection
expr = (age == 25) & (age != 25)
result, _ = optimize(expr)
# result is: FALSE

# Tautology detection
expr = (age < 30) | (age >= 30)
result, _ = optimize(expr)
# result is: TRUE

# NOT simplification (De Morgan's laws)
expr = ~((age > 18) & (name == "Alice"))
result, _ = optimize(expr)
# result is: (age <= 18) OR (name != "Alice")
```

#### Optimization Rules Reference

The optimizer implements the following transformation rules:

##### Atomic Expression Simplifications

| Rule | Before | After |
|------|--------|-------|
| Empty IN to FALSE | `f IN ()` | `FALSE` |
| Single-value IN to Eq | `f IN (v)` | `f == v` |
| Empty Between range | `f.between(a, b)` where `a >= b` | `FALSE` |

##### NOT Expression Simplifications

| Rule | Before | After |
|------|--------|-------|
| NOT of TRUE | `NOT(TRUE)` | `FALSE` |
| NOT of FALSE | `NOT(FALSE)` | `TRUE` |
| Double negation | `NOT(NOT(x))` | `x` |
| De Morgan's law (AND) | `NOT(a AND b)` | `NOT(a) OR NOT(b)` |
| De Morgan's law (OR) | `NOT(a OR b)` | `NOT(a) AND NOT(b)` |
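
The NOT rules in the table can be read as a recursive rewrite that pushes negation towards the leaves. The sketch below applies them to plain tuples such as `("not", x)` and `("and", a, b)`; this tuple encoding and the function name `push_not` are assumptions made for illustration, not the library's node types.

```python
def push_not(expr):
    """Rewrite a tuple-encoded expression so that NOT only wraps leaves."""
    kind = expr[0]
    if kind in ("and", "or"):
        return (kind, *(push_not(e) for e in expr[1:]))
    if kind != "not":
        return expr  # leaf or constant: nothing to rewrite
    inner = expr[1]
    if inner[0] == "true":
        return ("false",)
    if inner[0] == "false":
        return ("true",)
    if inner[0] == "not":  # double negation
        return push_not(inner[1])
    if inner[0] == "and":  # NOT(a AND b) -> NOT(a) OR NOT(b)
        return ("or", *(push_not(("not", e)) for e in inner[1:]))
    if inner[0] == "or":   # NOT(a OR b) -> NOT(a) AND NOT(b)
        return ("and", *(push_not(("not", e)) for e in inner[1:]))
    return expr  # NOT of a leaf stays as it is


a, b = ("leaf", "a"), ("leaf", "b")
rewritten = push_not(("not", ("and", a, b)))
```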

##### AND Expression Simplifications

| Rule | Before | After |
|------|--------|-------|
| Empty AND | `AND()` | `TRUE` |
| Single operand | `AND(x)` | `x` |
| FALSE propagation | `AND(..., FALSE, ...)` | `FALSE` |
| TRUE elimination | `AND(..., TRUE, ...)` | `AND(...)` (TRUE removed) |
| All TRUE | `AND(TRUE, TRUE, ...)` | `TRUE` |
| Eq/Eq same value | `(f == v) AND (f == v)` | `f == v` |
| Eq/Eq different values | `(f == v1) AND (f == v2)` | `FALSE` |
| Eq/In intersection (member) | `(f == v) AND (f IN (v, ...))` | `f == v` |
| Eq/In intersection (non-member) | `(f == v) AND (f IN (...))` | `FALSE` (v not in set) |
| In/In intersection (empty) | `(f IN (v1, v2)) AND (f IN (v3, v4))` | `FALSE` (no overlap) |
| In/In intersection (single) | `(f IN (v1, v2)) AND (f IN (v2, v3))` | `f == v2` |
| In/In intersection (multiple) | `(f IN (v1, v2, v3)) AND (f IN (v2, v3, v4))` | `f IN (v2, v3)` |

##### OR Expression Simplifications

| Rule | Before | After |
|------|--------|-------|
| Empty OR | `OR()` | `FALSE` |
| Single operand | `OR(x)` | `x` |
| TRUE propagation | `OR(..., TRUE, ...)` | `TRUE` |
| FALSE elimination | `OR(..., FALSE, ...)` | `OR(...)` (FALSE removed) |
| All FALSE | `OR(FALSE, FALSE, ...)` | `FALSE` |
| Eq/Eq union | `(f == v1) OR (f == v2)` | `f IN (v1, v2)` |
| Eq/In union | `(f == v) OR (f IN (v2, v3))` | `f IN (v, v2, v3)` |
| In/In union | `(f IN (v1, v2)) OR (f IN (v3, v4))` | `f IN (v1, v2, v3, v4)` |
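
The Eq/In rows of the AND and OR tables reduce to set intersection and set union over the allowed values of one field, where `f == v` counts as the one-element list `[v]`. The following sketch shows that reduction under those assumptions; the helper names and the tuple results are hypothetical.

```python
def _as_expr(values):
    """Pick the smallest form: FALSE for no values, Eq for one, IN otherwise."""
    if not values:
        return ("false",)
    if len(values) == 1:
        return ("eq", values[0])
    return ("in", tuple(values))


def and_membership(*value_lists):
    """AND of Eq/In constraints on one field: keep values allowed by every constraint."""
    first, *rest = value_lists
    common = [v for v in dict.fromkeys(first) if all(v in other for other in rest)]
    return _as_expr(common)


def or_membership(*value_lists):
    """OR of Eq/In constraints on one field: keep every value once, in first-seen order."""
    merged = list(dict.fromkeys(v for values in value_lists for v in values))
    return _as_expr(merged)


no_overlap = and_membership(["v1", "v2"], ["v3", "v4"])
single = and_membership(["v1", "v2"], ["v2", "v3"])
union = or_membership(["v"], ["v2", "v3"])
```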

##### Contradiction Detection (AND)

| Pattern | Result |
|---------|--------|
| `(f == v) AND (f != v)` | `FALSE` |
| `f.is_null() AND f.is_not_null()` | `FALSE` |
| `(f < a) AND (f > b)` where `b >= a` | `FALSE` |
| `(f <= a) AND (f > a)` | `FALSE` |
| `(f >= b) AND (f < b)` | `FALSE` |
| `f.between(a, b) AND f.between(c, d)` where `max(a,c) >= min(b,d)` | `FALSE` |
| `f.between(a, b) AND f.between(c, d)` where ranges overlap | `f.between(max(a,c), min(b,d))` |
| `f.between(a, b) AND (f > c)` where `c >= b` | `FALSE` |
| `f.between(a, b) AND (f < c)` where `c <= a` | `FALSE` |
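
For the `between` rows, the check amounts to intersecting two half-open ranges `[lower, upper)`. A minimal sketch, assuming ranges are given as `(lower, upper)` tuples and using the hypothetical name `intersect_between`:

```python
def intersect_between(first, second):
    """Intersect two half-open ranges; None means no value can satisfy both."""
    lower = max(first[0], second[0])
    upper = min(first[1], second[1])
    return None if lower >= upper else (lower, upper)


overlapping = intersect_between((0, 10), (5, 20))
touching = intersect_between((0, 5), (5, 10))
```

Two ranges that only touch, such as `[0, 5)` and `[5, 10)`, share no value because the upper bound is exclusive.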

##### Between Range Union (OR)

| Pattern | Result |
|---------|--------|
| `f.between(a, b) OR f.between(c, d)` — overlapping (`min(b,d) > max(a,c)`) | `f.between(min(a,c), max(b,d))` |
| `f.between(a, b) OR f.between(b, d)` — adjacent | `f.between(a, d)` |
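
Both rows can be covered by one test: two non-empty half-open ranges merge into a single range unless a gap separates them. The sketch below assumes `(lower, upper)` tuples; `union_between` is a hypothetical name.

```python
def union_between(first, second):
    """Merge two non-empty half-open ranges that overlap or touch; None if a gap separates them."""
    (a, b), (c, d) = first, second
    if min(b, d) < max(a, c):
        return None  # a gap remains, so the OR cannot collapse into one range
    return (min(a, c), max(b, d))


merged_overlap = union_between((0, 10), (5, 20))
merged_adjacent = union_between((0, 5), (5, 9))
```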

##### Tautology Detection (OR)

| Pattern | Result |
|---------|--------|
| `(f == v) OR (f != v)` | `TRUE` |
| `f.is_null() OR f.is_not_null()` | `TRUE` |
| `(f < v) OR (f >= v)` | `TRUE` |
| `(f <= v) OR (f > v)` | `TRUE` |

#### Complex Real-World Example: Detecting Accidental Contradictions

The optimizer is particularly valuable for catching accidentally contradictory conditions in complex business logic. Here's a realistic scenario where multiple nested requirements create an impossible condition:

```python
from therismos import F, optimize, FALSE

# Define fields
user_age = F("age", int)
user_role = F("role")
user_status = F("status")
account_tier = F("account_tier")
dept = F("department")
experience = F("experience_years", int)
available = F("available", bool)

# Complex filter built incrementally by different team members
# Each level seemed reasonable in isolation, but together they create a contradiction
complex_filter = (
    (
        # Level 1: Nested OR conditions for base eligibility
        (
            (
                # Premium account holders
                (account_tier == "premium") &
                (
                    (user_role == "developer") |
                    (user_role == "designer")
                )
            ) |
            (
                # OR enterprise users with experience
                (account_tier == "enterprise") &
                (experience >= 5) &
                (dept.is_in("engineering", "design"))
            )
        ) &
        # Level 2: Status and department requirements with nesting
        (
            (
                (user_status == "active") &
                (
                    # Nested department-specific conditions
                    (
                        (dept == "engineering") &
                        (experience >= 2)
                    ) |
                    (
                        (dept == "design") &
                        (user_role.is_in("designer", "lead_designer"))
                    )
                )
            ) |
            # OR admin override
            (user_role == "admin")
        ) &
        # Level 3: First age requirement
        (user_age >= 25) &
        # Level 4: Second age requirement nested with other conditions
        (
            (user_age <= 50) &
            (
                # More nesting for additional validation
                (account_tier.is_in("premium", "enterprise", "trial")) |
                (user_role == "admin")
            )
        )
    ) & (
        # Level 5: Someone later added "additional validation"
        # without realizing it contradicts the previous age requirements!
        (user_age < 25) &  # Must be under 25
        (user_age > 50)    # AND must be over 50 (impossible!)
    ) &
    (available == True)
)

# The contradiction occurs because:
# - Earlier levels require: 25 <= age <= 50
# - Final level requires: age < 25 AND age > 50
# - These conditions cannot both be true!

result, records = optimize(complex_filter)

print(f"Optimized result: {result}")
# Output: FalseExpr()

print(f"Is FALSE: {result is FALSE}")
# Output: True

print("Optimization steps that revealed the contradiction:")
for i, record in enumerate(records, 1):
    print(f"Step {i}: {record.reason}")
    if "Contradiction" in record.reason:
        print("  *** This step detected the contradiction! ***")

# Example output:
# Step 1: OR equality chain aggregation to IN
# Step 2: Optimize children in AND
# Step 3: Optimize children in OR
# Step 4: Optimize children in AND
# Step 5: Contradiction detected in AND
#   *** This step detected the contradiction! ***

# By examining the 'before' expression in the contradiction record,
# you can identify exactly which requirements conflict with each other
# and trace back through your business logic to find the source.
```

The optimizer's tracking feature is invaluable for debugging complex business rules, especially when:
- Multiple developers contribute conditions to the same filter over time
- Requirements evolve and accidentally introduce conflicts
- Filters from different parts of the application are combined
- Legacy filtering logic is migrated or refactored
- A user-facing query builder lets users create invalid combinations

#### Optimization Tracking

Track optimization changes:

```python
result, records = optimize(expr)
for record in records:
    print(f"Applied: {record.reason}")
    print(f"Before: {record.before}")
    print(f"After: {record.after}")
```

You can also use a collecting parameter to accumulate records across multiple optimizations:

```python
my_records = []
result1, _ = optimize(expr1, my_records)
result2, _ = optimize(expr2, my_records)
# my_records now contains all optimization steps from both calls
```
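
The collecting-parameter pattern itself is independent of therismos. As a rough, self-contained illustration, the hypothetical `simplify` function below appends to a caller-supplied list when one is given and creates a fresh list otherwise; `Record` only mimics the `reason`, `before`, and `after` attributes used above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    reason: str
    before: object
    after: object


def simplify(values, records=None):
    """Drop duplicate values and log the change in `records` (created if not supplied)."""
    if records is None:  # avoid a shared mutable default argument
        records = []
    result = tuple(dict.fromkeys(values))
    if result != tuple(values):
        records.append(Record("Removed duplicates", tuple(values), result))
    return result, records


shared = []
first, _ = simplify(["a", "a", "b"], shared)
second, _ = simplify(["c", "c"], shared)
third, own = simplify(["x"])  # no list passed, nothing changed: own is empty
```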

### Field Pruning

`prune_fields` removes or projects field-based constraints from an expression
tree. It is useful when a stored filter contains constraints on fields that are
unavailable or irrelevant in a particular execution context, and you need to
decide how to handle the missing constraints conservatively or permissively.

```python
from therismos import F, prune_fields, FieldSelection, PruneMode

age = F("age", int)
status = F("status")
dept = F("department")

expr = (age > 18) & (status == "active") & (dept == "engineering")

# PRUNE mode (default): remove listed fields — age constraint dropped
# RESTRICT mode (default): dropping a constraint excludes non-matching records
result = prune_fields(expr, frozenset({"age"}))
# result: AllExpr(status == "active", dept == "engineering")

# RELAX mode: dropping a constraint lets records pass through
result = prune_fields(expr, frozenset({"age"}), mode=PruneMode.RELAX)
# result: AllExpr(status == "active", dept == "engineering")

# Where RESTRICT vs RELAX differ — single constraint
only_age = age > 18
result_restrict = prune_fields(only_age, frozenset({"age"}))
# result: FALSE  (no constraint left → exclude)

result_relax = prune_fields(only_age, frozenset({"age"}), mode=PruneMode.RELAX)
# result: TRUE   (no constraint left → include)

# KEEP mode: keep only listed fields, prune everything else
result = prune_fields(expr, frozenset({"status"}), selection=FieldSelection.KEEP)
# result: status == "active"
```

#### Polarity-aware substitution under NOT

The substitution correctly flips semantics when a pruned leaf appears inside a
`NOT` expression:

```python
from therismos import F, prune_fields, PruneMode

age = F("age", int)
status = F("status")

expr = ~(age > 18) & (status == "active")

# RESTRICT: age is pruned to FALSE at positive polarity
# NOT(FALSE) → TRUE → TRUE & (status == "active") → status == "active"
result = prune_fields(expr, frozenset({"age"}))
# result: status == "active"
```

### Expression Evaluation

Expressions can be evaluated against actual data to determine if the data satisfies the filter criteria. This is useful for:

- In-memory filtering when a database query is not needed
- Testing and validating filter logic
- Client-side filtering before sending data to a backend
- Data validation and access control checks

The `evaluate()` method handles both single-valued and multi-valued (list-like) fields. To support this, all input data must be wrapped with the `unwind_data` utility function, which flattens nested data structures into a consistent format that the evaluation engine can process.

#### Basic Evaluation

The `evaluate()` method evaluates an expression against a dictionary of field values, which must first be passed through `unwind_data`.

```python
from therismos import F, unwind_data

age = F("age")
status = F("status")

# Build an expression
expr = (age > 18) & (status == "active")

# Evaluate against data by wrapping it with unwind_data
data = {"age": 25, "status": "active"}
result = expr.evaluate(unwind_data(data))  # Returns True

data = {"age": 15, "status": "active"}
result = expr.evaluate(unwind_data(data))  # Returns False
```

#### Evaluation with Type Casting

When fields have declared types, values are automatically cast during evaluation:

```python
age = F("age", int)
expr = age >= 18

# String values are automatically cast to int
result = expr.evaluate(unwind_data({"age": "25"}))  # Returns True

# This will raise TypeError or ValueError if casting fails
try:
    expr.evaluate(unwind_data({"age": "not_a_number"}))
except (TypeError, ValueError):
    print("Invalid age value")
```

#### Multi-Valued Field Evaluation

The evaluation engine handles fields that contain lists or nested lists of values.

- **Comparison operators (`==`, `<`, `<=`, `>`, `>=`)** use **"any" semantics**: the condition is `True` if **any** value in the list meets the criteria.
- **Inequality (`!=`)** uses **"none" semantics**: the condition is `True` only if **no** value in the list equals the compared value.

```python
# 'tags' is a multi-valued field
tags = F("tags")
scores = F("scores", int)

# Equality: True if "python" is ANY of the tags
expr_eq = tags == "python"
data_eq = {"tags": ["java", "python", "rust"]}
assert expr_eq.evaluate(unwind_data(data_eq)) is True  # True because "python" is present

# Inequality: True if "python" is NONE of the tags
expr_ne = tags != "python"
data_ne_pass = {"tags": ["java", "rust"]}
data_ne_fail = {"tags": ["java", "python"]}
assert expr_ne.evaluate(unwind_data(data_ne_pass)) is True  # True because "python" is absent
assert expr_ne.evaluate(unwind_data(data_ne_fail)) is False # False because "python" is present

# Greater Than: True if ANY score is > 80
expr_gt = scores > 80
data_gt = {"scores": [60, 75, 90]}
assert expr_gt.evaluate(unwind_data(data_gt)) is True # True because 90 > 80

# The data can even be nested
data_nested = {"scores": [[60, 75], [90, 40]]}
assert expr_gt.evaluate(unwind_data(data_nested)) is True # Still True, as 90 > 80
```
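
To make the "any" and "none" semantics concrete, the following sketch reproduces them with plain Python over nested lists. It does not show how `unwind_data` works internally; `flatten`, `any_gt`, and `none_eq` are hypothetical helpers.

```python
from collections.abc import Iterator
from typing import Any


def flatten(value: Any) -> Iterator[Any]:
    """Yield scalar values from arbitrarily nested lists or tuples."""
    if isinstance(value, (list, tuple)):
        for item in value:
            yield from flatten(item)
    else:
        yield value


def any_gt(value: Any, threshold: Any) -> bool:
    """'any' semantics: true if at least one value exceeds the threshold."""
    return any(v > threshold for v in flatten(value))


def none_eq(value: Any, target: Any) -> bool:
    """'none' semantics for !=: true only if no value equals the target."""
    return all(v != target for v in flatten(value))


scores_flat = list(flatten([[60, 75], [90, 40]]))
has_high_score = any_gt([[60, 75], [90, 40]], 80)
no_python = none_eq(["java", "rust"], "python")
```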

#### Evaluating Membership and Regex

`is_in` and `matches` also work with multi-valued fields, returning `True` if any value in the field's list satisfies the condition.

```python
import re

# IN expressions
status = F("status")
expr = status.is_in("active", "pending", "approved")
result = expr.evaluate(unwind_data({"status": "active"}))  # Returns True

# Regex matching on a multi-valued field
log_messages = F("logs")
expr = log_messages.matches(r"ERROR:", re.IGNORECASE)
data = {"logs": ["INFO: User logged in", "ERROR: Connection failed"]}
result = expr.evaluate(unwind_data(data))  # Returns True because one message matches

# Null checking
phone = F("phone")
expr = phone.is_null()
result = expr.evaluate(unwind_data({"phone": None}))  # Returns True
```

#### Complex Evaluation Examples

Compound expressions evaluate their nested conditions, including conditions on multi-valued fields.

```python
age = F("age", int)
country = F("country")
verified = F("verified")
subscription = F("subscription")

# Complex eligibility check
expr = (
    (age >= 18) &
    (country.is_one_of(["US", "UK", "CA"])) &
    ((verified == True) | (subscription.is_in("premium", "enterprise")))
)

# Adult in allowed country with verification
result = expr.evaluate(unwind_data({
    "age": 25,
    "country": "US",
    "verified": True,
    "subscription": "free"
}))  # Returns True

# Adult in allowed country with premium subscription (unverified)
result = expr.evaluate(unwind_data({
    "age": 30,
    "country": "UK",
    "verified": False,
    "subscription": "premium"
}))  # Returns True

# Minor (fails age requirement)
result = expr.evaluate(unwind_data({
    "age": 16,
    "country": "US",
    "verified": True,
    "subscription": "premium"
}))  # Returns False
```

#### Evaluation with Optimized Expressions

You can optimize expressions before evaluation for better performance or to catch logical issues:

```python
age = F("age", int)
status = F("status")

# Build a complex expression
expr = (
    ((age > 18) | (age > 25)) &  # Redundant condition
    (status == "active") &
    ((age < 30) | (age >= 30))   # Tautology
)

# Optimize first
optimized, _ = optimize(expr)
# optimized is simplified to: (age > 18) AND (status == "active")

# Then evaluate the optimized expression
result = optimized.evaluate(unwind_data({"age": 25, "status": "active"}))  # Returns True
```

#### Error Handling

Evaluation raises exceptions for invalid data:

```python
age = F("age")
expr = age > 18

# Missing field raises KeyError
try:
    expr.evaluate(unwind_data({"name": "Alice"}))  # age field is missing
except KeyError:
    print("Required field 'age' not found")

# Invalid type casting raises TypeError or ValueError
age_typed = F("age", int)
expr = age_typed > 18
try:
    expr.evaluate(unwind_data({"age": "not_a_number"}))
except (TypeError, ValueError):
    print("Cannot cast value to required type")
```

### Converting to other formats

Therismos uses the visitor pattern to enable extensible conversions of expressions to any format. You can implement custom visitors or use the built-in ones.

#### Custom Visitors

Implement custom visitors to convert expressions to any format:

```python
from therismos import ExprVisitor

class SQLVisitor:
    def visit_eq(self, expr):
        return f"{expr.field.name} = ?"

    def visit_all(self, expr):
        parts = [e.accept(self) for e in expr.exprs]
        return " AND ".join(parts)

    # ... implement other visit methods

visitor = SQLVisitor()
sql = expr.accept(visitor)
```
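
For readers new to the pattern, here is a self-contained sketch of the double dispatch behind `accept`. It uses two stand-in node classes (`EqNode`, `AllNode`) rather than the library's own, and the visitor also collects bind values, which is a design choice of this sketch and not something the snippet above does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EqNode:
    field: str
    value: object

    def accept(self, visitor):
        return visitor.visit_eq(self)  # the node picks the visitor method


@dataclass(frozen=True)
class AllNode:
    exprs: tuple

    def accept(self, visitor):
        return visitor.visit_all(self)


class SQLSketchVisitor:
    """Render a parameterized WHERE clause and collect bind values in order."""

    def __init__(self):
        self.params = []

    def visit_eq(self, expr):
        self.params.append(expr.value)
        return f"{expr.field} = ?"

    def visit_all(self, expr):
        return "(" + " AND ".join(e.accept(self) for e in expr.exprs) + ")"


tree = AllNode((EqNode("name", "Alice"), EqNode("status", "active")))
visitor = SQLSketchVisitor()
sql = tree.accept(visitor)
```

Keeping values out of the SQL text and in `visitor.params` lets the database driver handle quoting.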

#### Built-in Visitors

Therismos provides several built-in visitors for common use cases:

##### StringVisitor

Converts expressions to a human-readable string representation:

```python
from therismos import F, StringVisitor

age = F("age")
name = F("name")
expr = (age > 18) & (name == "Alice")

visitor = StringVisitor()
result = expr.accept(visitor)
# Output: "(age > 18 AND name = 'Alice')"
```

##### CountVisitor

Counts the number of nodes in an expression tree:

```python
from therismos import F, CountVisitor

age = F("age")
name = F("name")
expr = (age > 18) & (name == "Alice")

visitor = CountVisitor()
count = expr.accept(visitor)
# Output: 3 (1 AllExpr + 2 atomic expressions)
```

##### DictVisitor

Converts expressions to a dictionary representation for serialization:

```python
from therismos import F, DictVisitor

age = F("age")
expr = age > 18

visitor = DictVisitor()
result = expr.accept(visitor)
# Output: {"type": "gt", "field": "age", "value": 18}
```

For compound expressions, the dictionary is nested:

```python
age = F("age")
name = F("name")
expr = (age > 18) & (name == "Alice")

visitor = DictVisitor()
result = expr.accept(visitor)
# Output: {
#     "type": "and",
#     "exprs": [
#         {"type": "gt", "field": "age", "value": 18},
#         {"type": "eq", "field": "name", "value": "Alice"}
#     ]
# }
```

##### FieldGathererVisitor

Collects all unique field names used in an expression tree:

```python
from therismos import F, FieldGathererVisitor

age = F("age")
name = F("name")
status = F("status")
expr = (age > 18) & (name == "Alice") | (status == "active")

visitor = FieldGathererVisitor()
expr.accept(visitor)
field_names = visitor.field_names
# Output: {"age", "name", "status"}
```

This is useful for:
- Analyzing which fields are used in complex filters
- Validating that all referenced fields exist in your schema
- Generating documentation or metadata about queries
- Determining required permissions for a query
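
As a sketch of the schema-validation use case, a set of gathered field names can be compared against the known fields with a set difference. The schema contents and the helper name `unknown_fields` are made up for illustration.

```python
def unknown_fields(used, schema):
    """Return the referenced field names that the schema does not define, sorted for stable output."""
    return sorted(set(used) - set(schema))


field_names = {"age", "name", "status"}  # e.g. the set gathered from an expression
schema = {"age", "name", "email"}        # hypothetical schema
missing = unknown_fields(field_names, schema)
```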

#### Backend Converters

##### MongoVisitor

The `MongoVisitor` converts therismos expressions to MongoDB query filters compatible with PyMongo and Motor.

**Installation:**

```bash
# For synchronous PyMongo
uv pip install "therismos[mongodb]"

# For asynchronous Motor
uv pip install "therismos[mongodb-async]"
```

**Basic Usage:**

```python
from therismos import F, optimize
from therismos.expr.visitors.mongo import MongoVisitor

age = F("age")
status = F("status")
country = F("country")

# Build and optimize expression
expr = (age >= 21) & (status == "active") & (country.is_in("US", "UK", "CA"))
optimized, _ = optimize(expr)

# Convert to MongoDB filter
visitor = MongoVisitor()
mongo_filter = optimized.accept(visitor)

# Result: {
#     "age": {"$gte": 21},
#     "status": "active",
#     "country": {"$in": ["US", "UK", "CA"]}
# }
```

**Using with PyMongo:**

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["mydb"]
collection = db["users"]

# Use the generated filter
results = collection.find(mongo_filter)
for doc in results:
    print(doc)
```

**Using with Motor (async):**

```python
import asyncio
from motor.motor_asyncio import AsyncIOMotorClient

async def find_users():
    client = AsyncIOMotorClient("mongodb://localhost:27017/")
    db = client["mydb"]
    collection = db["users"]

    # Use the generated filter
    cursor = collection.find(mongo_filter)
    results = await cursor.to_list(length=100)
    return results

asyncio.run(find_users())
```

**Advanced Features:**

The MongoVisitor handles all therismos expression types:

```python
import re
from therismos import F, TRUE, FALSE, optimize
from therismos.expr.visitors.mongo import MongoVisitor

email = F("email")
age = F("age")
name = F("name")
status = F("status")

visitor = MongoVisitor()

# Regex matching (with case-insensitive flag)
expr = email.matches(r".*@example\.com$", re.IGNORECASE)
mongo_filter = expr.accept(visitor)
# Result: {"email": {"$regex": ".*@example\\.com$", "$options": "i"}}

# Range queries
expr = (age >= 18) & (age <= 65)
mongo_filter = expr.accept(visitor)
# Result: {"age": {"$gte": 18, "$lte": 65}} (optimized)

# Complex OR conditions
expr = (status == "active") | (status == "pending") | (status == "approved")
optimized_expr, _ = optimize(expr)  # Converts to IN
mongo_filter = optimized_expr.accept(visitor)
# Result: {"status": {"$in": ["active", "pending", "approved"]}}

# Null checking
expr = name.is_not_null()
mongo_filter = expr.accept(visitor)
# Result: {"name": {"$ne": null}}

# NOT expressions
expr = ~(age < 18)
mongo_filter = expr.accept(visitor)
# Result: {"$nor": [{"age": {"$lt": 18}}]}

# Constants
true_filter = TRUE.accept(visitor)   # Result: {}
false_filter = FALSE.accept(visitor)  # Result: {"$expr": false}
```

**Optimization Options:**

```python
# By default, simple AND expressions are optimized by merging fields
visitor = MongoVisitor(optimize_simple_and=True)
expr = (age > 18) & (name == "Alice")
mongo_filter = expr.accept(visitor)
# Result: {"age": {"$gt": 18}, "name": "Alice"}

# Disable optimization to always use $and
visitor = MongoVisitor(optimize_simple_and=False)
mongo_filter = expr.accept(visitor)
# Result: {"$and": [{"age": {"$gt": 18}}, {"name": "Alice"}]}
```
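
MongoDB treats the top-level keys of a filter document as an implicit AND, but a Python dict cannot hold the same key twice, so merging is only safe when clauses do not clash. The sketch below shows one way such a merge could work, falling back to `$and` on a clash. It is an illustration under those assumptions, not the code behind `optimize_simple_and`, and `merge_simple_and` is a hypothetical name.

```python
def merge_simple_and(clauses):
    """Merge filter clauses into one dict; fall back to $and when a key clashes."""
    merged = {}
    for clause in clauses:
        for key, condition in clause.items():
            if key not in merged:
                # Copy operator dicts so the caller's clauses are never mutated
                merged[key] = dict(condition) if isinstance(condition, dict) else condition
                continue
            existing = merged[key]
            both_operator_dicts = isinstance(existing, dict) and isinstance(condition, dict)
            if not both_operator_dicts or existing.keys() & condition.keys():
                return {"$and": list(clauses)}
            existing.update(condition)  # e.g. $gte and $lte on the same field
    return merged


flat = merge_simple_and([{"age": {"$gt": 18}}, {"name": "Alice"}])
ranged = merge_simple_and([{"age": {"$gte": 18}}, {"age": {"$lte": 65}}])
clash = merge_simple_and([{"status": "active"}, {"status": "pending"}])
```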

**Type Casting:**

The MongoVisitor respects field type declarations and automatically casts values:

```python
age = F("age", int)
expr = age.is_in(18, 21, 25)  # Values will be cast to int

visitor = MongoVisitor()
mongo_filter = expr.accept(visitor)
# Result: {"age": {"$in": [18, 21, 25]}}
```

### Expression Serialization

Therismos provides grammar-based serialization to convert expressions to/from compact string representations. This is particularly useful for URL query strings, API parameters, and storing filters as text.

#### Core Concepts

**Serialization Basics**

The `Serializer` class converts expressions to compact strings:

```python
from therismos import F, Serializer, Eq, AllExpr, Gt

serializer = Serializer()

# Serialize simple expressions
expr = Eq(F("age"), 18)
text = serializer.serialize(expr)
# Result: "age==18"

# Compound expressions
expr = AllExpr(Eq(F("age"), 18), Gt(F("score"), 75))
text = serializer.serialize(expr)
# Result: "(age==18;score>75)"

# Deserialize strings back to expressions
expr = serializer.deserialize("age==18")
# Result: Eq(field=Field(name='age', type_=None), value=18)
```

**Grammar Reference**

The serializer uses a compact grammar optimized for URL usage:

| Python Operator | Grammar Syntax | Example |
|----------------|----------------|---------|
| `&` (AND) | `;` | `age>18;status=="active"` |
| `\|` (OR) | `,` | `status=="active",status=="pending"` |
| `~` (NOT) | `!` | `!(age<18)` |
| `==` | `==` | `age==18` |
| `!=` | `!=` | `status!="inactive"` |
| `<` | `<` | `age<65` |
| `<=` | `<=` | `age<=65` |
| `>` | `>` | `age>18` |
| `>=` | `>=` | `age>=18` |
| `.is_in()` | `=in=` | `status=in=("active","pending")` |
| `.matches()` | `~regex` | `email~regex(".*@example\\.com")` |
| `.is_null()` | `==null` | `deleted_at==null` |
| `.is_not_null()` | `!=null` | `created_at!=null` |
| `TRUE` | `true()` | `true()` |
| `FALSE` | `false()` | `false()` |

**Precedence:** `!` (NOT) > `;` (AND) > `,` (OR)
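
The precedence rule means an unparenthesized string is an OR of AND groups. The sketch below makes that grouping visible by splitting on top-level `,` first and then on `;`, while skipping separators inside parentheses and double-quoted strings. It is not the library's parser and ignores `!` and nested groups; `split_top_level` and `group` are hypothetical helpers.

```python
def split_top_level(text, separator):
    """Split on `separator`, ignoring separators inside parentheses or double quotes."""
    parts, current, depth, in_string, escaped = [], [], 0, False, False
    for char in text:
        if in_string:
            current.append(char)
            if escaped:
                escaped = False
            elif char == "\\":
                escaped = True
            elif char == '"':
                in_string = False
            continue
        if char == '"':
            in_string = True
        elif char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        if char == separator and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(char)
    parts.append("".join(current))
    return parts


def group(text):
    """Return the OR alternatives, each as a list of AND terms (`,` binds loosest)."""
    return [split_top_level(alternative, ";") for alternative in split_top_level(text, ",")]


groups = group('age>18;status=="a,b",role=in=("x","y")')
```

Here `groups` holds two alternatives: the AND of `age>18` and `status=="a,b"`, or the single term `role=in=("x","y")`.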

#### Serialization Features

**URL Encoding**

For use in URL query strings, enable URL encoding:

```python
# Create a serializer with URL encoding
serializer = Serializer(url_encode=True)

# Serialize with URL encoding
expr = Eq(F("name"), "Alice Smith")
text = serializer.serialize(expr)
# Result: URL-encoded string

# Deserialize automatically decodes
expr = serializer.deserialize(text)
# Result: Original expression
```
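
To show what percent-encoding does to a serialized filter, the following sketch uses only the standard library. Which characters `Serializer(url_encode=True)` actually escapes is not specified here, so treat the encoded form below as the output of `urllib.parse.quote`, not necessarily of the serializer.

```python
from urllib.parse import quote, unquote

plain = 'name=="Alice Smith"'
encoded = quote(plain, safe="")  # escape every reserved character, including "=" and quotes
decoded = unquote(encoded)       # decoding restores the original text

print(encoded)  # name%3D%3D%22Alice%20Smith%22
```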

**Type Handling**

Control type annotation output in serialization:

```python
age = F("age", int)
name = F("name", str)

# Without type annotations (default)
serializer = Serializer()
text = serializer.serialize(Eq(age, 18))
# Result: "age==18"

# With all type annotations
serializer = Serializer(include_all_types=True)
text = serializer.serialize(Eq(age, 18))
# Result: "age{int}==18"
```

Register custom types for serialization using `register_custom_type()`:

```python
def uppercase_transform(x):
    return str(x).upper()

serializer = Serializer()
serializer.register_custom_type(uppercase_transform, "upper")

# Use the custom type
field = F("code", uppercase_transform)
expr = Eq(field, "abc")

# Serialize with type annotation
serializer_typed = Serializer(include_all_types=True)
serializer_typed.register_custom_type(uppercase_transform, "upper")
text = serializer_typed.serialize(expr)
# Result: "code{upper}==\"ABC\"" (value is transformed)
```

Values are automatically cast during deserialization when type annotations are present:

```python
import uuid
from therismos import Serializer

serializer = Serializer()
serializer.register_custom_type(uuid.UUID, 'uuid.UUID')

# Deserialize with type annotation
expr = serializer.deserialize('user_id{uuid.UUID}=="550e8400-e29b-41d4-a716-446655440000"')

# Value is automatically cast to UUID
assert isinstance(expr.value, uuid.UUID)
assert expr.value == uuid.UUID("550e8400-e29b-41d4-a716-446655440000")
```

Use the `implicit_field_types` parameter to define type mappings for field names, avoiding repeating type annotations:

```python
import uuid
from decimal import Decimal
from therismos import Serializer

# Define implicit field type mappings
implicit_field_types = {
    "user_id": uuid.UUID,
    "product_id": uuid.UUID,
    "price": Decimal,
}

serializer = Serializer(implicit_field_types=implicit_field_types)
serializer.register_custom_type(uuid.UUID, 'uuid.UUID')
serializer.register_custom_type(Decimal, 'Decimal')

# No type annotation needed - uses implicit mapping
expr = serializer.deserialize('user_id=="550e8400-e29b-41d4-a716-446655440000"')
assert expr.field.type_ is uuid.UUID
assert isinstance(expr.value, uuid.UUID)

# You can also register field types dynamically
serializer.register_field_type("account_id", uuid.UUID)

# Explicit type annotations always override implicit mappings
expr = serializer.deserialize('price{int}=="100"')
assert expr.field.type_ is int  # Not Decimal
```
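
The precedence between explicit annotations and implicit mappings can be pictured with a small standalone sketch. It does not import Therismos; `resolve_type`, `cast_value`, `TYPE_NAMES`, and `IMPLICIT` are hypothetical names chosen for illustration, and the library's actual lookup may differ.

```python
import uuid
from decimal import Decimal

# Hypothetical registries, standing in for register_custom_type()
# and the implicit_field_types mapping.
TYPE_NAMES = {"int": int, "uuid.UUID": uuid.UUID, "Decimal": Decimal}
IMPLICIT = {"user_id": uuid.UUID, "price": Decimal}


def resolve_type(field_name, annotation=None):
    """Return the type used to cast a field's value, or None if untyped."""
    if annotation is not None:
        # An explicit {type} annotation wins over any implicit mapping.
        return TYPE_NAMES[annotation]
    return IMPLICIT.get(field_name)


def cast_value(field_name, raw, annotation=None):
    """Cast a raw string with the resolved type; leave it unchanged if untyped."""
    type_ = resolve_type(field_name, annotation)
    return raw if type_ is None else type_(raw)


implicit_price = cast_value("price", "100")          # cast with Decimal
explicit_price = cast_value("price", "100", "int")   # annotation overrides Decimal
```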

**Advanced Features**

Field names with dots for nested references:

```python
expr = Eq(F("user.profile.age"), 25)
text = serializer.serialize(expr)
# Result: "user.profile.age==25"
```

Complete roundtrip example:

```python
from therismos import F, Serializer, optimize

# Build expression
expr = (F("age") >= 21) & (F("status").is_in("active", "pending"))

# Optimize and serialize for URL
optimized, _ = optimize(expr)
serializer = Serializer(url_encode=True)
query_param = serializer.serialize(optimized)
# Use in URL: /api/users?filter={query_param}

# Later, deserialize from the URL parameter
received_expr = serializer.deserialize(query_param)
```
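
How `Serializer(url_encode=True)` encodes its output is not spelled out here. Assuming it applies standard percent-encoding, the effect on a serialized filter would resemble the following standard-library sketch, which does not use Therismos at all:

```python
from urllib.parse import quote, unquote

raw = "age{int}==18"           # serialized form from the type annotation example
encoded = quote(raw, safe="")  # percent-encode everything except letters, digits and _.-~
decoded = unquote(encoded)     # reverse the encoding on the receiving side

print(encoded)  # age%7Bint%7D%3D%3D18
```

Web frameworks usually decode query parameters before handing them to application code, so whether the receiving side needs to decode again depends on the framework in use.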

#### Value Reference

The serializer supports various value types:

```python
serializer = Serializer()

# Strings (double-quoted with escapes)
serializer.serialize(Eq(F("name"), "Alice"))
# Result: "name==\"Alice\""

# Numbers (integers and floats)
serializer.serialize(Eq(F("age"), 25))
# Result: "age==25"

# Booleans
serializer.serialize(Eq(F("active"), True))
# Result: "active==true"

# Null
serializer.serialize(Eq(F("value"), None))
# Result: "value==null"

# Identifiers (unquoted - interpreted as strings)
expr = serializer.deserialize("status==active")
# value is the string "active"
```
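
The literal forms above can be reproduced with a few lines of plain Python. This is only an illustrative sketch: `format_value` is a hypothetical helper, and using `json.dumps` for string escaping is an assumption that may not match the library's escape rules exactly.

```python
import json


def format_value(value):
    """Render a Python value as a literal in the filter string."""
    if value is None:
        return "null"
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # json.dumps yields a double-quoted string with backslash escapes.
        return json.dumps(value, ensure_ascii=False)
    raise TypeError(f"unsupported value type: {type(value).__name__}")


literals = [format_value(v) for v in ("Alice", 25, True, None)]
print(literals)  # ['"Alice"', '25', 'true', 'null']
```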

## Sorting

Therismos provides a sorting system for modeling sort criteria as object structures, similar to how expressions model filters.

### Quick Start

```python
from therismos.sorting import SortSpec, SortCriterion, SortOrder

# Create sort criteria using plain strings
spec = SortSpec([
    SortCriterion("age", SortOrder.DESCENDING),
    SortCriterion("name", SortOrder.ASCENDING),
])

# Convert to string
from therismos.sorting.visitors import StringVisitor
visitor = StringVisitor()
print(spec.accept(visitor))
# Output: "age DESC, name ASC"
```

### Sort Orders

Three sort orders are available:

- **`SortOrder.ASCENDING`** (value: 1): Sort in ascending order
- **`SortOrder.DESCENDING`** (value: -1): Sort in descending order
- **`SortOrder.NONE`** (value: 0): No sorting (typically filtered out during optimization)

### Creating Sort Specifications

```python
from therismos.sorting import SortSpec, SortCriterion, SortOrder

# Individual criterion
criterion = SortCriterion("age", SortOrder.ASCENDING)

# Full specification
spec = SortSpec([
    SortCriterion("created_at", SortOrder.DESCENDING),
    SortCriterion("priority", SortOrder.ASCENDING),
    SortCriterion("name", SortOrder.ASCENDING),
])

# SortSpec is a list-like collection
spec.append(SortCriterion("id", SortOrder.ASCENDING))
print(len(spec))  # 4
```

### Optimization

The sorting optimizer removes redundant and meaningless criteria:

```python
from therismos.sorting import SortSpec, SortCriterion, SortOrder
from therismos.sorting.optimizer import optimize

spec = SortSpec([
    SortCriterion("age", SortOrder.ASCENDING),
    SortCriterion("name", SortOrder.NONE),      # Will be removed
    SortCriterion("age", SortOrder.DESCENDING), # Overrides first "age"
])

optimized, records = optimize(spec)
# Result: SortSpec([SortCriterion("age", SortOrder.DESCENDING)])
# Only one criterion remains - the last occurrence of "age"

# Check what was optimized
for record in records:
    print(record.reason)
```

Optimization rules:

1. **Remove NONE orders**: Criteria with `SortOrder.NONE` are removed
2. **Remove redundant criteria**: When a field appears multiple times, only the last occurrence is kept
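
The two rules can be sketched in plain Python without the library. `Order` and `dedupe_sort` are hypothetical names. The sketch assumes that `NONE` criteria are dropped before duplicates are resolved and that the surviving criterion keeps the position of its last occurrence, neither of which is confirmed above.

```python
from enum import Enum


class Order(Enum):
    ASCENDING = 1
    DESCENDING = -1
    NONE = 0


def dedupe_sort(criteria):
    """Apply both rules to a list of (field, Order) pairs."""
    latest = {}
    for field, order in criteria:
        if order is Order.NONE:
            continue  # Rule 1: criteria without an order are dropped
        # Rule 2: remove any earlier entry so the field moves to its last position.
        latest.pop(field, None)
        latest[field] = order
    return list(latest.items())


result = dedupe_sort([
    ("age", Order.ASCENDING),
    ("name", Order.NONE),
    ("age", Order.DESCENDING),
])
print(result)  # [('age', <Order.DESCENDING: -1>)]
```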

### Converting to Other Formats

#### Built-in Visitors

```python
from therismos.sorting import SortSpec, SortCriterion, SortOrder
from therismos.sorting.visitors import StringVisitor, DictVisitor, FieldGathererVisitor

spec = SortSpec([
    SortCriterion("age", SortOrder.DESCENDING),
    SortCriterion("name", SortOrder.ASCENDING),
])

# String representation
string_visitor = StringVisitor()
print(spec.accept(string_visitor))
# Output: "age DESC, name ASC"

# Dictionary representation
dict_visitor = DictVisitor()
result = spec.accept(dict_visitor)
# Result: [{"field": "age", "order": "DESC"}, {"field": "name", "order": "ASC"}]

# Collect field names
field_visitor = FieldGathererVisitor()
spec.accept(field_visitor)
print(field_visitor.field_names)
# Output: {"age", "name"}
```

#### MongoDB Sorting

```python
from therismos.sorting import SortSpec, SortCriterion, SortOrder
from therismos.sorting.visitors.mongo import MongoVisitor

spec = SortSpec([
    SortCriterion("created_at", SortOrder.DESCENDING),
    SortCriterion("name", SortOrder.ASCENDING),
])

visitor = MongoVisitor()
mongo_sort = spec.accept(visitor)
# Result: {"created_at": -1, "name": 1}

# Use with PyMongo
# cursor = collection.find().sort(list(mongo_sort.items()))

# Use with Motor (async)
# docs = await collection.find().sort(list(mongo_sort.items())).to_list(length=100)
```

### Serialization

Convert sort specifications to and from a compact string format for URLs and APIs:

```python
from therismos.sorting import Serializer, SortCriterion, SortOrder, SortSpec

# Create serializer
serializer = Serializer()

# Serialize to string
spec = SortSpec([
    SortCriterion("age", SortOrder.ASCENDING),
    SortCriterion("created_at", SortOrder.DESCENDING),
    SortCriterion("priority", SortOrder.ASCENDING),
])

text = serializer.serialize(spec)
# Result: "age,-created_at,priority"

# Deserialize from string
restored = serializer.deserialize("name,-score,+priority")
# Result: SortSpec with name ASC, score DESC, priority ASC

# Format rules:
# - Comma-separated list
# - No prefix or + prefix = ascending
# - Minus prefix (-) = descending
```
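
The format rules are simple enough to sketch as a standalone parser and formatter. `parse_sort` and `format_sort` are hypothetical helpers, not the library's `Serializer`, and they ignore type annotations and field names that themselves contain commas.

```python
def parse_sort(text):
    """Parse "name,-score,+priority" into (field, direction) pairs (1 or -1)."""
    criteria = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate empty segments such as a trailing comma
        if token[0] == "-":
            criteria.append((token[1:], -1))
        elif token[0] == "+":
            criteria.append((token[1:], 1))
        else:
            criteria.append((token, 1))
    return criteria


def format_sort(criteria):
    """Inverse of parse_sort; ascending fields are written without a prefix."""
    return ",".join(("-" if direction < 0 else "") + field for field, direction in criteria)


parsed = parse_sort("name,-score,+priority")
print(parsed)               # [('name', 1), ('score', -1), ('priority', 1)]
print(format_sort(parsed))  # name,-score,priority
```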

### Custom Visitors

Create custom visitors to convert sort specifications to any format:

```python
from therismos.sorting import SortCriterion, SortOrder, SortSpec

class SQLVisitor:
    """Convert sort spec to SQL ORDER BY clause."""

    def visit_sort_criterion(self, criterion: SortCriterion) -> str:
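        # Anything that is not ASCENDING is rendered as DESC, so remove
        # SortOrder.NONE criteria first (for example with optimize()).
        order_str = "ASC" if criterion.order == SortOrder.ASCENDING else "DESC"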
        return f"{criterion.field} {order_str}"

    def visit_sort_spec(self, spec: SortSpec) -> str:
        if not spec:
            return ""
        parts = [criterion.accept(self) for criterion in spec]
        return "ORDER BY " + ", ".join(parts)

# Usage, with the `spec` defined in the MongoDB Sorting example
visitor = SQLVisitor()
result = spec.accept(visitor)
# Result: "ORDER BY created_at DESC, name ASC"
```

## Grouping and Aggregation

Therismos provides a grouping and aggregation system for modeling SQL-like GROUP BY operations with aggregation functions as object structures.

### Quick Start

```python
from therismos.grouping import GroupSpec, Aggregation, AggregationFunction

# Create a grouping specification
spec = GroupSpec(
    group_by=["category", "region"],
    aggregations=[
        Aggregation("total", AggregationFunction.COUNT),
        Aggregation("min_price", AggregationFunction.MIN, "price"),
        Aggregation("avg_price", AggregationFunction.AVERAGE, "price"),
    ],
)

# Convert to string
from therismos.grouping.visitors import StringVisitor
visitor = StringVisitor()
print(spec.accept(visitor))
# Output: ("category,region", "total:count,min_price:min:price,avg_price:average:price")
```

### Aggregation Functions

Therismos supports the following aggregation functions:

- **`COUNT`**: Count of items in each group (field is optional and silently ignored if provided; recommended usage omits it)
- **`SUM`**: Sum of values
- **`MIN`**: Minimum value
- **`MAX`**: Maximum value
- **`AVERAGE`**: Average (mean) value
- **`STDDEV`**: Standard deviation
- **`MEDIAN`**: Median value
- **`Q1`**: First quartile (25th percentile)
- **`Q3`**: Third quartile (75th percentile)
- **`P01`, `P05`, `P10`**: 1st, 5th, and 10th percentiles
- **`P90`, `P95`, `P99`**: 90th, 95th, and 99th percentiles

All aggregation functions except `COUNT` require a field to aggregate.
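
The field requirement can be illustrated with a standalone dataclass. `Func` and `Agg` are hypothetical stand-ins for `AggregationFunction` and `Aggregation`; raising `ValueError` and normalizing an ignored `COUNT` field to `None` are assumptions of this sketch, not documented library behavior.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Func(Enum):
    COUNT = "count"
    SUM = "sum"
    AVERAGE = "average"


@dataclass(frozen=True)
class Agg:
    id: str
    function: Func
    field: Optional[str] = None

    def __post_init__(self):
        if self.function is Func.COUNT:
            # COUNT ignores any field, so this sketch normalizes it away.
            object.__setattr__(self, "field", None)
        elif self.field is None:
            raise ValueError(f"{self.function.name} requires a field")


count = Agg("total", Func.COUNT, "price")   # field is dropped
total = Agg("revenue", Func.SUM, "price")   # field is kept
```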

### Creating Grouping Specifications

```python
from therismos.grouping import GroupSpec, Aggregation, AggregationFunction

# Simple grouping with count
spec = GroupSpec(
    group_by=["status"],
    aggregations=[Aggregation("count", AggregationFunction.COUNT)],
)

# Multiple grouping fields
spec = GroupSpec(
    group_by=["category", "region", "status"],
    aggregations=[
        Aggregation("total", AggregationFunction.COUNT),
        Aggregation("min_price", AggregationFunction.MIN, "price"),
        Aggregation("max_price", AggregationFunction.MAX, "price"),
        Aggregation("avg_revenue", AggregationFunction.AVERAGE, "revenue"),
    ],
)

# Percentile aggregations
spec = GroupSpec(
    group_by=["service"],
    aggregations=[
        Aggregation("p95_latency", AggregationFunction.P95, "latency"),
        Aggregation("p99_latency", AggregationFunction.P99, "latency"),
        Aggregation("median_latency", AggregationFunction.MEDIAN, "latency"),
    ],
)

# Global aggregation (no grouping)
spec = GroupSpec(
    group_by=[],
    aggregations=[
        Aggregation("total_count", AggregationFunction.COUNT),
        Aggregation("overall_avg", AggregationFunction.AVERAGE, "score"),
    ],
)
```

### Optimization

The grouping optimizer removes redundant grouping fields and duplicate aggregation definitions:

```python
from therismos.grouping import GroupSpec, Aggregation, AggregationFunction
from therismos.grouping.optimizer import optimize

spec = GroupSpec(
    group_by=["category", "region", "category"],  # duplicate field
    aggregations=[
        Aggregation("total", AggregationFunction.COUNT),
        Aggregation("min_price", AggregationFunction.MIN, "price"),
        Aggregation("total", AggregationFunction.MAX, "quantity"),  # duplicate ID
    ],
)

optimized, records = optimize(spec)
# Result: group_by=["region", "category"], aggregations with last "total" kept

# Check what was optimized
for record in records:
    print(record.reason)
```

Optimization rules:

1. **Remove duplicate grouping fields**: When a field appears multiple times in `group_by`, only the last occurrence is kept
2. **Remove duplicate aggregation IDs**: When an aggregation ID appears multiple times, only the last definition is kept
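
Both rules amount to "keep the last occurrence", which can be sketched in plain Python. `keep_last` is a hypothetical helper. The resulting `group_by` order matches the example above, while the relative order of the surviving aggregations is an assumption.

```python
def keep_last(items, key=lambda item: item):
    """Drop earlier duplicates, keeping each key's last occurrence in place."""
    seen = set()
    result = []
    for item in reversed(items):
        k = key(item)
        if k not in seen:
            seen.add(k)
            result.append(item)
    result.reverse()
    return result


group_by = keep_last(["category", "region", "category"])
aggregations = keep_last(
    [("total", "count", None), ("min_price", "min", "price"), ("total", "max", "quantity")],
    key=lambda agg: agg[0],  # deduplicate on the aggregation ID
)
print(group_by)      # ['region', 'category']
print(aggregations)  # [('min_price', 'min', 'price'), ('total', 'max', 'quantity')]
```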

### Converting to Other Formats

#### Built-in Visitors

```python
from therismos.grouping import GroupSpec, Aggregation, AggregationFunction
from therismos.grouping.visitors import StringVisitor, DictVisitor, FieldGathererVisitor

spec = GroupSpec(
    group_by=["category", "region"],
    aggregations=[
        Aggregation("count", AggregationFunction.COUNT),
        Aggregation("avg_price", AggregationFunction.AVERAGE, "price"),
    ],
)

# String representation
string_visitor = StringVisitor()
print(spec.accept(string_visitor))
# Output: ("category,region", "count:count,avg_price:average:price")

# Dictionary representation
dict_visitor = DictVisitor()
result = spec.accept(dict_visitor)
# Result: {
#     "group_by": ["category", "region"],
#     "aggregations": [
#         {"id": "count", "function": "count", "field": None},
#         {"id": "avg_price", "function": "average", "field": "price"}
#     ]
# }

# Collect field names
field_visitor = FieldGathererVisitor()
spec.accept(field_visitor)
print(field_visitor.field_names)
# Output: {"category", "region", "price"}
```

#### MongoDB Aggregation Pipelines

```python
from therismos.grouping import GroupSpec, Aggregation, AggregationFunction
from therismos.grouping.visitors.mongo import MongoVisitor

spec = GroupSpec(
    group_by=["category", "region"],
    aggregations=[
        Aggregation("total", AggregationFunction.COUNT),
        Aggregation("min_price", AggregationFunction.MIN, "price"),
        Aggregation("avg_price", AggregationFunction.AVERAGE, "price"),
        Aggregation("p95_latency", AggregationFunction.P95, "latency"),
    ],
)

visitor = MongoVisitor()
group_stage = spec.accept(visitor)
# Result: {
#     "$group": {
#         "_id": {"category": "$category", "region": "$region"},
#         "total": {"$sum": 1},
#         "min_price": {"$min": "$price"},
#         "avg_price": {"$avg": "$price"},
#         "p95_latency": {"$percentile": {"input": "$latency", "p": [0.95], "method": "approximate"}}
#     }
# }

# Use with PyMongo
# pipeline = [group_stage]
# results = collection.aggregate(pipeline)

# Use with Motor (async)
# pipeline = [group_stage]
# results = await collection.aggregate(pipeline).to_list(length=None)
```

**Single vs. Multiple Grouping Fields:**

By default, when there is exactly one grouping field, the MongoDB visitor uses the bare field reference as `_id` instead of wrapping it in a document:

```python
# Single grouping field
spec = GroupSpec(
    group_by=["status"],
    aggregations=[Aggregation("count", AggregationFunction.COUNT)],
)

visitor = MongoVisitor()
result = spec.accept(visitor)
# Result: {"$group": {"_id": "$status", "count": {"$sum": 1}}}

# Disable simplification for consistency
visitor = MongoVisitor(simplify_single_group=False)
result = spec.accept(visitor)
# Result: {"$group": {"_id": {"status": "$status"}, "count": {"$sum": 1}}}
```

**Global Aggregation:**

```python
# No grouping fields (aggregate all documents)
spec = GroupSpec(
    group_by=[],
    aggregations=[
        Aggregation("total", AggregationFunction.COUNT),
        Aggregation("avg_age", AggregationFunction.AVERAGE, "age"),
    ],
)

visitor = MongoVisitor()
result = spec.accept(visitor)
# Result: {"$group": {"_id": None, "total": {"$sum": 1}, "avg_age": {"$avg": "$age"}}}
```
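
The three `_id` shapes shown above (document, bare field reference, and `None`) can be reproduced with a short standalone function. `build_group_stage` is a hypothetical helper operating on plain tuples rather than on `GroupSpec`. It covers only the simple accumulators, and the `max` and `sum` mappings are assumptions based on the standard MongoDB operators rather than on the library's output.

```python
def build_group_stage(group_by, aggregations, simplify_single_group=True):
    """Build a $group stage from field names and (id, function, field) triples."""
    if not group_by:
        group_id = None                       # global aggregation
    elif len(group_by) == 1 and simplify_single_group:
        group_id = f"${group_by[0]}"          # bare field reference
    else:
        group_id = {name: f"${name}" for name in group_by}

    simple_ops = {"min": "$min", "max": "$max", "average": "$avg", "sum": "$sum"}
    stage = {"_id": group_id}
    for agg_id, function, field in aggregations:
        if function == "count":
            stage[agg_id] = {"$sum": 1}       # count by adding 1 per document
        else:
            stage[agg_id] = {simple_ops[function]: f"${field}"}
    return {"$group": stage}


single = build_group_stage(["status"], [("count", "count", None)])
print(single)  # {'$group': {'_id': '$status', 'count': {'$sum': 1}}}
```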

### Serialization

Convert grouping specifications to and from a compact string format for URLs and APIs:

```python
from therismos.grouping import Aggregation, AggregationFunction, GroupSpec, Serializer

# Create serializer
serializer = Serializer()

# Serialize to string
spec = GroupSpec(
    group_by=["category", "region"],
    aggregations=[
        Aggregation("total", AggregationFunction.COUNT),
        Aggregation("min_price", AggregationFunction.MIN, "price"),
        Aggregation("avg_price", AggregationFunction.AVERAGE, "price"),
    ],
)

text = serializer.serialize(spec)
# Result: ("category,region", "total:count,min_price:min:price,avg_price:average:price")

# Deserialize from string
restored = serializer.deserialize('("category,region", "total:count,min_price:min:price")')
# Result: GroupSpec with category+region grouping and two aggregations

# Format rules:
# - Tuple format: ("field1,field2", "agg1:func,agg2:func:field")
# - Grouping fields: comma-separated list
# - Aggregations: comma-separated list of "id:function" or "id:function:field"
# - COUNT aggregation: "id:count" (field omitted; any field is silently ignored)
# - Other aggregations: "id:function:field" (field required)
```
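
To make the format rules concrete, here is a standalone parsing sketch. `parse_group` is a hypothetical helper rather than the library's `Serializer`; it returns plain tuples, raises `ValueError` on malformed input (an assumption), and does not handle names containing quotes, commas, or colons.

```python
import re

_TUPLE = re.compile(r'\(\s*"([^"]*)"\s*,\s*"([^"]*)"\s*\)')


def parse_group(text):
    """Parse '("f1,f2", "id:func[:field],...")' into (fields, aggregations)."""
    match = _TUPLE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a grouping tuple: {text!r}")
    fields = [name for name in match.group(1).split(",") if name]
    aggregations = []
    for item in filter(None, match.group(2).split(",")):
        parts = item.split(":")
        if len(parts) not in (2, 3):
            raise ValueError(f"malformed aggregation: {item!r}")
        agg_id, function = parts[0], parts[1]
        field = parts[2] if len(parts) == 3 else None
        if function == "count":
            field = None  # a field given for COUNT is ignored
        elif field is None:
            raise ValueError(f"{function} requires a field: {item!r}")
        aggregations.append((agg_id, function, field))
    return fields, aggregations


fields, aggs = parse_group('("category,region", "total:count,min_price:min:price")')
print(fields)  # ['category', 'region']
print(aggs)    # [('total', 'count', None), ('min_price', 'min', 'price')]
```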

**Roundtrip Serialization:**

```python
original = GroupSpec(
    group_by=["category", "region"],
    aggregations=[
        Aggregation("total", AggregationFunction.COUNT),
        Aggregation("avg_price", AggregationFunction.AVERAGE, "price"),
    ],
)

serializer = Serializer()
text = serializer.serialize(original)
restored = serializer.deserialize(text)

# Restored spec is equivalent to original
assert restored.group_by == original.group_by
assert len(restored.aggregations) == len(original.aggregations)
```

### Custom Visitors

Create custom visitors to convert grouping specifications to any format:

```python
from therismos.grouping import GroupSpec

class PandasVisitor:
    """Convert grouping spec to pandas groupby + agg syntax."""

    def visit_group_spec(self, spec: GroupSpec) -> str:
        if not spec.group_by:
            # Global aggregation
            agg_dict = self._build_agg_dict(spec.aggregations.values())
            return f"df.agg({agg_dict})"

        # Groupby aggregation
        group_fields = list(spec.group_by)
        agg_dict = self._build_agg_dict(spec.aggregations.values())
        return f"df.groupby({group_fields}).agg({agg_dict})"

    def _build_agg_dict(self, aggregations):
        agg_map = {
            "count": "count",
            "min": "min",
            "max": "max",
            "average": "mean",
            "stddev": "std",
            "median": "median",
        }
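        # Simplified on purpose: functions missing from agg_map and
        # aggregations without a field (such as COUNT) are skipped.
        result = {}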
        for agg in aggregations:
            if agg.function.value in agg_map:
                func = agg_map[agg.function.value]
                if agg.field:
                    result[agg.id] = (agg.field, func)
        return result

# Usage
visitor = PandasVisitor()
result = spec.accept(visitor)
# Result: "df.groupby(['category', 'region']).agg({...})"
```

### Complete Example: Analytics Dashboard

```python
from therismos.grouping import GroupSpec, Aggregation, AggregationFunction
from therismos.grouping.optimizer import optimize
from therismos.grouping.visitors.mongo import MongoVisitor

# Define grouping specification for sales analytics
sales_analysis = GroupSpec(
    group_by=["product_category", "region", "quarter"],
    aggregations=[
        Aggregation("total_sales", AggregationFunction.COUNT),
        Aggregation("min_price", AggregationFunction.MIN, "sale_price"),
        Aggregation("max_price", AggregationFunction.MAX, "sale_price"),
        Aggregation("avg_price", AggregationFunction.AVERAGE, "sale_price"),
        Aggregation("revenue", AggregationFunction.AVERAGE, "revenue"),
        Aggregation("p50_sale_time", AggregationFunction.MEDIAN, "processing_time"),
        Aggregation("p95_sale_time", AggregationFunction.P95, "processing_time"),
    ],
)

# Optimize the specification
optimized, records = optimize(sales_analysis)

# Convert to MongoDB aggregation pipeline
visitor = MongoVisitor()
group_stage = optimized.accept(visitor)

# Use in MongoDB query
# from pymongo import MongoClient
# client = MongoClient("mongodb://localhost:27017/")
# db = client["sales_db"]
# collection = db["transactions"]
#
# pipeline = [
#     {"$match": {"year": 2024}},  # Filter stage
#     group_stage,                  # Our grouping specification
#     {"$sort": {"total_sales": -1}}  # Sort by sales count
# ]
#
# results = collection.aggregate(pipeline)
# for group in results:
#     print(f"Category: {group['_id']['product_category']}")
#     print(f"Region: {group['_id']['region']}")
#     print(f"Quarter: {group['_id']['quarter']}")
#     print(f"Total Sales: {group['total_sales']}")
#     print(f"Avg Price: {group['avg_price']}")
#     print(f"P95 Processing Time: {group['p95_sale_time']}")
#     print("---")
```

## Polars and Pandas Integration

Therismos provides first-class support for Polars and pandas DataFrames via optional backend visitors.

### Installation

```bash
# Polars backend (quotes keep shells such as zsh from expanding the brackets)
pip install "therismos[polars]"

# Pandas backend
pip install "therismos[pandas]"

# Both
pip install "therismos[polars,pandas]"
```

### Polars Integration

```python
import polars as pl
from therismos import F
from therismos.sorting import SortSpec, SortCriterion, SortOrder
from therismos.grouping import GroupSpec, Aggregation, AggregationFunction
from therismos.expr.visitors.polars import PolarsExprVisitor
from therismos.sorting.visitors.polars import PolarsSortSpecVisitor
from therismos.grouping.visitors.polars import PolarsGroupSpecVisitor

df = pl.DataFrame({
    "age": [20, 15, 30],
    "status": ["active", "inactive", "active"],
    "price": [10.0, 20.0, 15.0],
    "category": ["A", "B", "A"],
})

# Filter with expressions
age = F("age")
status = F("status")
expr = (age > 18) & (status == "active")

pl_expr = expr.accept(PolarsExprVisitor())
df.filter(pl_expr)            # eager DataFrame
df.lazy().filter(pl_expr)     # lazy LazyFrame

# Sort with SortSpec
spec = SortSpec([
    SortCriterion("age", SortOrder.DESCENDING),
    SortCriterion("status", SortOrder.ASCENDING),
])
sort = spec.accept(PolarsSortSpecVisitor())
df.sort(by=list(sort.by), descending=list(sort.descending))

# Group and aggregate with GroupSpec
group_spec = GroupSpec(
    group_by=["category"],
    aggregations=[
        Aggregation("count", AggregationFunction.COUNT),
        Aggregation("avg_price", AggregationFunction.AVERAGE, "price"),
    ],
)
grp = group_spec.accept(PolarsGroupSpecVisitor())
df.group_by(list(grp.group_by)).agg(list(grp.agg))
```

### Pandas Integration

```python
import pandas as pd
from therismos import F
from therismos.sorting import SortSpec, SortCriterion, SortOrder
from therismos.grouping import GroupSpec, Aggregation, AggregationFunction
from therismos.expr.visitors.pandas import PandasExprVisitor
from therismos.sorting.visitors.pandas import PandasSortSpecVisitor
from therismos.grouping.visitors.pandas import PandasGroupSpecVisitor

df = pd.DataFrame({
    "age": [20, 15, 30],
    "status": ["active", "inactive", "active"],
    "price": [10.0, 20.0, 15.0],
    "category": ["A", "B", "A"],
})

# Filter with expressions: the visitor returns a callable PandasFilter
age = F("age")
status = F("status")
expr = (age > 18) & (status == "active")

mask = expr.accept(PandasExprVisitor())
df[mask(df)]

# Sort with SortSpec
spec = SortSpec([
    SortCriterion("age", SortOrder.DESCENDING),
])
sort = spec.accept(PandasSortSpecVisitor())
df.sort_values(by=list(sort.by), ascending=list(sort.ascending))

# Group and aggregate with GroupSpec
group_spec = GroupSpec(
    group_by=["category"],
    aggregations=[
        Aggregation("count", AggregationFunction.COUNT),
        Aggregation("avg_price", AggregationFunction.AVERAGE, "price"),
    ],
)
grp = group_spec.accept(PandasGroupSpecVisitor())
df.groupby(list(grp.group_by)).agg(**grp.agg)
```

## Expression Templates

Expression templates let you define parameterized filter expressions that are fully serializable to JSON. Named placeholders (`$start`, `$end`) are computed from a runtime context via a transform pipeline DSL, making templates suitable for persistent storage in a database or config file.

### Quick Start

```python
import datetime
from therismos import F, ExprTemplate
from therismos.expr._expr import TemplateParam
from therismos.expr.template import RuleSerializer, TemplateParamSpec

# Build a "last 7 days" template
field = F("created", datetime.date)
expr = (field >= TemplateParam("start", datetime.date)) & (field <= TemplateParam("end", datetime.date))

rule_ser = RuleSerializer()
tmpl = ExprTemplate(
    expr=expr,
    params={
        "start": TemplateParamSpec(description="Range start (inclusive)"),
        "end":   TemplateParamSpec(description="Range end (inclusive)"),
    },
    rules={
        "end":   rule_ser.deserialize("$now | extract_date"),
        "start": rule_ser.deserialize("$now | extract_date | sub_time(7d)"),
    },
)

# Bind: supply a context and get a concrete expression
now = datetime.datetime(2026, 3, 18, 16, 30)
bound = tmpl.bind({"now": now})
# → created{date}>=2026-03-11; created{date}<=2026-03-18
```

### Template Parameters

A `TemplateParam` is a named placeholder that can appear in any value position of an expression node:

```python
from therismos import F
from therismos.expr._expr import TemplateParam

age = F("age", int)
threshold = TemplateParam("min_age", int)   # optional type_ for automatic casting

expr = age >= threshold                      # created like any other expression
```

Use `collect_params()` to inspect placeholders and `bind()` to substitute them:

```python
from therismos import bind, collect_params

params = collect_params(expr)
# → {"min_age": TemplateParam(name="min_age", type_=<class 'int'>)}

bound = bind(expr, {"min_age": 18})
# → age >= 18  (a concrete Ge expression with no TemplateParam left)
```
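
Conceptually, collecting and binding are two recursive walks over the expression tree. The sketch below illustrates the idea on a miniature tree of its own; `Param`, `Ge`, `And`, `collect`, and `bind_tree` are hypothetical stand-ins, not the Therismos classes or functions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Param:
    name: str
    type_: Optional[type] = None


@dataclass(frozen=True)
class Ge:
    field: str
    value: object


@dataclass(frozen=True)
class And:
    operands: tuple


def collect(expr):
    """Return {name: Param} for every placeholder found in the tree."""
    if isinstance(expr, And):
        found = {}
        for operand in expr.operands:
            found.update(collect(operand))
        return found
    return {expr.value.name: expr.value} if isinstance(expr.value, Param) else {}


def bind_tree(expr, values):
    """Return a new tree with placeholders replaced by (optionally cast) values."""
    if isinstance(expr, And):
        return And(tuple(bind_tree(operand, values) for operand in expr.operands))
    if isinstance(expr.value, Param):
        param = expr.value
        raw = values[param.name]  # KeyError if no value was supplied
        return Ge(expr.field, param.type_(raw) if param.type_ else raw)
    return expr


template = And((Ge("age", Param("min_age", int)), Ge("score", 5)))
bound = bind_tree(template, {"min_age": "18"})  # "18" is cast to int 18
```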

### Serialization Grammar

`TemplateParam` nodes serialize as `$name` or `$name{type}` in the expression grammar:

```python
from therismos import F, Serializer
from therismos.expr._expr import TemplateParam
import datetime

ser = Serializer(type_registry={datetime.date: "date"})
expr = F("created", datetime.date) >= TemplateParam("start", datetime.date)

print(ser.serialize(expr))           # created{date}>=$start{date}
restored = ser.deserialize("created{date}>=$start{date}")
# → Ge(field=Field("created", date), value=TemplateParam("start", date))
```

Backend visitors (`MongoVisitor`, `PolarsExprVisitor`, `PandasExprVisitor`) and `optimize()` raise `UnboundTemplateParamError` if any `TemplateParam` remains, so call `bind()` first.

### Transform Pipeline DSL

Rules use a `$source | step1 | step2(arg)` pipeline syntax. Steps are looked up in a `TransformRegistry`:

```
$end   = $now | extract_date
$start = $now | extract_date | sub_time(7d)
```

Duration arguments support: `7d`, `1h`, `30m`, `90s`, `500ms`.
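
For reference, the duration literals listed above map naturally onto `datetime.timedelta`. The following standalone sketch shows one way to parse them; `parse_duration` is a hypothetical helper, and the library's own parser may accept more forms or behave differently.

```python
import re
from datetime import timedelta

# Maps each unit suffix to the matching timedelta keyword argument.
_UNITS = {"ms": "milliseconds", "s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
_DURATION = re.compile(r"(\d+)(ms|s|m|h|d)")


def parse_duration(text):
    """Convert a literal such as "7d" or "500ms" into a timedelta."""
    match = _DURATION.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


week = parse_duration("7d")
half_second = parse_duration("500ms")
```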

Built-in transforms include date/time extraction and rounding, arithmetic (`add_time`, `sub_time`), type casting (`as_date`, `as_datetime`, `as_int`, …), string ops, and math. Register custom transforms at runtime:

```python
import datetime

from therismos import DEFAULT_TRANSFORM_REGISTRY

@DEFAULT_TRANSFORM_REGISTRY.register_decorator("fiscal_year_start")
def fiscal_year_start(dt):
    # The fiscal year starts on April 1; January to March belong to the previous one.
    return datetime.date(dt.year if dt.month >= 4 else dt.year - 1, 4, 1)
```
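
To show how a `$source | step | step(arg)` rule could be evaluated against a registry, here is a deliberately small standalone sketch. `REGISTRY`, `register`, and `evaluate` are hypothetical names, the `sub_time` stand-in only understands day durations, and splitting on `|` would break for arguments that contain that character. None of this mirrors the internals of `TransformRegistry`.

```python
import datetime
import re

REGISTRY = {}


def register(name):
    """Decorator that stores a transform function under the given name."""
    def decorator(fn):
        REGISTRY[name] = fn
        return fn
    return decorator


@register("extract_date")
def extract_date(value):
    return value.date()


@register("sub_time")
def sub_time(value, arg):
    # Only day durations such as "7d" are handled in this sketch.
    if not arg.endswith("d"):
        raise ValueError(f"unsupported duration: {arg!r}")
    return value - datetime.timedelta(days=int(arg[:-1]))


_STEP = re.compile(r"(\w+)(?:\((.*)\))?")


def evaluate(rule, context):
    """Evaluate a rule such as "$now | extract_date | sub_time(7d)"."""
    source, *steps = [part.strip() for part in rule.split("|")]
    if not source.startswith("$"):
        raise ValueError(f"rule must start with a $source: {rule!r}")
    value = context[source[1:]]
    for step in steps:
        match = _STEP.fullmatch(step)
        if match is None:
            raise ValueError(f"malformed step: {step!r}")
        name, arg = match.groups()
        fn = REGISTRY[name]
        value = fn(value) if arg is None else fn(value, arg)
    return value


now = datetime.datetime(2026, 3, 18, 16, 30)
start = evaluate("$now | extract_date | sub_time(7d)", {"now": now})
print(start)  # 2026-03-11
```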

### JSON Persistence

`ExprTemplate` serializes to a JSON-compatible dict for storage in a database or config file:

```python
import json

d = tmpl.to_dict()
# {
#   "version": "1",
#   "expr": "created>=$start; created<=$end",
#   "params": {"start": {"description": "Range start (inclusive)"}, ...},
#   "rules": {"end": "$now | extract_date", "start": "$now | extract_date | sub_time(7d)"}
# }

json_str = tmpl.to_json()
restored = ExprTemplate.from_json(json_str)
bound = restored.bind({"now": datetime.datetime(2026, 3, 18, 16, 30)})
```

---

## Module Structure

Therismos is organized into the following modules and submodules:

```
therismos/
├── __init__.py              # Main package exports
├── expr/                    # Expression module
│   ├── __init__.py          # Expression module exports
│   ├── _expr.py             # Core expression classes (Expr, Field, TemplateParam, etc.)
│   ├── optimizer.py         # Expression optimization and simplification
│   ├── serializer.py        # Grammar-based string serialization/deserialization
│   ├── template.py          # Expression templates (bind, ExprTemplate, RuleSerializer)
│   ├── transforms.py        # Transform registry and built-in transforms
│   └── visitors/            # Visitor implementations package
│       ├── __init__.py      # Core visitor exports
│       ├── _visitors.py     # Built-in visitor implementations
│       ├── mongo.py         # MongoDB query filter converter
│       ├── polars.py        # Polars expression converter
│       └── pandas.py        # Pandas filter callable converter
├── sorting/                 # Sorting module
│   ├── __init__.py          # Sorting module exports
│   ├── _sorting.py          # Core sorting classes (SortOrder, SortCriterion, SortSpec)
│   ├── optimizer.py         # Sort specification optimization
│   ├── serializer.py        # String serialization/deserialization for sort specs
│   └── visitors/            # Visitor implementations package
│       ├── __init__.py      # Core visitor exports
│       ├── _visitors.py     # Built-in visitor implementations
│       ├── mongo.py         # MongoDB sort document converter
│       ├── polars.py        # Polars PolarsSortSpec converter
│       └── pandas.py        # Pandas PandasSortSpec converter
└── grouping/                # Grouping and aggregation module
    ├── __init__.py          # Grouping module exports
    ├── _grouping.py         # Core grouping classes (AggregationFunction, Aggregation, GroupSpec)
    ├── optimizer.py         # Grouping specification optimization
    ├── serializer.py        # String serialization/deserialization for grouping specs
    └── visitors/            # Visitor implementations package
        ├── __init__.py      # Core visitor exports
        ├── _visitors.py     # Built-in visitor implementations
        ├── mongo.py         # MongoDB $group pipeline stage converter
        ├── polars.py        # Polars PolarsGroupSpec converter
        └── pandas.py        # Pandas PandasGroupSpec converter
```

### Core Modules

- **`therismos.expr`**: Core expression AST implementation
  - Expression types: `Eq`, `Ne`, `Lt`, `Le`, `Gt`, `Ge`, `Regex`, `In`, `IsNull`
  - Compound expressions: `AllExpr`, `AnyExpr`, `NotExpr`
  - Logical constants: `TRUE`, `FALSE`
  - Field types: `Field`, `F` (helper function)
  - Visitor protocol: `ExprVisitor`
  - Serialization: `Serializer` (grammar-based string conversion)

- **`therismos.expr.optimizer`**: Expression optimization
  - `optimize(expr, records=None)`: Optimize an expression tree
  - `OptimizationRecord`: Records of optimization transformations

- **`therismos.expr.serializer`**: Grammar-based serialization
  - `Serializer`: Converts expressions to/from compact string representations
  - URL encoding support for query parameters
  - Type annotation control
  - Custom type registration

- **`therismos.expr.template`**: Expression templating
  - `TemplateParam`: Named placeholder for use in value positions of expression nodes
  - `bind(expr, params)`: Substitutes template parameters with concrete values
  - `collect_params(expr)`: Returns all unbound `TemplateParam` nodes in an expression
  - `ExprTemplate`: Persistable wrapper combining an expression with parameter specs and computation rules
  - `RuleSerializer`: Serializes/deserializes transform pipeline rules to/from DSL strings
  - `TemplateParamSpec`, `ParamRule`, `TransformStep`: Supporting data structures

- **`therismos.expr.transforms`**: Transform pipeline for computed parameters
  - `TransformRegistry`: Registry of named transform functions
  - `DEFAULT_TRANSFORM_REGISTRY`: Pre-populated registry with 25+ built-in transforms (date/time arithmetic, type coercion, string operations, math)

- **`therismos.expr.visitors`**: Built-in visitor implementations
  - `StringVisitor`: Converts expressions to human-readable strings
  - `CountVisitor`: Counts nodes in expression trees
  - `DictVisitor`: Converts expressions to dictionary representation
  - `FieldGathererVisitor`: Collects all field names used in an expression

- **`therismos.expr.visitors.mongo`**: MongoDB backend converter
  - `MongoVisitor`: Converts expressions to MongoDB query filters for PyMongo/Motor

- **`therismos.sorting`**: Core sorting specification implementation
  - Sort orders: `SortOrder` (NONE, ASCENDING, DESCENDING)
  - Sort criterion: `SortCriterion` (field + order pair)
  - Sort specification: `SortSpec` (list-like collection of criteria)
  - Visitor protocols: `SortCriterionVisitor`, `SortSpecVisitor`

- **`therismos.sorting.optimizer`**: Sort specification optimization
  - `optimize(spec, records=None)`: Optimize a sort specification
  - Removes NONE orders and redundant criteria
  - `OptimizationRecord`: Records of optimization transformations

- **`therismos.sorting.serializer`**: String serialization
  - `Serializer`: Converts sort specs to/from compact string format
  - Format: comma-separated with +/- prefixes ("age,-created_at,+priority")
  - Support for field type annotations
  - Custom type registration
  - Implicit field type mappings

- **`therismos.sorting.visitors`**: Built-in visitor implementations
  - `StringVisitor`: Converts sort specs to human-readable strings ("age DESC, name ASC")
  - `DictVisitor`: Converts sort specs to dictionary representation
  - `FieldGathererVisitor`: Collects all field names used in a sort spec

- **`therismos.sorting.visitors.mongo`**: MongoDB backend converter
  - `MongoVisitor`: Converts sort specs to MongoDB sort documents for PyMongo/Motor

- **`therismos.grouping`**: Core grouping and aggregation specification implementation
  - Aggregation functions: `AggregationFunction` (COUNT, SUM, MIN, MAX, AVERAGE, STDDEV, MEDIAN, Q1, Q3, P01-P99)
  - Aggregation: `Aggregation` (id + function + optional field)
  - Grouping specification: `GroupSpec` (grouping fields + aggregations dict)
  - Visitor protocol: `GroupSpecVisitor`
  - Serialization: `Serializer` (tuple-based string conversion)

- **`therismos.grouping.optimizer`**: Grouping specification optimization
  - `optimize(spec, records=None)`: Optimize a grouping specification
  - Removes duplicate grouping fields and aggregation IDs
  - `OptimizationRecord`: Records of optimization transformations

- **`therismos.grouping.serializer`**: String serialization
  - `Serializer`: Converts grouping specs to/from compact tuple format
  - Format: ("field1,field2", "agg1:count,agg2:function:field")
  - Validates aggregation function requirements

- **`therismos.grouping.visitors`**: Built-in visitor implementations
  - `StringVisitor`: Converts grouping specs to tuple-based string format
  - `DictVisitor`: Converts grouping specs to dictionary representation
  - `FieldGathererVisitor`: Collects all field names used in a grouping spec (both grouping and aggregation fields)

- **`therismos.grouping.visitors.mongo`**: MongoDB backend converter
  - `MongoVisitor`: Converts grouping specs to MongoDB $group aggregation pipeline stages for PyMongo/Motor
  - Supports all aggregation functions including percentiles (MongoDB 7.0+)
  - Configurable single-field simplification

## Development

Requires Python 3.11 or higher.

### Setup

```bash
# Install dependencies
uv pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check therismos tests

# Run type checking
mypy therismos

# Run all checks with tox
tox
```

### Testing

The test suite uses pytest and relies heavily on parametrized tests:

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=therismos --cov-report=html

# Run specific test file
pytest tests/test_optimizer.py
```

## License

MIT
