Metadata-Version: 2.4
Name: SPARQLMojo
Version: 0.8.0
Summary: An SQLAlchemy-like ORM for SPARQL endpoints.
License-Expression: MIT
License-File: LICENSE
Keywords: sparql,rdf,orm,pydantic,linked-data,semantic-web
Author: Oliver Sampson
Requires-Python: >=3.12
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: SPARQLWrapper (>=2.0.0)
Requires-Dist: pydantic (>=2.12.4,<3.0.0)
Requires-Dist: rdflib (>=6.0.0)
Project-URL: Documentation, https://codeberg.org/Gitterdan/SPARQLMojo
Project-URL: Homepage, https://codeberg.org/Gitterdan/SPARQLMojo
Project-URL: Repository, https://codeberg.org/Gitterdan/SPARQLMojo
Description-Content-Type: text/markdown

# SPARQLMojo

A minimal SQLAlchemy-like ORM for SPARQL endpoints with Pydantic validation. This is a prototype focused on design clarity rather than production features.

## Features

- Declarative RDF models using Python classes with **Pydantic validation**
- Type-safe field definitions with automatic validation
- A session layer for querying and updating SPARQL endpoints
- A query compiler that converts Pythonic queries to SPARQL
- **Session identity map** to prevent duplicate instances and ensure consistency
- **PREFIX management system** for namespace handling with short-form IRIs
- **Language-tagged literal support** for multilingual text data
- **Property path support** with ORM-like convenience methods and inverse path support for reverse relationship traversal
- **Field-level filtering** with intuitive syntax and automatic datatype casting for numeric comparisons
- **String filtering on IRI fields** with chainable `str()`, `lower()`, `upper()` methods for case-insensitive matching
- **Ontology-aware models** with SchemaRegistry for automatic inverse relationship discovery via `owl:inverseOf`
- **InverseField** for clean, semantic reverse relationship navigation with automatic fallback to SPARQL `^` operator

## Installation

```bash
# Install dependencies
poetry install

# Or install the package in editable mode
pip install -e .
```

## Version

Check the installed version:

```python
import sparqlmojo
print(sparqlmojo.__version__)  # e.g. 0.8.0
```

Or from the command line:

```bash
python -c "import sparqlmojo; print(sparqlmojo.__version__)"
```

### Versioning Workflow

This project uses semantic versioning with git tags:

```bash
# Create an annotated tag
git tag -a 0.1.0 -m "Release version 0.1.0"
git push origin 0.1.0

# Update pyproject.toml to match
poetry version 0.1.0
```

## Usage

```python
from typing import Annotated

from sparqlmojo import (
    Condition,
    InverseField,
    IRIField,
    LiteralField,
    Model,
    ObjectPropertyField,
    RDF_TYPE,
    SchemaRegistry,
    Session,
    SPARQLCompiler,
    SubjectField,
)


class Person(Model):
    rdf_type: Annotated[str, IRIField(RDF_TYPE, default="schema:Person")]
    iri: Annotated[str, SubjectField()]
    name: Annotated[str | None, LiteralField("schema:name")] = None
    age: Annotated[int | None, LiteralField("schema:age")] = None
    knows: Annotated[str | None, ObjectPropertyField("schema:knows", range_="Person")] = None


# Create a session
s = Session(endpoint="http://example.org/sparql")

# For endpoints with separate read/write URLs (e.g., Fuseki):
# s = Session(
#     endpoint="http://example.org/sparql",           # For SELECT queries
#     write_endpoint="http://example.org/update"      # For INSERT/DELETE/UPDATE
# )

# Build and compile a query
q = s.query(Person).filter(Condition("age", ">", 30)).limit(5)
sparql = SPARQLCompiler.compile_query(q)
print(sparql)

# Create an instance with validation
bob = Person(iri="http://example.org/bob", name="Bob", age=28)
s.add(bob)
s.commit()

# Pydantic validates types automatically
from pydantic import ValidationError

try:
    Person(iri="http://example.org/alice", name="Alice", age="not a number")  # age is not an int
except ValidationError as e:
    print(f"Validation error: {e}")
```

## Identity Map

Each `Session` keeps an identity map, so an entity retrieved more than once is represented by a single Python instance:

```python
# First retrieval creates new instance
person1 = session.get(Person, "http://example.org/bob")

# Second retrieval returns the SAME instance (not a duplicate)
person2 = session.get(Person, "http://example.org/bob")

assert person1 is person2  # True - same object reference

# Changes to one reference are visible in all references
person1.name = "Robert"
print(person2.name)  # "Robert" - same object
```

### Benefits

- **Memory Efficiency**: Uses weak references for automatic garbage collection
- **Consistency**: All operations on the same entity work with the same object
- **Performance**: Avoids creating duplicate objects for the same entity
- **Automatic Management**: No manual cache management required

### Manual Cache Management

```python
# Remove specific instance from identity map
session.expunge(person)

# Clear all instances from identity map
session.expunge_all()
```
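The weak-reference behavior listed under Benefits can be illustrated with the standard library's `weakref.WeakValueDictionary`. This is a minimal sketch of the general identity-map pattern, not SPARQLMojo's implementation; the `IdentityMap` class, its methods, and the `Entity` stand-in are hypothetical.

```python
import gc
import weakref


class Entity:
    """Hypothetical stand-in for a model instance."""

    def __init__(self, iri: str) -> None:
        self.iri = iri


class IdentityMap:
    """Hypothetical identity map keyed by (model class, subject IRI)."""

    def __init__(self) -> None:
        # Values are held weakly: once no other reference exists,
        # the entry disappears and the object can be garbage-collected.
        self._instances: weakref.WeakValueDictionary = weakref.WeakValueDictionary()

    def get_or_create(self, cls: type, iri: str):
        key = (cls, iri)
        instance = self._instances.get(key)
        if instance is None:
            instance = cls(iri)
            self._instances[key] = instance
        return instance

    def expunge(self, instance) -> None:
        self._instances.pop((type(instance), instance.iri), None)

    def expunge_all(self) -> None:
        self._instances.clear()

    def __len__(self) -> int:
        return len(self._instances)


identity_map = IdentityMap()
a = identity_map.get_or_create(Entity, "http://example.org/bob")
b = identity_map.get_or_create(Entity, "http://example.org/bob")
assert a is b  # same object for the same (class, IRI)

del a, b
gc.collect()  # CPython frees the object via refcounting; collect() makes it explicit
assert len(identity_map) == 0  # the weak entry vanished with the last strong reference
```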

## PREFIX Management System

SPARQLMojo includes a PREFIX management system for namespace handling:

### Features

- **Built-in Common Prefixes**: schema, foaf, rdf, rdfs, owl, xsd, dc, dcterms, skos, ex
- **Custom Prefix Registration**: Add your own namespace prefixes
- **Short-form IRI Support**: Use `schema:Person` instead of full IRIs
- **Automatic PREFIX Declarations**: SPARQL queries include proper PREFIX clauses
- **IRI Expansion/Contraction**: Convert between short-form and full IRIs

### Usage

```python
from typing import Annotated

from sparqlmojo import IRIField, LiteralField, Model, RDF_TYPE, Session, SubjectField

# Define model with short-form IRIs
class Person(Model):
    rdf_type: Annotated[str, IRIField(RDF_TYPE, default="schema:Person")]
    iri: Annotated[str, SubjectField()]
    name: Annotated[str | None, LiteralField("schema:name")] = None
    age: Annotated[int | None, LiteralField("schema:age")] = None

# Create session with built-in prefix registry
session = Session()

# Register custom prefix
session.register_prefix("my", "http://example.org/my/")

# Query generation with automatic PREFIX declarations
query = session.query(Person)
sparql = query.compile()
# Generates: PREFIX schema: <http://schema.org/> ...

# IRI expansion/contraction
expanded = session.expand_iri("schema:Person")  # "http://schema.org/Person"
contracted = session.contract_iri("http://schema.org/Person")  # "schema:Person"
```
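Expansion and contraction amount to string substitution against a prefix table. The sketch below is a hypothetical standalone illustration (the `PrefixRegistry` class and its methods are assumptions, not SPARQLMojo's API); it seeds only a few of the built-in prefixes with their standard namespace IRIs and registers the custom `my:` prefix from the example above.

```python
class PrefixRegistry:
    """Hypothetical prefix table supporting expansion, contraction, and PREFIX lines."""

    def __init__(self) -> None:
        self.prefixes: dict[str, str] = {
            "schema": "http://schema.org/",
            "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
            "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
            "xsd": "http://www.w3.org/2001/XMLSchema#",
        }

    def register(self, prefix: str, namespace: str) -> None:
        self.prefixes[prefix] = namespace

    def expand(self, iri: str) -> str:
        prefix, sep, local = iri.partition(":")
        if sep and prefix in self.prefixes:
            return self.prefixes[prefix] + local
        return iri  # already a full IRI (or an unknown prefix): leave unchanged

    def contract(self, iri: str) -> str:
        # Prefer the longest matching namespace so nested namespaces resolve correctly
        matches = [(p, ns) for p, ns in self.prefixes.items() if iri.startswith(ns)]
        if not matches:
            return iri
        prefix, namespace = max(matches, key=lambda item: len(item[1]))
        return f"{prefix}:{iri[len(namespace):]}"

    def prefix_declarations(self, used: set[str]) -> str:
        return "\n".join(
            f"PREFIX {p}: <{self.prefixes[p]}>" for p in sorted(used) if p in self.prefixes
        )


registry = PrefixRegistry()
registry.register("my", "http://example.org/my/")
print(registry.expand("schema:Person"))               # http://schema.org/Person
print(registry.contract("http://schema.org/Person"))  # schema:Person
print(registry.prefix_declarations({"schema"}))       # PREFIX schema: <http://schema.org/>
```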

### Benefits

- **Improved Developer Experience**: No need to write full IRIs everywhere
- **Better Readability**: Code is more concise and understandable
- **Easy Maintenance**: Update namespace URIs in one place
- **Standards Compliance**: Generates proper SPARQL PREFIX declarations

## Language-Tagged Literals

SPARQLMojo supports language-tagged literals for multilingual text data, with BCP 47 language tag validation:

### LangString Field

Store single-language text with language tags:

```python
from typing import Annotated
from sparqlmojo import IRIField, LangString, Model, RDF_TYPE, SubjectField

class Article(Model):
    rdf_type: Annotated[str, IRIField(RDF_TYPE, default="http://schema.org/Article")]
    iri: Annotated[str, SubjectField()]
    title_en: Annotated[str | None, LangString("http://schema.org/name", lang="en")] = None
    title_fr: Annotated[str | None, LangString("http://schema.org/name", lang="fr")] = None

article = Article(
    iri="http://example.org/article1",
    title_en="Hello World",
    title_fr="Bonjour le monde"
)

# Generates SPARQL with language tags:
# <article1> schema:name "Hello World"@en .
# <article1> schema:name "Bonjour le monde"@fr .
```

### MultiLangString Field

Store multiple language versions in a single field:

```python
from typing import Annotated
from sparqlmojo import IRIField, Model, MultiLangString, RDF_TYPE, SubjectField

class Document(Model):
    rdf_type: Annotated[str, IRIField(RDF_TYPE, default="http://schema.org/Document")]
    iri: Annotated[str, SubjectField()]
    title: Annotated[dict[str, str | None] | None, MultiLangString("http://schema.org/name")] = None

doc = Document(
    iri="http://example.org/doc1",
    title={
        "en": "Hello",
        "fr": "Bonjour",
        "de": "Hallo",
        "es": "Hola"
    }
)

# Generates multiple SPARQL triples:
# <doc1> schema:name "Hello"@en .
# <doc1> schema:name "Bonjour"@fr .
# <doc1> schema:name "Hallo"@de .
# <doc1> schema:name "Hola"@es .
```

### Complex Language Tags

Support for BCP 47 language tags with region and script codes:

```python
from typing import Annotated
from sparqlmojo import IRIField, LangString, Model, MultiLangString, RDF_TYPE, SubjectField

class InternationalContent(Model):
    rdf_type: Annotated[str, IRIField(RDF_TYPE, default="http://schema.org/Article")]
    iri: Annotated[str, SubjectField()]
    # Region-specific variants
    title_us: Annotated[str | None, LangString("http://schema.org/name", lang="en-US")] = None
    title_gb: Annotated[str | None, LangString("http://schema.org/name", lang="en-GB")] = None

    # Script-specific variants in a single field
    chinese_title: Annotated[dict[str, str | None] | None, MultiLangString("http://schema.org/name")] = None

content = InternationalContent(
    iri="http://example.org/content1",
    title_us="Color",
    title_gb="Colour",
    chinese_title={
        "zh-Hans": "简体中文",  # Simplified Chinese
        "zh-Hant": "繁體中文",  # Traditional Chinese
    }
)
```

### Language Tag Validation

All language tags are validated against BCP 47 format:

```python
# Valid tags
LangString("...", lang="en")        # Simple language
LangString("...", lang="en-US")     # Language + region
LangString("...", lang="zh-Hans")   # Language + script
LangString("...", lang="zh-Hans-CN") # Language + script + region

# Invalid tags (will raise ValueError)
LangString("...", lang="EN")        # Must be lowercase
LangString("...", lang="en us")     # No spaces allowed
LangString("...", lang="english")   # Must be 2-3 letter code
```
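A validator consistent with the valid and invalid examples above can be written as one regular expression. This is a hypothetical, simplified sketch, not SPARQLMojo's actual rule set: full BCP 47 also allows extended language subtags, variants, extensions, and private-use subtags. BCP 47 itself treats tags as case-insensitive (RFC 5646, section 2.1.1); the casing enforced here is the conventional form that the examples follow.

```python
import re

# Simplified pattern: lowercase 2-3 letter language, optional 4-letter
# title-case script, optional region (2 uppercase letters or 3 digits).
LANG_TAG = re.compile(r"[a-z]{2,3}(-[A-Z][a-z]{3})?(-(?:[A-Z]{2}|[0-9]{3}))?")


def is_valid_lang_tag(tag: str) -> bool:
    return LANG_TAG.fullmatch(tag) is not None


for tag in ["en", "en-US", "zh-Hans", "zh-Hans-CN", "EN", "en us", "english"]:
    print(f"{tag!r}: {is_valid_lang_tag(tag)}")
```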

### Benefits

- **RDF Standards Compliance**: Proper `@lang` tag syntax with BCP 47 validation
- **Multilingual Support**: Store and retrieve text in multiple languages
- **Flexible Data Modeling**: Choose between separate fields or single multi-language field
- **Automatic SPARQL Generation**: Language tags are automatically added to generated queries
- **Type Safety**: Full Pydantic validation for field values and language codes

## Collection Fields

SPARQLMojo supports collection fields for aggregating multiple values from multi-valued RDF properties into Python lists.

### LiteralList - Aggregate Multiple Literal Values

```python
from typing import Annotated
from sparqlmojo import IRIField, LiteralList, Model, RDF_TYPE, SubjectField

class Product(Model):
    rdf_type: Annotated[str, IRIField(RDF_TYPE, default="http://schema.org/Product")]
    iri: Annotated[str, SubjectField()]
    tags: Annotated[list[str] | None, LiteralList("http://schema.org/keywords")] = None

# Query returns all keyword values as a Python list
product = session.query(Product).first()
print(product.tags)  # ['electronics', 'gadgets', 'portable']
```

### LangStringList - Aggregate Language-Tagged Literals

For multi-valued properties with language tags (like `rdfs:label` with multiple translations):

```python
from typing import Annotated
from sparqlmojo import IRIField, LangStringList, Model, RDF_TYPE, SubjectField
from sparqlmojo.orm.model import LangLiteral

class City(Model):
    rdf_type: Annotated[str, IRIField(RDF_TYPE, default="http://schema.org/City")]
    iri: Annotated[str, SubjectField()]
    labels: Annotated[list[LangLiteral] | None, LangStringList(
        "http://www.w3.org/2000/01/rdf-schema#label"
    )] = None

# Query returns all labels with their language tags
city = session.query(City).first()
for label in city.labels:
    print(f"{label.value} ({label.lang})")
# Output:
# Berlin (en)
# Berlin (de)
# Berlín (es)
```

### IRIList - Aggregate Multiple IRI References

For multi-valued object properties:

```python
from typing import Annotated
from sparqlmojo import IRIField, IRIList, Model, RDF_TYPE, SubjectField

class Person(Model):
    rdf_type: Annotated[str, IRIField(RDF_TYPE, default="http://schema.org/Person")]
    iri: Annotated[str, SubjectField()]
    friends: Annotated[list[str] | None, IRIList("http://schema.org/knows")] = None

# Query returns all friend IRIs as a list
person = session.query(Person).first()
print(person.friends)
# ['http://example.org/alice', 'http://example.org/bob', 'http://example.org/charlie']
```

### TypedLiteralList - Aggregate Typed Literals with XSD Datatype Preservation

For multi-valued properties where you need to preserve the XSD datatype information (e.g., integers, decimals, dates):

```python
from typing import Annotated
from sparqlmojo import IRIField, Model, RDF_TYPE, SubjectField, TypedLiteral, TypedLiteralList

class Document(Model):
    rdf_type: Annotated[str, IRIField(RDF_TYPE, default="http://example.org/Document")]
    iri: Annotated[str, SubjectField()]
    page_counts: Annotated[
        list[TypedLiteral] | None,
        TypedLiteralList("http://example.org/pageCount")
    ] = None

# Query returns TypedLiteral objects with preserved datatypes
doc = session.query(Document).first()
for pc in doc.page_counts:
    print(f"{pc.value} (type: {type(pc.value).__name__}, datatype: {pc.datatype})")
# Output:
# 42 (type: int, datatype: http://www.w3.org/2001/XMLSchema#integer)
# 3.14 (type: Decimal, datatype: http://www.w3.org/2001/XMLSchema#decimal)
```

**Type Conversion Mapping:**

| XSD Datatype | Python Type |
|--------------|-------------|
| `xsd:integer` | `int` |
| `xsd:decimal` | `decimal.Decimal` |
| `xsd:float` | `float` |
| `xsd:double` | `float` |
| `xsd:boolean` | `bool` |
| `xsd:date` | `datetime.date` |
| `xsd:dateTime` | `datetime.datetime` |
| Unknown types | `str` |

Unlike `LiteralList`, which loses datatype information during aggregation, `TypedLiteralList` preserves the XSD datatype IRI alongside each value, enabling proper Python type conversion.
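The mapping table above can be applied with a dispatch dictionary keyed by datatype IRI. The standalone sketch below is hypothetical (`to_python` and `CONVERTERS` are illustrative names, not SPARQLMojo's API) and ignores edge cases such as time zones on `xsd:date`; rdflib's `Literal.toPython()` provides a more complete conversion.

```python
from datetime import date, datetime
from decimal import Decimal
from typing import Any, Callable

XSD = "http://www.w3.org/2001/XMLSchema#"


def _parse_bool(lexical: str) -> bool:
    # xsd:boolean allows the lexical forms "true", "false", "1", "0"
    if lexical in ("true", "1"):
        return True
    if lexical in ("false", "0"):
        return False
    raise ValueError(f"invalid xsd:boolean lexical form: {lexical!r}")


CONVERTERS: dict[str, Callable[[str], Any]] = {
    XSD + "integer": int,
    XSD + "decimal": Decimal,
    XSD + "float": float,
    XSD + "double": float,
    XSD + "boolean": _parse_bool,
    XSD + "date": date.fromisoformat,
    XSD + "dateTime": datetime.fromisoformat,
}


def to_python(lexical: str, datatype: str) -> Any:
    """Convert a literal's lexical form using the table; unknown datatypes stay str."""
    converter = CONVERTERS.get(datatype)
    return converter(lexical) if converter else lexical


print(repr(to_python("42", XSD + "integer")))    # 42
print(repr(to_python("3.14", XSD + "decimal")))  # Decimal('3.14')
```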

### Custom Separators

Collection fields use GROUP_CONCAT internally. You can customize the separator:

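```python
from typing import Annotated
from sparqlmojo import LiteralList, Model, SubjectField

class Product(Model):
    iri: Annotated[str, SubjectField()]
    # The default separator is the ASCII Unit Separator (\x1f)
    tags: Annotated[list[str] | None, LiteralList(
        "http://schema.org/keywords",
        separator="|",  # Use a pipe as the separator instead
    )] = None
```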

### Multiple Collection Fields

Models can have multiple collection fields. SPARQLMojo uses scalar subqueries internally to avoid cartesian product explosion when querying models with multiple collection fields:

```python
from typing import Annotated
from sparqlmojo import IRIList, LangStringList, Model, SubjectField
from sparqlmojo.orm.model import LangLiteral

class WikidataEntity(Model):
    # No rdf_type field - queries any entity without type constraint
    iri: Annotated[str, SubjectField()]
    labels: Annotated[list[LangLiteral] | None, LangStringList("rdfs:label")] = None
    descriptions: Annotated[list[LangLiteral] | None, LangStringList("schema:description")] = None
    aliases: Annotated[list[LangLiteral] | None, LangStringList("skos:altLabel")] = None
    types: Annotated[list[str] | None, IRIList("wdt:P31")] = None

# "wdt" is not among the built-in prefixes, so register it before querying
session.register_prefix("wdt", "http://www.wikidata.org/prop/direct/")

# One query retrieves all four collection fields without a cartesian-product blowup
entity = session.query(WikidataEntity).filter_by(s="http://www.wikidata.org/entity/Q42").first()
```
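The row explosion that per-field subqueries avoid is easy to quantify: joining several multi-valued properties in one flat pattern yields the product of their value counts, while aggregating each property independently touches only their sum. The counts below are made up for illustration; they are not measurements of Q42.

```python
from itertools import product
from math import prod

# Hypothetical number of values per multi-valued property for one entity
value_counts = {"labels": 5, "descriptions": 4, "aliases": 10, "types": 2}

# One flat WHERE pattern joining all four properties: every combination becomes a row
joined_rows = list(product(*(range(n) for n in value_counts.values())))
print(len(joined_rows), prod(value_counts.values()))  # 400 400

# Aggregating each property on its own: one value per row, per property
print(sum(value_counts.values()))  # 21
```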

### Filtering Collection Fields

Collection fields support polymorphic `contains()` for membership filtering, following SQLAlchemy conventions:

```python
from typing import Annotated
from sparqlmojo import IRIField, IRIList, LiteralList, Model, RDF_TYPE, Session, SubjectField

class Book(Model):
    rdf_type: Annotated[str, IRIField(RDF_TYPE, default="http://schema.org/Book")]
    iri: Annotated[str, SubjectField()]
    genres: Annotated[list[str] | None, LiteralList("http://schema.org/genre")] = None
    related_works: Annotated[list[str] | None, IRIList("http://schema.org/relatedLink")] = None

session = Session()

# Filter books that have "Science Fiction" as a genre
query = session.query(Book).filter(Book.genres.contains("Science Fiction"))
# Generates triple pattern: ?s <http://schema.org/genre> "Science Fiction" .

# Filter books related to a specific work
query = session.query(Book).filter(
    Book.related_works.contains("http://example.org/books/dune")
)
# Generates: ?s <http://schema.org/relatedLink> <http://example.org/books/dune> .
```

**Polymorphic Behavior**: The `contains()` method behaves differently based on field type:
- **Regular fields** (LiteralField, LangString): Substring matching with `FILTER(CONTAINS(...))`
- **Collection fields** (LiteralList, IRIList, etc.): Membership check via triple pattern

This follows SQLAlchemy's convention where `contains()` does the right thing based on context.
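The dispatch can be pictured as a function that picks the clause shape from the field kind. This is a hypothetical standalone sketch: the function, its parameters, and the `?name`-style variable naming are assumptions; only the two output shapes come from the text above.

```python
def escape_literal(text: str) -> str:
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n").replace("\r", "\\r")


def contains_clause(
    var: str, predicate: str, value: str, *, is_collection: bool, is_iri: bool = False
) -> str:
    """Hypothetical: membership triple for collections, substring FILTER otherwise."""
    if is_collection:
        obj = f"<{value}>" if is_iri else f'"{escape_literal(value)}"'
        return f"?s <{predicate}> {obj} ."
    return f'FILTER(CONTAINS(?{var}, "{escape_literal(value)}"))'


print(contains_clause("genres", "http://schema.org/genre", "Science Fiction", is_collection=True))
# ?s <http://schema.org/genre> "Science Fiction" .
print(contains_clause("name", "http://schema.org/name", "Dune", is_collection=False))
# FILTER(CONTAINS(?name, "Dune"))
```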

### Benefits

- **Natural Python API**: Work with Python lists instead of raw SPARQL results
- **Efficient Queries**: Uses SPARQL 1.1 scalar subqueries so that each collection field is aggregated independently
- **Language Tag Preservation**: LangStringList maintains value-language associations
- **Multiple Collection Support**: Query models with many collection fields without cartesian products
- **Intuitive Filtering**: Polymorphic `contains()` works naturally for both substring and membership checks

## UPDATE Operations

SPARQLMojo supports UPDATE operations with dirty tracking, so only modified fields are written back:

```python
# Get an existing person from the database
person = s.get(Person, "http://example.org/bob")

# Modify fields - changes are automatically tracked
person.age = 29
person.name = "Robert"

# Stage the update (only modified fields will be updated)
s.update(person)

# Commit the changes
s.commit()  # Executes SPARQL DELETE/INSERT for changed fields
```

### Dirty Tracking

```python
person = Person(iri="http://example.org/bob", name="Bob", age=30)

# Mark as clean (baseline state)
person.mark_clean()

# Check if modified
print(person.is_dirty())  # False

# Modify a field
person.age = 31
print(person.is_dirty())  # True

# Get changes
changes = person.get_changes()
# {'age': (30, 31)}

# Reset tracking
person.mark_clean()
```

### Partial Updates

Only fields that have been modified since `mark_clean()` was called will be updated:

```python
person = s.get(Person, "http://example.org/bob")  # Automatically marked clean

# Only age is modified
person.age = 31

s.update(person)  # Only generates UPDATE for age field
s.commit()
```

### SPARQL Generated

The update generates SPARQL DELETE/INSERT statements:

```sparql
DELETE DATA {
  <http://example.org/bob> <http://schema.org/age> "30" .
} ;
INSERT DATA {
  <http://example.org/bob> <http://schema.org/age> "31" .
}
```
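Given the `{field: (old, new)}` changes and each field's predicate, a request of the shape above can be assembled mechanically. This is a hypothetical standalone sketch: `build_update` and the `predicates` mapping are illustrative, values are rendered as plain quoted literals to match the example, and `None` on either side means there is nothing to delete or insert.

```python
def escape_literal(text: str) -> str:
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n").replace("\r", "\\r")


def build_update(subject: str, changes: dict, predicates: dict[str, str]) -> str:
    deletes, inserts = [], []
    for field_name, (old, new) in changes.items():
        predicate = predicates[field_name]
        if old is not None:
            deletes.append(f'  <{subject}> <{predicate}> "{escape_literal(str(old))}" .')
        if new is not None:
            inserts.append(f'  <{subject}> <{predicate}> "{escape_literal(str(new))}" .')
    parts = []
    if deletes:
        parts.append("DELETE DATA {\n" + "\n".join(deletes) + "\n}")
    if inserts:
        parts.append("INSERT DATA {\n" + "\n".join(inserts) + "\n}")
    # Operations within one SPARQL Update request are separated by ";"
    return " ;\n".join(parts)


print(build_update(
    "http://example.org/bob",
    {"age": (30, 31)},
    {"age": "http://schema.org/age"},
))
```

`DELETE DATA` matches RDF terms exactly, so deleting `"30"` does not remove a stored `"30"^^xsd:integer`; the literal has to be written with the same datatype it was stored with.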

## Batch Operations

SPARQLMojo provides batch operations for adding, updating, and deleting many instances at once:

### Batch Inserts

```python
# Create multiple instances
people = [
    Person(iri=f"http://example.org/person{i}", name=f"Person{i}", age=20 + i)
    for i in range(100)
]

# Add all instances in a single batch operation
s.add_all(people)
s.commit()  # Generates efficient INSERT DATA with all triples
```

### Batch Updates

```python
# Get multiple instances
people = [s.get(Person, f"http://example.org/person{i}") for i in range(10)]

# Modify instances (dirty tracking works with batches)
for person in people:
    person.age += 1

# Update all modified instances in batch
s.update_all(people)
s.commit()  # Only generates updates for actually modified fields
```

### Batch Deletes

```python
# Create instances to delete
people_to_delete = [
    Person(iri=f"http://example.org/person{i}")
    for i in range(50, 100)
]

# Delete all instances in batch
s.delete_all(people_to_delete)
s.commit()  # Generates efficient DELETE WHERE queries
```

### Chunking for Large Batches

For very large datasets, SPARQLMojo automatically chunks operations:

```python
# Configure chunk size (default: 1000 triples)
session = Session(max_batch_size=500)

# Large batch will be automatically chunked
large_batch = [Person(iri=f"http://example.org/person{i}", name=f"Person{i}") for i in range(10000)]
session.add_all(large_batch)
session.commit()  # Automatically splits into multiple INSERT DATA queries
```
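Chunking by triple count amounts to slicing the triple list and emitting one `INSERT DATA` request per slice. The sketch below is hypothetical (`chunked_inserts` is an illustrative name, not SPARQLMojo's API) and interprets `max_batch_size` as triples per request, as the comment above describes.

```python
from collections.abc import Iterator


def chunked_inserts(triples: list[str], max_batch_size: int = 1000) -> Iterator[str]:
    """Yield one INSERT DATA request per slice of at most max_batch_size triples."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be positive")
    for start in range(0, len(triples), max_batch_size):
        chunk = triples[start:start + max_batch_size]
        yield "INSERT DATA {\n" + "\n".join(chunk) + "\n}"


triples = [
    f'<http://example.org/person{i}> <http://schema.org/name> "Person{i}" .'
    for i in range(2500)
]
requests = list(chunked_inserts(triples, max_batch_size=1000))
print(len(requests))  # 3 requests: 1000 + 1000 + 500 triples
```

Splitting purely by triple count can place one instance's triples in different requests; grouping triples by subject before slicing keeps each instance within a single request, if that matters for a given endpoint.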

### Performance Benefits

- **Reduced overhead**: Single method call instead of many individual calls
- **Optimized SPARQL**: Efficient INSERT DATA queries with many triples
- **Automatic chunking**: Prevents query size limits on endpoints
- **Memory efficient**: Processes large datasets in manageable chunks

## Running Tests

```bash
# Run all tests
poetry run pytest

# Run specific test file
poetry run pytest tests/test_basic.py
```

**See Also**: [Test Fixtures Documentation](tests/README.md) for comprehensive documentation of shared fixtures, test models, and test organization.

## Test Dataset

The project includes a comprehensive library management test dataset in `tests/fixtures/library.ttl` with:
- **10 Books** (classics like "The Great Gatsby", "1984", "Pride and Prejudice")
- **10 Users** (library patrons with member IDs and contact information)
- **5 Checkout Records** (linking books to users with checkout/due dates)
- **Multiple Status Types** (checked in, checked out, overdue)

### Model Definitions

The test fixtures define three interconnected models:

```python
from typing import Annotated
from sparqlmojo import IRIField, LiteralField, Model, ObjectPropertyField, RDF_TYPE, SubjectField

class Book(Model):
    """Book model for library system."""
    rdf_type: Annotated[str, IRIField(RDF_TYPE, default="http://schema.org/Book")]
    iri: Annotated[str, SubjectField()]
    name: Annotated[str | None, LiteralField("http://schema.org/name")] = None
    author: Annotated[str | None, LiteralField("http://schema.org/author")] = None
    isbn: Annotated[str | None, LiteralField("http://schema.org/isbn")] = None
    date_published: Annotated[str | None, LiteralField("http://schema.org/datePublished")] = None
    status: Annotated[str | None, ObjectPropertyField("http://example.org/library/vocab/status")] = None

class Person(Model):
    """Person/User model for library system."""
    rdf_type: Annotated[str, IRIField(RDF_TYPE, default="http://schema.org/Person")]
    iri: Annotated[str, SubjectField()]
    name: Annotated[str | None, LiteralField("http://schema.org/name")] = None
    email: Annotated[str | None, LiteralField("http://schema.org/email")] = None
    member_id: Annotated[str | None, LiteralField("http://example.org/library/vocab/memberId")] = None
    member_since: Annotated[str | None, LiteralField("http://example.org/library/vocab/memberSince")] = None

class CheckoutRecord(Model):
    """Checkout record linking books to patrons."""
    rdf_type: Annotated[str, IRIField(RDF_TYPE, default="http://example.org/library/vocab/CheckoutRecord")]
    iri: Annotated[str, SubjectField()]
    patron: Annotated[str | None, ObjectPropertyField("http://example.org/library/vocab/patron")] = None
    book: Annotated[str | None, ObjectPropertyField("http://example.org/library/vocab/book")] = None
    checkout_date: Annotated[str | None, LiteralField("http://example.org/library/vocab/checkoutDate")] = None
    due_date: Annotated[str | None, LiteralField("http://example.org/library/vocab/dueDate")] = None
    status: Annotated[str | None, LiteralField("http://example.org/library/vocab/status")] = None
```

### Python to RDF Triple Translation

Here's how SPARQLMojo translates Python model instances to RDF triples:

#### Python Code

```python
from sparqlmojo import Session

# Create model instances
book = Book(
    iri="http://example.org/library/book1",
    name="The Great Gatsby",
    author="F. Scott Fitzgerald",
    isbn="978-0743273565",
    date_published="1925"
)

person = Person(
    iri="http://example.org/library/user1",
    name="Alice Johnson",
    email="alice.johnson@example.com",
    member_id="LIB001",
    member_since="2020-01-15"
)

checkout = CheckoutRecord(
    iri="http://example.org/library/checkout1",
    patron="http://example.org/library/user1",
    book="http://example.org/library/book1",
    checkout_date="2025-10-20",
    due_date="2025-11-20",
    status="active"
)

# Add to session and commit
session = Session(endpoint="http://example.org/sparql")
session.add(book)
session.add(person)
session.add(checkout)
session.commit()
```

#### Generated RDF Triples (Turtle Format)

```turtle
# Book triples
<http://example.org/library/book1> a <http://schema.org/Book> .
<http://example.org/library/book1> <http://schema.org/name> "The Great Gatsby" .
<http://example.org/library/book1> <http://schema.org/author> "F. Scott Fitzgerald" .
<http://example.org/library/book1> <http://schema.org/isbn> "978-0743273565" .
<http://example.org/library/book1> <http://schema.org/datePublished> "1925" .

# Person triples
<http://example.org/library/user1> a <http://schema.org/Person> .
<http://example.org/library/user1> <http://schema.org/name> "Alice Johnson" .
<http://example.org/library/user1> <http://schema.org/email> "alice.johnson@example.com" .
<http://example.org/library/user1> <http://example.org/library/vocab/memberId> "LIB001" .
<http://example.org/library/user1> <http://example.org/library/vocab/memberSince> "2020-01-15" .

# CheckoutRecord triples (note: ObjectProperty fields become IRI references)
<http://example.org/library/checkout1> a <http://example.org/library/vocab/CheckoutRecord> .
<http://example.org/library/checkout1> <http://example.org/library/vocab/patron> <http://example.org/library/user1> .
<http://example.org/library/checkout1> <http://example.org/library/vocab/book> <http://example.org/library/book1> .
<http://example.org/library/checkout1> <http://example.org/library/vocab/checkoutDate> "2025-10-20" .
<http://example.org/library/checkout1> <http://example.org/library/vocab/dueDate> "2025-11-20" .
<http://example.org/library/checkout1> <http://example.org/library/vocab/status> "active" .
```

**Key Translation Rules:**

1. **Type Declaration**: The `rdf_type` IRIField with `RDF_TYPE` predicate becomes the `rdf:type` triple (shown as `a` in Turtle)
2. **Subject IRI**: The `iri` SubjectField becomes the subject of all triples
3. **Literal Fields**: Python strings/numbers become quoted literals in RDF
4. **ObjectProperty Fields**: Python IRI strings become unquoted IRI references (linking entities)
5. **Field Names**: Python snake_case field names map to full predicate IRIs defined in the model

This mapping allows you to work with Pythonic objects while maintaining full RDF semantics in the underlying data store.
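The translation rules above can be condensed into a tiny serializer. This is a hypothetical standalone sketch, not SPARQLMojo's code: the `(predicate, kind)` field-spec format is an assumption, and because N-Triples has no `a` shorthand, the sketch writes the full `rdf:type` IRI.

```python
RDF_TYPE_IRI = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(text: str) -> str:
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n").replace("\r", "\\r")


def to_ntriples(
    subject: str, rdf_type: str, values: dict, spec: dict[str, tuple[str, str]]
) -> list[str]:
    lines = [f"<{subject}> <{RDF_TYPE_IRI}> <{rdf_type}> ."]  # Rule 1: type declaration
    for field_name, value in values.items():
        if value is None:
            continue  # unset optional fields produce no triple
        predicate, kind = spec[field_name]  # Rule 5: field name -> predicate IRI
        if kind == "iri":
            obj = f"<{value}>"  # Rule 4: object properties become IRI references
        else:
            obj = f'"{escape_literal(str(value))}"'  # Rule 3: quoted literal
        lines.append(f"<{subject}> <{predicate}> {obj} .")  # Rule 2: shared subject
    return lines


spec = {
    "book": ("http://example.org/library/vocab/book", "iri"),
    "status": ("http://example.org/library/vocab/status", "literal"),
}
for line in to_ntriples(
    "http://example.org/library/checkout1",
    "http://example.org/library/vocab/CheckoutRecord",
    {"book": "http://example.org/library/book1", "status": "active"},
    spec,
):
    print(line)
```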

## Limitations

This is a prototype with several intentional limitations:

- **No transaction support**: Staged inserts, updates, and deletes are sent on `commit()` without rollback
- **No conflict resolution**: Basic operations only
- **Not production-ready**: Focuses on demonstrating design patterns

For real-world use, consider adding:
- Proper literal typing
- Better parsing of results
- Streaming results and pagination
- Transaction support

## Known Issues and Risks

### Pydantic Internal API Dependency

SPARQLMojo uses Pydantic's internal `ModelMetaclass` to enable the intuitive field-level filtering syntax:

```python
# This clean syntax is powered by the custom metaclass
query.filter(Person.name == "Alice")
query.filter(Product.price > 100)
```

**The Risk**: The metaclass is imported from Pydantic's **private internal API**:

```python
from pydantic._internal._model_construction import ModelMetaclass as PydanticModelMetaclass
```

The `_internal` prefix indicates this is not part of Pydantic's public API and **could change without notice** in any Pydantic release. According to the Pydantic maintainers, they "want to be able to refactor the `ModelMetaclass` without it being considered a breaking change."

**What This Means**:
- ⚠️ **No stability guarantees**: The metaclass implementation may change in minor/patch releases
- ⚠️ **No deprecation warnings**: Changes won't be announced in advance
- ⚠️ **Potential breakage**: Any Pydantic update could require code changes

**Mitigation Strategy**:
1. **Pin Pydantic version** carefully in production environments
2. **Test thoroughly** after any Pydantic updates before upgrading
3. **Fallback available**: If the metaclass breaks, fall back to the less elegant method-based approach:
   ```python
   # Alternative syntax that doesn't depend on private APIs
   query.filter(Person._get_field_filter("name") == "Alice")
   ```

**Why We Use It Anyway**: The UX benefit of the SQLAlchemy-like syntax is significant for a prototype focused on design clarity. For production use, consider the risk-reward tradeoff for your specific needs.

**References**:
- [Pydantic Issue #6381: ModelMetaclass Import Location](https://github.com/pydantic/pydantic/issues/6381)
- [Pydantic Discussion #7185: ModelField and ModelMetaclass in v2](https://github.com/pydantic/pydantic/discussions/7185)

## VALUES Clause Support

SPARQLMojo supports the SPARQL VALUES clause for efficient query constraints with explicit value sets.

### ORM-Style API (Recommended)

The ORM-style API provides type-safe, model-aware value binding:

```python
from typing import Annotated
from sparqlmojo import IRIField, LangString, LiteralField, Model, RDF_TYPE, Session, SubjectField

class Person(Model):
    rdf_type: Annotated[str, IRIField(RDF_TYPE, default="http://schema.org/Person")]
    iri: Annotated[str, SubjectField()]
    name: Annotated[str | None, LiteralField("http://schema.org/name")] = None
    age: Annotated[int | None, LiteralField("http://schema.org/age")] = None

class Label(Model):
    # No rdf_type - property relationship, not a typed entity
    entity_iri: Annotated[str, SubjectField()]
    text: Annotated[str | None, LangString("http://www.w3.org/2000/01/rdf-schema#label")] = None

# ORM-style: type-safe field reference
query = session.query(Person).values(Person.name, ['Alice', 'Bob', 'Charlie'])
# Generates: VALUES (?name) { ("Alice") ("Bob") ("Charlie") }

# SubjectField automatically maps to ?s variable
query = session.query(Label).values(Label.entity_iri, [
    'http://www.wikidata.org/entity/Q682',
    'http://www.wikidata.org/entity/Q123'
])
# Generates: VALUES (?s) { (<http://www.wikidata.org/entity/Q682>) (<http://www.wikidata.org/entity/Q123>) }
```

### Dict-Style API

For multiple variables or advanced use cases, use the dict-style API:

```python
from sparqlmojo import Condition

# Single variable VALUES clause
query = session.query(Person).values({
    'name': ['Alice', 'Bob', 'Charlie']
})
# Generates: VALUES (?name) { ("Alice") ("Bob") ("Charlie") }

# Multiple variables VALUES clause
query = session.query(Person).values({
    'name': ['Alice', 'Bob'],
    'age': [30, 25]
})
# Generates: VALUES (?name ?age) { ("Alice" 30) ("Bob" 25) }

# Combined with other query methods
query = (
    session.query(Person)
    .values({'name': ['Alice', 'Bob', 'Charlie']})
    .filter(Condition("age", ">", 25))
    .limit(10)
)
# Generates: VALUES (?name) { ("Alice") ("Bob") ("Charlie") }
#           FILTER(?age > 25)
#           LIMIT 10
```
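The multi-variable form pairs the lists position by position, one row per index, which is why the lists must have equal length. The hypothetical standalone sketch below illustrates that formatting (`format_values` and its `iri_vars` parameter are illustrative; SPARQLMojo's own IRI detection and escaping may differ). It writes integers bare, escapes strings, and rejects IRIs containing characters that SPARQL forbids inside `<...>`.

```python
def escape_literal(text: str) -> str:
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n").replace("\r", "\\r")


def format_term(value: object, is_iri: bool) -> str:
    if is_iri:
        text = str(value)
        # Characters not allowed in a SPARQL IRIREF, plus control characters and space
        if any(ch in '<>"{}|^`\\' or ord(ch) <= 0x20 for ch in text):
            raise ValueError(f"unsafe IRI: {text!r}")
        return f"<{text}>"
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)  # integers are written bare, as in the example above
    return f'"{escape_literal(str(value))}"'  # everything else becomes a quoted string


def format_values(bindings: dict[str, list], iri_vars: frozenset[str] = frozenset()) -> str:
    names = list(bindings)
    if len({len(column) for column in bindings.values()}) > 1:
        raise ValueError("all VALUES lists must have the same length")
    header = " ".join(f"?{name}" for name in names)
    rows = " ".join(
        "(" + " ".join(format_term(v, name in iri_vars) for name, v in zip(names, row)) + ")"
        for row in zip(*bindings.values())
    )
    return f"VALUES ({header}) {{ {rows} }}"


print(format_values({"name": ["Alice", "Bob"], "age": [30, 25]}))
# VALUES (?name ?age) { ("Alice" 30) ("Bob" 25) }
```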

### Key Features

- **ORM-Style API**: Type-safe field references with `query.values(Model.field, [values])`
- **SubjectField Support**: Automatic mapping to `?s` variable for subject-based queries
- **Single and Multiple Variables**: Support for both single and multiple variable bindings
- **Method Chaining**: Works seamlessly with existing `filter()`, `limit()`, `offset()` methods
- **SPARQL Injection Protection**: Built-in security with automatic value escaping
- **Comprehensive Validation**: Validates variable names, list lengths, and data types
- **Performance Optimization**: Reduces need for multiple queries or complex filters

### Benefits

- **Efficient Query Constraints**: VALUES clause allows inline value sets for better performance
- **Cleaner Code**: More readable than multiple OR conditions
- **Type Safety**: Proper formatting of different data types (strings, numbers, IRIs)
- **Security**: Automatic protection against SPARQL injection attacks

## Property Paths

SPARQLMojo supports SPARQL property paths for advanced relationship traversal with an ORM-like API:

### Convenience Methods (Recommended)

For common use cases, use convenience methods that automatically infer predicates from your model:

```python
from typing import Annotated
from sparqlmojo import IRIField, LiteralField, Model, ObjectPropertyField, RDF_TYPE, Session, SubjectField

class Person(Model):
    rdf_type: Annotated[str, IRIField(RDF_TYPE, default="schema:Person")]
    iri: Annotated[str, SubjectField()]
    name: Annotated[str | None, LiteralField("schema:name")] = None
    knows: Annotated[str | None, ObjectPropertyField("schema:knows", range_="Person")] = None
    manager: Annotated[str | None, ObjectPropertyField("schema:manager", range_="Person")] = None
    parent: Annotated[str | None, ObjectPropertyField("schema:parent", range_="Person")] = None
    guardian: Annotated[str | None, ObjectPropertyField("schema:guardian", range_="Person")] = None

# Transitive relationships (one-or-more: +)
# Find all people someone knows, directly or indirectly
query = session.query(Person).transitive('knows')

# Zero-or-more (*)
# Find all managers in the reporting chain
query = session.query(Person).zero_or_more('manager')

# Zero-or-one (?)
# Find people who may or may not have a parent
query = session.query(Person).zero_or_one('parent')

# Alternative paths (|)
# Find people who have either a parent or guardian
query = session.query(Person).alternative('parent', 'guardian')

# Inverse paths (^)
# Find children (inverse of parent relationship)
query = session.query(Person).inverse('parent')
```
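Each convenience method corresponds to one SPARQL path operator applied to the predicate declared on the named field. The sketch below illustrates that mapping; the helper names, the assumed field-to-predicate table, and the `?o` object variable are invented for illustration, and the exact patterns SPARQLMojo emits may differ.

```python
# Hypothetical mapping from convenience methods to SPARQL 1.1 path operators
OPERATORS = {
    "transitive": "{p}+",    # one or more
    "zero_or_more": "{p}*",  # zero or more
    "zero_or_one": "{p}?",   # zero or one
    "inverse": "^{p}",       # traverse backwards
}


def path_pattern(method: str, fields: list[str], predicates: dict[str, str]) -> str:
    """Build `?s <path> ?o` from field names, rejecting fields the model lacks."""
    unknown = [f for f in fields if f not in predicates]
    if unknown:
        raise ValueError(f"unknown field(s): {unknown}")
    if method == "alternative":
        path = "(" + "|".join(predicates[f] for f in fields) + ")"
    else:
        (field,) = fields  # the other operators take exactly one field
        path = OPERATORS[method].format(p=predicates[field])
    return f"?s {path} ?o"


# Assumed field-to-predicate table for the Person examples above
person = {
    "knows": "schema:knows",
    "manager": "schema:manager",
    "parent": "schema:parent",
    "guardian": "schema:guardian",
}
print(path_pattern("transitive", ["knows"], person))
# ?s schema:knows+ ?o
print(path_pattern("alternative", ["parent", "guardian"], person))
# ?s (schema:parent|schema:guardian) ?o
```

Because the predicate is looked up from the field, a typo in a field name fails before any SPARQL is sent.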

### Method Chaining

Property path methods work seamlessly with other query methods:

```python
# Find everyone Alice knows, directly or through a chain of acquaintances
query = (
    session.query(Person)
    .transitive('knows')
    .filter_by(name='Alice')
    .limit(10)
)

# Follow reporting chains, ordered by name
query = (
    session.query(Person)
    .zero_or_more('manager')
    .order_by('name')
)
```

### Advanced: Complex Property Paths

For complex expressions that don't map to a single field, use `PropertyPath` directly:

```python
from sparqlmojo import PropertyPath

# Sequence path (A/B/C): emails of people who work for the same organization
query = session.query(Person).path(
    'colleague_email',
    PropertyPath('schema:worksFor/^schema:worksFor/schema:email')
)

# Grouped alternatives: emails of anyone the person knows or is friends with
query = session.query(Person).path(
    'contact',
    PropertyPath('(schema:knows|schema:friend)/schema:email')
)
```
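To see what a sequence or grouped path actually matches, the sketch below evaluates simple paths over an in-memory set of triples. It is an illustrative evaluator with invented `ex:` data, and it uses set semantics, whereas SPARQL returns one solution per path match and so may repeat results.

```python
Triple = tuple[str, str, str]


def step(graph: set[Triple], nodes: set[str], predicate: str) -> set[str]:
    """Follow one predicate from `nodes`; a leading '^' follows it backwards."""
    if predicate.startswith("^"):
        pred = predicate[1:]
        return {s for s, p, o in graph if p == pred and o in nodes}
    return {o for s, p, o in graph if p == predicate and s in nodes}


def follow(graph: set[Triple], start: str, path: list[list[str]]) -> set[str]:
    """Evaluate a sequence path; each step lists its alternatives (A|B)."""
    nodes = {start}
    for alternatives in path:
        nodes = set().union(*(step(graph, nodes, p) for p in alternatives))
    return nodes


graph: set[Triple] = {
    ("ex:alice", "schema:worksFor", "ex:acme"),
    ("ex:bob", "schema:worksFor", "ex:acme"),
    ("ex:alice", "schema:email", "alice@acme.example"),
    ("ex:bob", "schema:email", "bob@acme.example"),
    ("ex:alice", "schema:knows", "ex:carol"),
    ("ex:alice", "schema:friend", "ex:dave"),
    ("ex:carol", "schema:email", "carol@example.org"),
    ("ex:dave", "schema:email", "dave@example.org"),
}

# schema:worksFor/^schema:worksFor/schema:email
print(sorted(follow(graph, "ex:alice", [["schema:worksFor"], ["^schema:worksFor"], ["schema:email"]])))
# ['alice@acme.example', 'bob@acme.example']

# (schema:knows|schema:friend)/schema:email
print(sorted(follow(graph, "ex:alice", [["schema:knows", "schema:friend"], ["schema:email"]])))
# ['carol@example.org', 'dave@example.org']
```

The colleague path also returns Alice's own email: `^schema:worksFor` leads back to everyone at the organization, including the starting person.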

### Inverse Property Paths in Model Fields

You can define fields that use inverse property paths directly in your model using `IRIField` with `PropertyPath`. This is useful for Wikidata-style patterns where you need to find resources through inverse relationships:

```python
from typing import Annotated
from sparqlmojo import IRIField, Model, PropertyPath, SubjectField

class Child(Model):
    iri: Annotated[str, SubjectField()]
    # Find parent by traversing parent->child in reverse
    parent: Annotated[str | None, IRIField(
        PropertyPath("^<http://schema.org/children>")
    )] = None

class WikidataStatement(Model):
    iri: Annotated[str, SubjectField()]
    # Find the property that defines this claim predicate
    property_iri: Annotated[str | None, IRIField(
        PropertyPath("^<http://wikiba.se/ontology#claim>")
    )] = None

# Query generates: ?s ^<http://schema.org/children> ?parent .
# Which is equivalent to: ?parent <http://schema.org/children> ?s .
```

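**How it works:**
- Normal pattern: `?subject <predicate> ?object` matches triples in which `?subject` points to `?object` via the predicate
- Inverse pattern: `?subject ^<predicate> ?object` swaps the roles, matching triples in which `?object` points to `?subject` (equivalent to `?object <predicate> ?subject`)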
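The role swap can be checked directly: the bindings of `?subject ^<predicate> ?object` are the bindings of the forward pattern with the two positions exchanged. This plain-Python sketch uses made-up `ex:` resources and hypothetical `forward`/`inverse` helpers.

```python
Triple = tuple[str, str, str]


def forward(graph: set[Triple], predicate: str) -> set[tuple[str, str]]:
    """(?subject, ?object) bindings for `?subject <predicate> ?object`."""
    return {(s, o) for s, p, o in graph if p == predicate}


def inverse(graph: set[Triple], predicate: str) -> set[tuple[str, str]]:
    """(?subject, ?object) bindings for `?subject ^<predicate> ?object`."""
    return {(o, s) for s, p, o in graph if p == predicate}


graph: set[Triple] = {
    ("ex:ann", "schema:children", "ex:ben"),
    ("ex:ann", "schema:children", "ex:cal"),
}
# Each child is bound to ?subject and its parent to ?object
print(sorted(inverse(graph, "schema:children")))
# [('ex:ben', 'ex:ann'), ('ex:cal', 'ex:ann')]
```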

### Benefits

- **Type-Safe**: Validates that fields exist in your model
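- **No Field/Predicate Mismatch**: Convenience methods take the predicate from the field definition, so a path cannot silently use a different predicate than the field it names
- **Clean API**: ORM-like syntax covers the common single-predicate patterns (`+`, `*`, `?`, `|`, `^`)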
- **Flexible**: PropertyPath fallback for complex expressions
- **Security**: Built-in SPARQL injection prevention

## Ontology-Aware Models with SchemaRegistry

SPARQLMojo provides ontology-aware modeling capabilities through `SchemaRegistry` and `InverseField`, allowing your models to automatically discover inverse relationships and leverage ontology metadata.

### SchemaRegistry

The `SchemaRegistry` is a thread-safe cache for ontology metadata that can load property information from:
- RDF files (Turtle, RDF/XML, N3, etc.)
- SPARQL endpoints
- Manual registration

```python
from sparqlmojo import SchemaRegistry, Session, PropertyInfo

# Create registry and load ontology from file
registry = SchemaRegistry()
registry.load_from_file("schema.ttl", format="turtle")

# Or create with SPARQL endpoint for lazy loading
registry = SchemaRegistry(endpoint="http://example.org/sparql", cache_ttl=3600)

# Manual registration of property metadata
prop = PropertyInfo(
    predicate_iri="http://schema.org/children",
    inverse_of="http://schema.org/parent",
    domain={"http://schema.org/Person"},
    range_={"http://schema.org/Person"},
    label={"en": "children", "de": "Kinder"},
    comment={"en": "Children of a person"}
)
registry.register_property(prop)

# Use with Session
session = Session(schema_registry=registry)
```
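The `cache_ttl` behaviour can be pictured as a lock-protected dictionary whose entries expire after a fixed time. This is a hypothetical `TTLCache` for illustration, not the actual `SchemaRegistry` internals, and it assumes `cache_ttl` is measured in seconds. Holding the lock while loading keeps the sketch simple but serializes concurrent cache misses.

```python
import threading
import time
from collections.abc import Callable


class TTLCache:
    """Hypothetical thread-safe cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[str, tuple[float, object]] = {}
        self._lock = threading.Lock()

    def get(self, key: str, loader: Callable[[str], object]) -> object:
        """Return a cached value, calling `loader` on a miss or after expiry."""
        with self._lock:
            now = self.clock()
            entry = self._entries.get(key)
            if entry is not None and now - entry[0] < self.ttl:
                return entry[1]
            value = loader(key)  # e.g. a SPARQL lookup of the predicate's metadata
            self._entries[key] = (now, value)
            return value


cache = TTLCache(ttl=3600)
info = cache.get("http://schema.org/children", lambda iri: {"inverse_of": "http://schema.org/parent"})
print(info)  # {'inverse_of': 'http://schema.org/parent'}
```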

### PropertyInfo Metadata

The `PropertyInfo` dataclass stores comprehensive ontology information:

```python
from sparqlmojo import PropertyInfo

# Property information extracted from ontologies
property_info = PropertyInfo(
    predicate_iri="http://schema.org/children",

    # Inverse relationships (from owl:inverseOf)
    inverse_of="http://schema.org/parent",

    # Domain and range constraints
    domain={"http://schema.org/Person"},
    range_={"http://schema.org/Person"},

    # OWL characteristics
    is_functional=False,
    is_inverse_functional=False,
    is_transitive=False,
    is_symmetric=False,

    # Property hierarchy
    subproperty_of={"http://schema.org/relative"},

    # Multilingual labels and descriptions
    label={"en": "children", "de": "Kinder"},
    comment={"en": "Children of a person"}
)
```

### InverseField with Auto-Discovery

`InverseField` automatically discovers inverse relationships from your ontology using `owl:inverseOf`:

```python
from typing import Annotated
from sparqlmojo import InverseField, LiteralField, Model, SubjectField

class Child(Model):
    """Model for finding parents through inverse relationship."""
    iri: Annotated[str, SubjectField()]
    name: Annotated[str | None, LiteralField("http://schema.org/name")] = None

    # Auto-discover that parent is the inverse of children
    parent: Annotated[
        str | None,
        InverseField("http://schema.org/children", auto_discover=True)
    ] = None

class Author(Model):
    """Model for finding authored works."""
    iri: Annotated[str, SubjectField()]
    name: Annotated[str | None, LiteralField("http://schema.org/name")] = None

    # Auto-discover authorOf as inverse of author
    books: Annotated[
        str | None,
        InverseField("http://schema.org/author", auto_discover=True)
    ] = None
```

### How Auto-Discovery Works

When `auto_discover=True`, `InverseField` queries the `SchemaRegistry` for `owl:inverseOf` metadata:

1. **Without ontology metadata**: Uses SPARQL inverse operator (`^`)
   ```python
   # Generates: ?s ^<http://schema.org/children> ?parent
   # Equivalent to: ?parent <http://schema.org/children> ?s
   ```

2. **With ontology metadata**: Uses the named inverse property
   ```python
   # If ontology defines: schema:children owl:inverseOf schema:parent
   # Generates: ?s <http://schema.org/parent> ?parent
   ```

3. **Automatic discovery** happens when the field is used in a query:
   ```python
   # Load ontology
   registry = SchemaRegistry()
   registry.load_from_file("schema.ttl", format="turtle")

   # Create session with registry
   session = Session(schema_registry=registry)

   # Query triggers auto-discovery
   children = session.query(Child).all()
   # InverseField automatically uses schema:parent from ontology
   ```
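The two strategies amount to a lookup with a fallback. The sketch below reproduces the generated patterns listed above; `inverse_triple` and the plain dict standing in for the registry are hypothetical.

```python
def inverse_triple(predicate: str, var: str, inverses: dict[str, str]) -> str:
    """Use a named inverse when one is known, else fall back to the ^ operator."""
    named = inverses.get(predicate)
    if named is not None:
        return f"?s <{named}> ?{var}"
    return f"?s ^<{predicate}> ?{var}"


ontology = {"http://schema.org/children": "http://schema.org/parent"}
print(inverse_triple("http://schema.org/children", "parent", ontology))
# ?s <http://schema.org/parent> ?parent
print(inverse_triple("http://schema.org/children", "parent", {}))
# ?s ^<http://schema.org/children> ?parent
```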

### Example Ontology File

Here's a sample Turtle ontology defining inverse relationships:

```turtle
@prefix schema: <http://schema.org/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

schema:children a owl:ObjectProperty ;
    rdfs:label "children"@en, "Kinder"@de ;
    rdfs:comment "Children of a person"@en ;
    rdfs:domain schema:Person ;
    rdfs:range schema:Person ;
    owl:inverseOf schema:parent .

schema:parent a owl:ObjectProperty ;
    rdfs:label "parent"@en, "Elternteil"@de ;
    rdfs:comment "Parent of a person"@en ;
    rdfs:domain schema:Person ;
    rdfs:range schema:Person ;
    owl:inverseOf schema:children .

schema:author a owl:ObjectProperty ;
    rdfs:domain schema:CreativeWork ;
    rdfs:range schema:Person ;
    owl:inverseOf <http://example.org/authorOf> .
```
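In OWL, `owl:inverseOf` works in both directions: the single triple `schema:author owl:inverseOf <http://example.org/authorOf>` also means `<http://example.org/authorOf> owl:inverseOf schema:author`, even though the file above does not state that direction. A loader that indexes inverses can record both, as in this sketch. Whether `SchemaRegistry` does so is not stated here, and `inverse_map` is a hypothetical name.

```python
OWL_INVERSE_OF = "http://www.w3.org/2002/07/owl#inverseOf"


def inverse_map(triples: list[tuple[str, str, str]]) -> dict[str, str]:
    """Index owl:inverseOf in both directions (a later triple overwrites an earlier one)."""
    inverses: dict[str, str] = {}
    for s, p, o in triples:
        if p == OWL_INVERSE_OF:
            inverses[s] = o
            inverses[o] = s  # owl:inverseOf is symmetric
    return inverses


triples = [("http://schema.org/author", OWL_INVERSE_OF, "http://example.org/authorOf")]
print(inverse_map(triples)["http://example.org/authorOf"])
# http://schema.org/author
```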

### Comparison: Regular IRIField vs InverseField

```python
from typing import Annotated
from sparqlmojo import IRIField, InverseField, LiteralField, Model, SubjectField

# Forward relationship: Find children of a person
class Person(Model):
    iri: Annotated[str, SubjectField()]
    name: Annotated[str | None, LiteralField("http://schema.org/name")] = None
    children: Annotated[str | None, IRIField("http://schema.org/children")] = None
    # SPARQL: ?s <http://schema.org/children> ?children

# Inverse relationship: Find parent of a child
class Child(Model):
    iri: Annotated[str, SubjectField()]
    name: Annotated[str | None, LiteralField("http://schema.org/name")] = None
    parent: Annotated[
        str | None,
        InverseField("http://schema.org/children", auto_discover=True)
    ] = None
    # With ontology: ?s <http://schema.org/parent> ?parent
    # Without ontology: ?s ^<http://schema.org/children> ?parent

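# Both models describe the same relationship from opposite ends. Compared with
# writing an inverse PropertyPath by hand, InverseField:
# 1. Uses the named inverse property when the ontology declares one
# 2. Falls back to the ^ operator when it does not
# 3. Adapts to ontology changes without editing the model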
```

### Use Cases

**1. Family Relationships**
```python
# Find parents through children inverse
children_to_parents = session.query(Child).all()
```

**2. Authorship**
```python
# Look up an author; `books` is filled via the inverse of schema:author
author = session.query(Author).filter_by(name="J.K. Rowling").first()
```

**3. Employment**
```python
class Employee(Model):
    iri: Annotated[str, SubjectField()]
    employer: Annotated[
        str | None,
        InverseField("http://example.org/employs", auto_discover=True)
    ] = None
# Find employer through inverse of "employs" relationship
```

**4. Wikidata-Style Patterns**
```python
# Wikidata often requires inverse navigation
class WikidataEntity(Model):
    iri: Annotated[str, SubjectField()]
    # Find items that have this entity as their "instance of" value
    instances: Annotated[
        str | None,
        InverseField("http://www.wikidata.org/prop/direct/P31", auto_discover=True)
    ] = None
```

### Benefits

- **Ontology-Aware**: Leverages existing OWL/RDFS metadata for automatic configuration
- **Cleaner Models**: Use semantic property names instead of inverse operators
- **Flexible Fallback**: Automatically falls back to `^` operator when no inverse defined
- **Thread-Safe Caching**: Registry caches ontology metadata with configurable TTL
- **Multiple Sources**: Load from files, endpoints, or manual registration
- **Multilingual Support**: PropertyInfo includes labels and comments in multiple languages
- **Standards-Based**: Built on standard OWL and RDFS vocabulary such as `owl:inverseOf`, `rdfs:domain` and `rdfs:range`

## Field-Level Filtering

SPARQLMojo provides intuitive field-level filtering similar to SQLAlchemy, with automatic datatype casting for numeric comparisons.

### Key Features

- **Intuitive Syntax**: Use Python comparison operators directly on model fields
- **Automatic Datatype Casting**: Numeric comparisons automatically cast to `xsd:decimal`/`xsd:integer`
- **String Operations**: `contains()`, `startswith()`, `endswith()` methods (polymorphic: collection fields use membership check)
- **Membership Testing**: `in_()` and `not_in()` operators
- **Logical Operators**: `and_()`, `or_()`, `not_()` for complex conditions
- **IRI Field Support**: Proper handling of IRI fields with angle bracket syntax

### Basic Usage

```python
from typing import Annotated
from sparqlmojo import IRIField, LiteralField, Model, RDF_TYPE, Session, SubjectField
from sparqlmojo.orm.filtering import FieldFilter, and_, or_

class Person(Model):
    rdf_type: Annotated[str, IRIField(RDF_TYPE, default="http://schema.org/Person")]
    iri: Annotated[str, SubjectField()]
    name: Annotated[str | None, LiteralField("http://schema.org/name")] = None
    age: Annotated[int | None, LiteralField("http://schema.org/age")] = None
    email: Annotated[str | None, LiteralField("http://schema.org/email")] = None
    entity_id: Annotated[str | None, IRIField("http://schema.org/identifier")] = None

session = Session()

# Basic equality filtering
query = session.query(Person).filter(Person.name == "Alice")
# Generates: FILTER(?name = "Alice")

# Numeric comparisons with automatic casting
query = session.query(Person).filter(Person.age > 18)
# Generates: FILTER(xsd:integer(?age) > 18)

# String operations
query = session.query(Person).filter(Person.email.contains("@example.com"))
# Generates: FILTER(CONTAINS(?email, "@example.com"))

# Logical operators
query = session.query(Person).filter(
    and_(
        Person.name == "Alice",
        Person.age >= 18
    )
)
# Generates: FILTER(?name = "Alice" && xsd:integer(?age) >= 18)

# IN operator
query = session.query(Person).filter(
    Person.name.in_(["Alice", "Bob", "Charlie"])
)
# Generates: FILTER(?name IN ("Alice", "Bob", "Charlie"))

# IRI field filtering
query = session.query(Person).filter(
    Person.entity_id == "http://example.org/Q682"
)
# Generates: FILTER(?entity_id = <http://example.org/Q682>)
```
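The generated strings above can be reproduced with ordinary operator overloading. This sketch shows the general technique under stated assumptions and is not SPARQLMojo's code: `Field`, `and_`, `or_` and `filter_clause` here are hypothetical stand-ins that return plain strings. `or_` wraps its result in parentheses because `&&` binds more tightly than `||` in SPARQL, so an `or_` nested inside an `and_` keeps its meaning.

```python
class Field:
    """Hypothetical field reference whose comparison operators return SPARQL strings."""

    def __init__(self, name: str, is_iri: bool = False) -> None:
        self.name = name
        self.is_iri = is_iri

    def _term(self, value: object) -> str:
        if self.is_iri:
            return f"<{value}>"
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return str(value)
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    def _compare(self, op: str, value: object) -> str:
        var = f"?{self.name}"
        # Cast the variable for numeric comparisons: numbers are often stored as plain strings
        if isinstance(value, int) and not isinstance(value, bool):
            var = f"xsd:integer({var})"
        elif isinstance(value, float):
            var = f"xsd:decimal({var})"
        return f"{var} {op} {self._term(value)}"

    # Overriding __eq__ makes instances unhashable, which is acceptable for a query DSL
    def __eq__(self, value: object) -> str:  # type: ignore[override]
        return self._compare("=", value)

    def __gt__(self, value: object) -> str:
        return self._compare(">", value)

    def __ge__(self, value: object) -> str:
        return self._compare(">=", value)

    def in_(self, values: list[object]) -> str:
        return f"?{self.name} IN ({', '.join(self._term(v) for v in values)})"


def and_(*conditions: str) -> str:
    return " && ".join(conditions)


def or_(*conditions: str) -> str:
    # Parenthesized because || binds more loosely than && in SPARQL
    return "(" + " || ".join(conditions) + ")"


def filter_clause(condition: str) -> str:
    return f"FILTER({condition})"


name, age = Field("name"), Field("age")
print(filter_clause(and_(name == "Alice", age >= 18)))
# FILTER(?name = "Alice" && xsd:integer(?age) >= 18)
```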

### String Filtering on IRI Fields

For IRI fields, you often need to filter by the string content of the IRI rather than exact matching. SPARQLMojo provides chainable string function methods:

```python
from typing import Annotated
from sparqlmojo import IRIField, LiteralField, Model, RDF_TYPE, Session, SubjectField

class Document(Model):
    rdf_type: Annotated[str, IRIField(RDF_TYPE, default="http://schema.org/Document")]
    iri: Annotated[str, SubjectField()]
    name: Annotated[str | None, LiteralField("http://schema.org/name")] = None
    format_type: Annotated[str | None, IRIField("http://example.org/formatType")] = None

session = Session()

# Filter IRI field by string content
query = session.query(Document).filter(
    Document.format_type.str().contains("pdf")
)
# Generates: FILTER(CONTAINS(STR(?format_type), "pdf"))

# Case-insensitive filtering with lower()
query = session.query(Document).filter(
    Document.format_type.str().lower().contains("pdf")
)
# Generates: FILTER(CONTAINS(LCASE(STR(?format_type)), "pdf"))

# Case-insensitive filtering with upper()
query = session.query(Document).filter(
    Document.format_type.str().upper().contains("PDF")
)
# Generates: FILTER(CONTAINS(UCASE(STR(?format_type)), "PDF"))

# String prefix/suffix matching
query = session.query(Document).filter(
    Document.format_type.str().startswith("http://")
)
# Generates: FILTER(STRSTARTS(STR(?format_type), "http://"))

query = session.query(Document).filter(
    Document.format_type.str().lower().endswith("/pdf")
)
# Generates: FILTER(STRENDS(LCASE(STR(?format_type)), "/pdf"))
```

**Available Methods:**

| Method | Description | SPARQL Function |
|--------|-------------|-----------------|
| `str()` | Convert IRI to string | `STR()` |
| `lower()` | Convert to lowercase | `LCASE()` |
| `upper()` | Convert to uppercase | `UCASE()` |
| `contains(s)` | Check if string contains substring | `CONTAINS()` |
| `startswith(s)` | Check if string starts with prefix | `STRSTARTS()` |
| `endswith(s)` | Check if string ends with suffix | `STRENDS()` |

**Note:** On IRI fields, call `str()` first. SPARQL's string functions (`LCASE`, `UCASE`, `CONTAINS`, `STRSTARTS`, `STRENDS`) operate on string literals, not IRIs, so the IRI must be converted with `STR()` before any of them can be applied.
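A chainable builder only needs each method to wrap the current expression and return a new object. The sketch below produces the SPARQL shown in the examples above and, to mirror the SPARQL requirement that string functions receive string literals, refuses string functions until `str()` has been called. `StrExpr`, its `is_string` flag and the `_quote` helper are hypothetical.

```python
Text = str  # alias used in annotations, because the class defines a method named `str`


def _quote(text: Text) -> Text:
    """Render a Python string as a double-quoted SPARQL literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


class StrExpr:
    """Hypothetical immutable builder: each call wraps the expression and returns a new one."""

    def __init__(self, sparql: Text, is_string: bool = False) -> None:
        self.sparql = sparql
        self.is_string = is_string  # False while the expression is still an IRI

    def _string_arg(self) -> Text:
        if not self.is_string:
            raise TypeError("call .str() before applying string functions to an IRI")
        return self.sparql

    def str(self) -> "StrExpr":
        return StrExpr(f"STR({self.sparql})", is_string=True)

    def lower(self) -> "StrExpr":
        return StrExpr(f"LCASE({self._string_arg()})", is_string=True)

    def upper(self) -> "StrExpr":
        return StrExpr(f"UCASE({self._string_arg()})", is_string=True)

    def contains(self, text: Text) -> Text:
        return f"CONTAINS({self._string_arg()}, {_quote(text)})"

    def startswith(self, text: Text) -> Text:
        return f"STRSTARTS({self._string_arg()}, {_quote(text)})"

    def endswith(self, text: Text) -> Text:
        return f"STRENDS({self._string_arg()}, {_quote(text)})"


format_type = StrExpr("?format_type")
print(format_type.str().lower().contains("pdf"))
# CONTAINS(LCASE(STR(?format_type)), "pdf")
```

Because every method returns a new `StrExpr`, an intermediate such as `format_type.str().lower()` can be reused in several conditions without side effects.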

### Benefits

- **Type Safety**: Field references are validated against the model definition
- **RDF Compatibility**: Automatic datatype casting handles the common issue of numeric values stored as strings
- **Intuitive API**: Familiar syntax for developers coming from SQLAlchemy or Django ORM
- **Backward Compatibility**: Existing `Condition` class continues to work alongside new filtering
- **Server-Side Evaluation**: Conditions compile to SPARQL `FILTER` expressions, so filtering runs on the endpoint rather than in Python

## Dependencies

- `pydantic>=2.12.4,<3.0.0` - Data validation and type checking
- `SPARQLWrapper>=2.0.0` - SPARQL endpoint communication
- `rdflib>=6.0.0` - RDF graph parsing and manipulation

## Key Benefits of Pydantic Integration

- **Type Safety**: Fields are validated at runtime against their type annotations
- **Better IDE Support**: Full autocomplete and type hints in modern IDEs
- **Clear Error Messages**: Pydantic provides detailed validation errors
- **Automatic Coercion**: Compatible types are automatically converted (e.g., `"123"` → `123` for int fields)
- **Extra Field Protection**: Unknown fields are rejected by default

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

