Metadata-Version: 2.4
Name: pydantic-identity
Version: 0.0.2
Summary: Pydantic BaseModel with a stable, unique identifier of its schema and validation rules.
Author-email: Ryan Young <dev@ryayoung.com>
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries
Requires-Dist: orjson<4.0,>=3.10
Requires-Dist: pydantic<3.0,>=2.10
Description-Content-Type: text/markdown

# pydantic-identity

[![PyPI](https://img.shields.io/pypi/v/pydantic-identity)](https://pypi.org/project/pydantic-identity/)
[![Tests](https://github.com/ryayoung/pydantic-identity/actions/workflows/tests.yml/badge.svg)](https://github.com/ryayoung/pydantic-identity/actions/workflows/tests.yml)
[![License](https://img.shields.io/github/license/ryayoung/pydantic-identity)](https://github.com/ryayoung/pydantic-identity/blob/main/LICENSE)
[![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Pyright](https://img.shields.io/badge/type%20checker-pyright-blue)](https://github.com/microsoft/pyright)

**pydantic-identity** provides a way to track the full recursive identity (schema “fingerprint”) of your Pydantic models as a 12-character hash. By storing this identifier alongside your data, you can later tell whether two records (even deeply nested ones) were created under the same conditions: model structure, validation rules, documentation, and so on.
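Here is a minimal, stdlib-only sketch of the general idea behind a schema fingerprint: serialize a schema canonically (sorted keys, compact separators), hash the bytes with MD5, and keep a 12-character prefix. `schema_fingerprint` is a hypothetical helper, not part of pydantic-identity. The library's real hash input has its own structure, which you can inspect with `model_schema_hash_get_input_data()`.

```python
import hashlib
import json
from typing import Any


def schema_fingerprint(schema: dict[str, Any], length: int = 12) -> str:
    """Illustrative only: hash a JSON-serializable schema into a short hex id."""
    # Canonical form: sorted keys (recursively) and compact separators, so
    # logically equal schemas always produce identical bytes.
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    digest = hashlib.md5(canonical.encode("utf-8")).hexdigest()  # 32 hex chars
    return digest[:length]


a = {"title": "Point", "properties": {"x": {"type": "integer"}, "y": {"type": "integer"}}}
b = {"properties": {"y": {"type": "integer"}, "x": {"type": "integer"}}, "title": "Point"}
c = {"title": "Point", "properties": {"x": {"type": "number"}, "y": {"type": "integer"}}}

print(schema_fingerprint(a) == schema_fingerprint(b))  # True: key order is irrelevant
print(schema_fingerprint(a) == schema_fingerprint(c))  # False: a field type changed
```

Canonical serialization is what makes the hash stable. Two dicts with the same content but different key order produce the same bytes, and therefore the same hash.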

## Features

- **Schema Hashing**: Generate a stable hash of a model’s entire schema, recursively (includes nested models).
- **Configurable Tracking**: Choose whether to include things like model/field descriptions, field ordering, default values, union type ordering, relative file path, or custom data in the hash.
- **Full Pydantic Compatibility**: `BaseIdentityModel` inherits directly from `pydantic.BaseModel` and does **not** alter its behavior or manipulate its `model_config`. You can safely swap `pydantic.BaseModel` for `BaseIdentityModel` anywhere.
- **Caching**: Computed hashes are cached for performance. Each hash is computed lazily, the first time it is accessed, and only once per *class definition*.

---

## Installation & Quick Start

```
pip install pydantic-identity
```

```python
from pydantic_identity import BaseIdentityModel

class MyBaseModel(BaseIdentityModel):
    """I'm just like Pydantic BaseModel, but I can hash my schema."""

    foo: str = "I'm a default value, included in the schema hash."


print(MyBaseModel.model_schema_hash_get())  # Hash is computed and cached on the class
print(MyBaseModel().model_schema_hash)
```

> Try this for yourself. You’ll get the same 12-character MD5 prefix hash:

```
221da6ebbb7d
221da6ebbb7d
```

You can also store an auto-populated hash on every model instance. This is cheap, because the hash is computed once and cached on the class.

```python
class BaseModelWithSchemaId(BaseIdentityModel):
    """I'm just like Pydantic BaseModel, but I store my schema hash as a field"""

    schema_id: str = ""
    """The class's schema hash. Will be set automatically, if left unset."""

    def model_post_init(self, _):
        """Called automatically after an instance is created."""
        if not self.schema_id:
            self.schema_id = self.model_schema_hash_get()
```

> Using the new base model you just created...

```python
class MyModelWithHash(BaseModelWithSchemaId):
    x: int = 10
    y: str = "Hi"

print(MyModelWithHash().model_dump())
```

```
{'schema_id': '9e19ba08013a', 'x': 10, 'y': 'Hi'}
```
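Once every record carries its `schema_id`, checking stored data against the current class is a string comparison. The stdlib-only sketch below uses a hypothetical helper, `partition_by_schema`. In real code, `current_hash` would come from `MyModelWithHash.model_schema_hash_get()`. The records are made up, apart from reusing the `9e19ba08013a` value shown above.

```python
from typing import Any


def partition_by_schema(
    records: list[dict[str, Any]], current_hash: str
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Hypothetical helper: split records into (current, stale) by schema_id."""
    current: list[dict[str, Any]] = []
    stale: list[dict[str, Any]] = []
    for record in records:
        # Records without a schema_id are treated as stale (unknown provenance).
        if record.get("schema_id") == current_hash:
            current.append(record)
        else:
            stale.append(record)
    return current, stale


# In practice: current_hash = MyModelWithHash.model_schema_hash_get()
current_hash = "9e19ba08013a"
records = [
    {"schema_id": "9e19ba08013a", "x": 10, "y": "Hi"},
    {"schema_id": "0123456789ab", "x": 3, "y": "old"},  # written by an older class
    {"x": 1, "y": "no id"},
]
ok, stale = partition_by_schema(records, current_hash)
print(len(ok), len(stale))  # 1 2
```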

---


## Why Schema Hashing?

If you’re working with complex systems, microservices, or large-scale data storage, you may want to:

- **Compare** two models from different codebases or versions to see if they still match.
- **Validate** that an incoming payload (for example, from a queue or an event stream) was generated by the exact model version you expect.
- **Track** in a database or metadata store that “Model X was hashed with these exact fields, definitions, and docstrings,” so any changes can be quickly detected.

By hashing the full schema of your Pydantic models—and all nested submodels—**pydantic-identity** ensures you can confirm that two references to “the same model” are truly using the same structure.


## Configuration

`BaseIdentityModel` offers class-level configuration variables to tune what gets included in the hash. For example:

```python
class MyConfiguredModel(BaseIdentityModel):
    # Class configuration
    model_schema_hash_track_descriptions = True
    model_schema_hash_track_field_order = True
    model_schema_hash_track_type_order = False
    model_schema_hash_tracked_extra_data = {"some": "config"}
    model_schema_hash_limit_length = 16
    model_schema_hash_tracked_filepath_parts = 1
    model_schema_hash_track_validation_mode = True
    # Model fields
    a: int
    b: str = "default"
```

Below is a high-level overview of each setting:

- **`model_schema_hash_track_descriptions`** *(bool)*  
  Whether to include model docstrings and field descriptions in the hash. Default: `False`.

- **`model_schema_hash_track_field_order`** *(bool)*  
  Whether to track the **ordering** of fields. Default: `False`.

- **`model_schema_hash_track_type_order`** *(bool)*  
  Whether to track the ordering of type union arguments, `Literal[...]` arguments, and other type hint lists. Default: `False`.

- **`model_schema_hash_tracked_extra_data`** *(Any)*  
  Arbitrary JSON-serializable data to include in the hash. Example: environment variables, custom app configs, etc. Default: `None`.

- **`model_schema_hash_limit_length`** *(int \| None)*  
  The length to which the resulting hash string is truncated. `None` keeps the full digest (e.g., 32 characters for MD5). Default: `12`.

- **`model_schema_hash_tracked_filepath_parts`** *(int)*  
  The number of path segments, counted from the end of the file path, to include in the model’s “full name.” Because the full name is part of the hash, renaming or moving a tracked file changes the hash. Default: `2`.

- **`model_schema_hash_function`** *(Callable[[bytes], str])*  
  The function that hashes the schema input bytes. Override it to use a different algorithm. Default: an MD5 hex-digest wrapper.

- **`model_schema_hash_track_validation_mode`** *(bool)*  
  Whether to also build the schema in validation mode (serialization mode is always used). Disabling it can speed things up slightly, at the risk of missing differences between the serialization and validation schema references. Default: `True`.

---

## Advanced Usage

### See the hash input data

To retrieve the exact input data that is passed to the hashing function, use `.model_schema_hash_get_input_data()`. It returns a JSON object as `bytes`.

```python
import json

raw_data: bytes = MyConfiguredModel.model_schema_hash_get_input_data()
data = json.loads(raw_data)  # json.loads accepts bytes directly
print(data)
```

```
{'name': 'test.MyConfiguredModel', 'schemas': {'ser_by_alias': {'proper
...
```
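When two hashes unexpectedly differ, decoding both inputs and diffing them pinpoints what changed. Here is a stdlib-only sketch. `diff_paths` is a hypothetical helper, and the two byte strings are invented stand-ins for what `model_schema_hash_get_input_data()` might return for an old and a new version of a model.

```python
import json
from typing import Any


def diff_paths(old: Any, new: Any, path: str = "$") -> list[str]:
    """Return JSON-path-like strings for every location where old and new differ."""
    if isinstance(old, dict) and isinstance(new, dict):
        diffs: list[str] = []
        for key in sorted(old.keys() | new.keys()):
            if key not in old or key not in new:
                diffs.append(f"{path}.{key}")  # key added or removed
            else:
                diffs.extend(diff_paths(old[key], new[key], f"{path}.{key}"))
        return diffs
    if isinstance(old, list) and isinstance(new, list) and len(old) == len(new):
        item_diffs: list[str] = []
        for i, (old_item, new_item) in enumerate(zip(old, new)):
            item_diffs.extend(diff_paths(old_item, new_item, f"{path}[{i}]"))
        return item_diffs
    # Scalars, or lists of different lengths: compare as a whole.
    return [] if old == new else [path]


old_raw = b'{"name": "test.MyModel", "schemas": {"x": {"type": "integer", "default": 10}}}'
new_raw = b'{"name": "test.MyModel", "schemas": {"x": {"type": "integer", "default": 20}}}'
print(diff_paths(json.loads(old_raw), json.loads(new_raw)))  # ['$.schemas.x.default']
```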

### Extract Full Metadata

To get a report of general metadata about your model's schema, use `.model_schema_identity_report()`:

```python
info = MyConfiguredModel.model_schema_identity_report()
print(info.model_dump_json(indent=2))
```

```
{
  "fullname": "test.MyConfiguredModel",
  "date": "2025-03-17T13:28:59.122335Z",
  "hash": "9ee658c9b78c0b97",
  "hash_settings": {
    "track_descriptions": true,
    ...
```
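Reports make it easy to keep a history of schema versions in a metadata store. Here is a minimal sketch using the standard library's `sqlite3`. The table layout and the `init_db`, `record_report`, and `known_hashes` helpers are hypothetical. The sketch relies only on the `fullname`, `date`, and `hash` keys visible in the report output above.

```python
import json
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    # One row per distinct (model, hash) pair; the full report is kept as JSON text.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS schema_reports (
               fullname TEXT NOT NULL,
               hash TEXT NOT NULL,
               first_seen TEXT NOT NULL,
               report TEXT NOT NULL,
               PRIMARY KEY (fullname, hash)
           )"""
    )


def record_report(conn: sqlite3.Connection, report_json: str) -> bool:
    """Store a report; return True only if this (fullname, hash) pair is new."""
    report = json.loads(report_json)
    cur = conn.execute(
        "INSERT OR IGNORE INTO schema_reports VALUES (?, ?, ?, ?)",
        (report["fullname"], report["hash"], report["date"], report_json),
    )
    conn.commit()
    return cur.rowcount == 1


def known_hashes(conn: sqlite3.Connection, fullname: str) -> list[str]:
    # Every hash recorded for this model, oldest first (ISO 8601 dates sort as text).
    rows = conn.execute(
        "SELECT hash FROM schema_reports WHERE fullname = ? ORDER BY first_seen",
        (fullname,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
init_db(conn)
# In practice: MyConfiguredModel.model_schema_identity_report().model_dump_json()
report_json = json.dumps(
    {
        "fullname": "test.MyConfiguredModel",
        "date": "2025-03-17T13:28:59.122335Z",
        "hash": "9ee658c9b78c0b97",
    }
)
print(record_report(conn, report_json))  # True: first time this hash is seen
print(record_report(conn, report_json))  # False: already recorded
print(known_hashes(conn, "test.MyConfiguredModel"))  # ['9ee658c9b78c0b97']
```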

### Manually Rebuild the Hash

If you mutate a model class after its hash has been computed, or otherwise need to clear the cache, you can force a rebuild of the hash:

```python
MyConfiguredModel.model_schema_hash_rebuild()
```

### Multiple Inheritance & Caching

`BaseIdentityModel` handles multiple inheritance correctly: each subclass has its own separate hash cache, so a subclass never reuses a hash cached on one of its parents.
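Per-class caching has one subtlety. A plain class attribute set on a parent is visible from its subclasses, so a naive cache could hand a subclass its parent's hash. Looking the cache up in `cls.__dict__` avoids that. The sketch below is a hypothetical, stdlib-only mixin (`LazyClassHash`, `_compute`, `_hash_cache`, and `rebuild` are invented names), not pydantic-identity's implementation. It computes lazily on first access and keeps one entry per class. Its `rebuild` plays a role similar to `model_schema_hash_rebuild()` by clearing only that class's entry.

```python
import hashlib


class LazyClassHash:
    compute_count = 0  # demo counter: how many times any hash was computed

    @classmethod
    def _compute(cls) -> str:
        # Stand-in for real schema hashing: the class name plus its own public attributes.
        LazyClassHash.compute_count += 1
        public = sorted(k for k in vars(cls) if not k.startswith("_"))
        return hashlib.md5(f"{cls.__qualname__}:{public}".encode("utf-8")).hexdigest()[:12]

    @classmethod
    def schema_hash(cls) -> str:
        # Read only this class's own __dict__, so a parent's cached value is never inherited.
        cached = cls.__dict__.get("_hash_cache")
        if cached is None:
            cached = cls._compute()  # lazy: runs on first access only
            setattr(cls, "_hash_cache", cached)
        return cached

    @classmethod
    def rebuild(cls) -> str:
        # Clear this class's entry only; parents and subclasses keep their caches.
        if "_hash_cache" in cls.__dict__:
            delattr(cls, "_hash_cache")
        return cls.schema_hash()


class Parent(LazyClassHash):
    a = 1


class Child(Parent):
    b = 2


p1 = Parent.schema_hash()  # computed
p2 = Parent.schema_hash()  # served from Parent's cache
c1 = Child.schema_hash()  # computed separately for Child
print(LazyClassHash.compute_count)  # 2
```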

---

## Testing

All tests are located under `pydantic-identity/tests/` and use standard `pytest`:

```
pytest
```

(Or run them however you prefer.) The tests verify that caching works correctly, that each configuration option is respected, and that advanced scenarios like multiple inheritance behave properly.

---

## Contributing

Contributions, issues, and feature requests are welcome! Feel free to open an [issue](https://github.com/ryayoung/pydantic-identity/issues) or submit a pull request.

1. Fork the project
2. Create your feature branch (`git checkout -b feature/my-new-feature`)
3. Commit your changes (`git commit -m 'Add some feature'`)
4. Push to the branch (`git push origin feature/my-new-feature`)
5. Open a new Pull Request

---

## License

This project is licensed under the terms of the [MIT License](https://github.com/ryayoung/pydantic-identity/blob/main/LICENSE).

Enjoy hashing your Pydantic models with **pydantic-identity**!
