Metadata-Version: 2.4
Name: schemaguard
Version: 0.2.0
Summary: Lightweight Schema Validator
Home-page: https://github.com/inzamam1121/schemaguard
Author: Inzamam Yousaf
Author-email: uniprecisionofficial@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: pandas>=1.0
Requires-Dist: pyyaml>=6.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# schemaguard

**Lightweight Schema Validator for Python**

[![PyPI](https://img.shields.io/pypi/v/schemaguard)](https://pypi.org/project/schemaguard/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

---

## About

`schemaguard` is a lightweight Python library for defining and validating **schemas for Pandas DataFrames**.
It checks that your data matches the expected formats, types, and constraints before it enters a production pipeline, so data problems surface early instead of downstream.

With `schemaguard`, you can:

- Define column types, nullability, uniqueness, regex patterns, and numeric bounds.
- Validate Pandas DataFrames against YAML or Python-based schemas.
- Version your schemas to track changes over time.
- Integrate easily into ETL pipelines, Airflow, or CI/CD workflows.

---

## Features

- **Column type validation** (`int`, `float`, `str`, etc.)
- **Null check** (`not_null`)
- **Uniqueness check** (`unique`)
- **Regex pattern matching** (`regex`)
- **Numeric bounds** (`min_value`, `max_value`)
- **Schema versioning**
- **YAML-based schema loader**
- **Lightweight**: the only runtime dependencies are `pandas` and `pyyaml`.
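
To make the checks listed above concrete, here is a minimal sketch of roughly what they amount to in plain pandas. It is not `schemaguard`'s implementation: the helper `check_column` and the two bounds messages are hypothetical, and only the null, duplicate, and regex messages are borrowed from the sample output in this README.

```python
import re

import pandas as pd


def check_column(series: pd.Series, *, not_null=False, unique=False,
                 regex=None, min_value=None, max_value=None) -> list:
    """Hypothetical helper: return the problems found in a single column."""
    problems = []
    if not_null and series.isna().any():
        problems.append("null values present")
    values = series.dropna()  # remaining checks only look at non-null values
    if unique and values.duplicated().any():
        problems.append("duplicate values found")
    if regex is not None:
        # Whole-value match; whether schemaguard matches this way is an assumption.
        matched = values.astype(str).map(lambda v: re.fullmatch(regex, v) is not None)
        if not matched.all():
            problems.append("regex mismatch")
    if min_value is not None and (values < min_value).any():
        problems.append("value below minimum")  # invented message
    if max_value is not None and (values > max_value).any():
        problems.append("value above maximum")  # invented message
    return problems


df = pd.DataFrame({
    "order_id": [1, 2, 2],
    "amount": [10.0, None, 25.5],
    "email": ["a@example.com", "not-an-email", "b@example.org"],
})

results = {
    "order_id": check_column(df["order_id"], not_null=True, unique=True),
    "amount": check_column(df["amount"], not_null=True, min_value=0, max_value=10000),
    "email": check_column(df["email"], regex=r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"),
}
print(results)
```

Each check is a single vectorized pandas operation, which is why a validator like this can stay small.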

---

## Installation

You can install `schemaguard` via pip:

```bash
pip install schemaguard
```

Or install from source for development:

```bash
git clone https://github.com/inzamam1121/schemaguard.git
cd schemaguard
pip install -e .
```

---

## Quick Start

### 1. Define a YAML schema

Create `files/schema_v1.yml`:

```yaml
version: "1.0"

columns:
  order_id:
    type: int
    not_null: true
    unique: true

  amount:
    type: float
    not_null: true
    min: 0
    max: 10000

  email:
    type: string
    regex: "^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$"
```
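
The `email` pattern is written inside a double-quoted YAML string, so `\\.` reaches the regex engine as `\.` (a literal dot). A quick way to see what the pattern accepts is to try it with Python's standard `re` module. This is only a sketch: the helper `matches_email` is hypothetical, and how `schemaguard` applies the pattern internally (`match`, `search`, or `fullmatch`) is not confirmed here.

```python
import re

# The pattern as it reads once YAML has unescaped the double-quoted string.
EMAIL_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


def matches_email(value: str) -> bool:
    """Hypothetical helper: True if the value satisfies the schema's email regex."""
    return EMAIL_PATTERN.match(value) is not None


for candidate in ["user@example.com", "first.last+tag@sub.example.org",
                  "user@example", "user@example.c"]:
    print(candidate, matches_email(candidate))
```

The first two candidates match. The last two do not, because the pattern requires a dot followed by at least two letters at the end of the domain.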

---

### 2. Validate your DataFrame

```python
import pandas as pd
from schemaguard.loader.yaml_loader import load_schema_from_yaml

# Load schema
schema = load_schema_from_yaml("files/schema_v1.yml")

# Load data
df = pd.read_csv("files/data.csv")

# Validate
report = schema.validate(df, raise_on_error=False)

# Print report
print(report)
```
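
The snippet above reads `files/data.csv`, which this README does not include. To try the validation end to end, you could generate a small file like the one below. The rows are made-up illustration data, chosen so that one amount is missing, one `order_id` repeats, and one email is malformed; `write_sample_csv` is a hypothetical helper, not part of `schemaguard`.

```python
import csv
from pathlib import Path

# Made-up rows that should trip the not_null, unique, and regex checks.
SAMPLE_ROWS = [
    {"order_id": 1, "amount": "19.99", "email": "alice@example.com"},
    {"order_id": 2, "amount": "", "email": "bob@example.com"},       # missing amount
    {"order_id": 2, "amount": "250.00", "email": "not-an-email"},    # duplicate id, bad email
]


def write_sample_csv(path) -> Path:
    """Write the sample rows to `path`, creating parent directories if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["order_id", "amount", "email"])
        writer.writeheader()
        writer.writerows(SAMPLE_ROWS)
    return path
```

Calling `write_sample_csv("files/data.csv")` once before running the validation step gives `pd.read_csv` a file to load.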

---

### 3. Sample Output

```python
{
  'ok': False,
  'errors': {
    'amount': 'null values present',
    'email': 'regex mismatch',
    'order_id': 'duplicate values found'
  },
  'version': '1.0'
}
```
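
In a CI job or an ETL task you usually want to turn a report like this into a readable log message and a pass/fail signal. The sketch below assumes the report behaves like the dictionary printed above, with `ok`, `errors`, and `version` keys; the actual return type is not documented in this README, and `summarize_report` and `exit_code` are hypothetical helpers.

```python
def summarize_report(report: dict) -> str:
    """Format a report (assumed to be dict-like) as a short log message."""
    version = report.get("version")
    if report.get("ok"):
        return f"schema v{version}: OK"
    errors = report.get("errors", {})
    lines = [f"schema v{version}: {len(errors)} column(s) failed"]
    for column, message in sorted(errors.items()):
        lines.append(f"  - {column}: {message}")
    return "\n".join(lines)


def exit_code(report: dict) -> int:
    """Return 0 when validation passed and 1 otherwise, for use with sys.exit."""
    return 0 if report.get("ok") else 1


# The sample report shown above, reproduced as a plain dict.
report = {
    "ok": False,
    "errors": {
        "amount": "null values present",
        "email": "regex mismatch",
        "order_id": "duplicate values found",
    },
    "version": "1.0",
}
print(summarize_report(report))
```

Passing `exit_code(report)` to `sys.exit` makes a pipeline step fail whenever the data violates the schema.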

---

## Versioning

`schemaguard` uses **semantic versioning**:

* `0.1.0` — Initial release with YAML schema loader and basic validators.
* Future releases will add more validators, performance improvements, and integration examples.

---

## Use Cases

* Validate incoming ETL data before storing it in a warehouse.
* Ensure API payloads match expected schema.
* Quickly enforce data contracts in collaborative teams.
* Detect and prevent data quality issues in production.

---

## Author

**Inzamam Yousaf**

* Email: [uniprecisionofficial@gmail.com](mailto:uniprecisionofficial@gmail.com)
* GitHub: [https://github.com/inzamam1121](https://github.com/inzamam1121)

---

## License

This project is licensed under the **MIT License** — see the [LICENSE](LICENSE) file for details.

---

## Contributing

Contributions, issues, and feature requests are welcome!
Feel free to fork the repository and submit pull requests.


