Metadata-Version: 2.4
Name: varframe
Version: 1.4.1
Summary: Declarative DataFrame variable management with automatic DAG dependency resolution and ML model integration
Author-email: Santiago Romagosa <romagosasantiago@gmail.com>
License: MIT License
        
        Copyright (c) 2026 Santiago Romagosa
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/Santi-49/varframe
Project-URL: Repository, https://github.com/Santi-49/varframe
Project-URL: Documentation, https://santi-49.github.io/varframe/
Project-URL: Changelog, https://github.com/santi-49/varframe/blob/main/CHANGELOG.md
Keywords: pandas,dataframe,variables,dag,machine-learning,feature-engineering,data-pipeline,workflow,data-science,etl
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.3.0
Provides-Extra: ml
Requires-Dist: scikit-learn>=1.0.0; extra == "ml"
Requires-Dist: joblib>=1.0.0; extra == "ml"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pyarrow>=10.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: mkdocs>=1.5.0; extra == "docs"
Requires-Dist: mkdocs-material>=9.0.0; extra == "docs"
Requires-Dist: mkdocstrings[python]>=0.24.0; extra == "docs"
Provides-Extra: all
Requires-Dist: varframe[dev,docs,ml]; extra == "all"
Dynamic: license-file

# VarFrame

**Declarative DataFrame variable management with automatic DAG dependency resolution.**

[![PyPI version](https://badge.fury.io/py/varframe.svg)](https://badge.fury.io/py/varframe)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

VarFrame is a Python library designed to bring structure, reproducibility, and maintainability to your data science and machine learning pipelines. It transforms your data processing from a linear script into a robust, self-documenting graph of dependencies.

## Documentation

**[Read the full documentation](https://santi-49.github.io/varframe/)** (or run `mkdocs serve` locally).

*   **[User Guide](https://santi-49.github.io/varframe/user_guide/)**: Detailed concepts, lazy loading, and ML integration.
*   **[API Reference](https://santi-49.github.io/varframe/api_reference/)**: Class and function specifications.

---

## Why VarFrame?

Data pipelines often start as simple scripts but quickly grow into unmanageable "spaghetti code."

*   **The Problem:**
    *   **Implicit Dependencies:** "Why did column `X` change? Oh, because I modified column `Y` 200 lines above."
    *   **Execution Order Fragility:** "You have to run cell 4 before cell 2, but only if you skipped cell 3."
    *   **Copy-Paste Logic:** Feature engineering logic gets duplicated across training and inference pipelines, leading to drift.

*   **The VarFrame Solution:**
    *   **Explicit Dependencies:** Every variable declares exactly what it needs to be computed.
    *   **Automatic Resolution:** VarFrame builds a Directed Acyclic Graph (DAG) and executes transformations in topological order, so each variable is computed only after its dependencies.
    *   **Single Source of Truth:** Your variable classes *are* your pipeline. Use the exact same classes for training, evaluation, and production inference.
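
To make the idea of dependency-ordered execution concrete, here is a minimal, library-independent sketch. It is not VarFrame's implementation and does not use its API: the `Var` base class and the `resolution_order` helper are hypothetical names invented for this illustration, and the ordering is delegated to the standard library's `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter


class Var:
    """Hypothetical stand-in for a variable declaration (not VarFrame's API)."""
    dependencies = []


class Radius(Var):
    pass


class Diameter(Var):
    dependencies = [Radius]


class Area(Var):
    dependencies = [Radius]


class Ratio(Var):
    dependencies = [Area, Diameter]


def resolution_order(target):
    """Return every class needed to compute `target`, dependencies first."""
    graph = {}
    stack = [target]
    while stack:
        var = stack.pop()
        if var not in graph:
            # Map each node to the set of nodes that must come before it.
            graph[var] = set(var.dependencies)
            stack.extend(var.dependencies)
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(graph).static_order())


order = [v.__name__ for v in resolution_order(Ratio)]
print(order)  # Radius comes first and Ratio last
```

Only the classes reachable from the requested target end up in the graph, which is why asking for a single variable is enough to pull in everything it needs. The relative order of `Area` and `Diameter` is not fixed, since neither depends on the other.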

## Traditional vs. VarFrame

| Feature | Traditional (Pandas Script) | VarFrame |
| :--- | :--- | :--- |
| **Logic Definition** | Imperative (`df['new'] = df['old'] * 2`) | Declarative (Class `New` depends on `Old`) |
| **Execution Order** | Manually managed (top-to-bottom) | Automatic (DAG-based topological sort) |
| **Reusability** | Low (Copy-paste code blocks) | High (Import classes anywhere) |
| **Lazy Loading** | Manual caching or re-computation | Built-in (Compute only when accessed) |
| **Documentation** | Comments scattered in code | Self-documenting class structure |

---

## Key Features

*   **Declarative Syntax**: Define variables as Python classes.
*   **Automatic Dependency Resolution**: Never worry about execution order again.
*   **Lazy Loading**: Defer expensive computations until the data is actually needed.
*   **ML Integration**: Built-in `BaseModel` and `ModelVariable` to treat model predictions just like any other column.
*   **Metadata Preservation**: Export to Parquet with variable metadata intact.
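
As a rough illustration of the lazy-loading idea in the list above, the sketch below computes a column only on first access and caches the result. It is a generic pattern under stated assumptions, not VarFrame's implementation: `LazyColumns` and its `calls` attribute are hypothetical names used only here.

```python
class LazyColumns:
    """Hypothetical container that computes a column on first access only."""

    def __init__(self, compute_fns):
        self._compute_fns = compute_fns  # column name -> zero-argument function
        self._cache = {}                 # column name -> computed values
        self.calls = []                  # names of columns actually computed

    def __getitem__(self, name):
        if name not in self._cache:
            self.calls.append(name)
            self._cache[name] = self._compute_fns[name]()
        return self._cache[name]


cols = LazyColumns({
    "cheap": lambda: [1, 2, 3],
    "expensive": lambda: [x ** 2 for x in range(1_000)],
})

first = cols["cheap"]   # computed now, on first access
again = cols["cheap"]   # served from the cache, not recomputed
print(cols.calls)       # "expensive" was never requested, so never computed
```

The point of the design is that declaring a computation costs nothing until something asks for its result.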

## Installation

VarFrame is available on PyPI:

```bash
pip install varframe
```

## Quick Start

Here is a simple example showing how VarFrame automatically resolves dependencies.

```python
import math

import pandas as pd
from varframe import VarFrame, BaseVariable, DerivedVariable

# 1. Define your variables as classes
class Radius(BaseVariable):
    raw_column = "radius"  # Maps to the input DataFrame column
    dtype = "float"

class Area(DerivedVariable):
    # Dependencies are references to other variable classes
    dependencies = [Radius]

    @classmethod
    def calculate(cls, df):
        # Access the column by variable name
        return math.pi * (df[Radius.name] ** 2)

# 2. Initialize with data
data = pd.DataFrame({"radius": [1, 2, 5]})
vf = VarFrame(data)

# 3. Resolve dependencies
# You ask for 'Area'; VarFrame knows it needs 'Radius' first
vf.resolve(Area)

print(vf.df)
#    radius       area
# 0     1.0   3.141593
# 1     2.0  12.566371
# 2     5.0  78.539816
```
