Metadata-Version: 2.3
Name: freshet
Version: 0.1.0
Summary: Data pipelines in pure Python with incremental compute and data flow visualization
Keywords: data-pipeline,caching,workflow,dag,incremental
Author: Samuel S. Watson
Author-email: Samuel S. Watson <samuel.s.watson@gmail.com>
License: MIT License
         
         Copyright (c) 2026 Samuel S. Watson
         
         Permission is hereby granted, free of charge, to any person obtaining a copy
         of this software and associated documentation files (the "Software"), to deal
         in the Software without restriction, including without limitation the rights
         to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
         copies of the Software, and to permit persons to whom the Software is
         furnished to do so, subject to the following conditions:
         
         The above copyright notice and this permission notice shall be included in all
         copies or substantial portions of the Software.
         
         THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
         IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
         FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
         AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
         LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
         OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
         SOFTWARE.
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Dist: anywidget>=0.9.0 ; extra == 'viz'
Requires-Python: >=3.10
Project-URL: Homepage, https://github.com/sswatson/freshet
Project-URL: Repository, https://github.com/sswatson/freshet
Project-URL: Issues, https://github.com/sswatson/freshet/issues
Provides-Extra: viz
Description-Content-Type: text/markdown

# freshet

A lightweight caching framework for Python data pipelines.

Decorate your pipeline functions with `@flow` and `@source`, then instantiate `Freshet` with your pipeline module. Execution and cache state are scoped to that instance.

## Installation

```
pip install freshet
```

## Quick start

```python
from freshet import Freshet
from my_project import my_pipeline

f = Freshet(my_pipeline)

result = f.tap("final_output")   # computes + caches upstream DAG
result = f.tap("final_output")   # cache hit
```

## Base data sources

Use `@source` to declare external data sources as the roots of your pipeline. The function body returns a `File` or `Directory` descriptor:

```python
from freshet import source, File

@source
def raw_data():
    return File("data/raw.parquet")
```

## File-mode outputs

For flows that produce files rather than Python objects, call `flow_output()` to allocate a cache path, write to it, and return the `File`:

```python
import types
from freshet import Freshet, flow, flow_output, File

@flow
def plot_chart() -> File:
    out = flow_output(".png")
    # save_plot is a placeholder for your own function that writes an image to a path
    save_plot([1, 2, 3], out.path)
    return out

pipeline = types.ModuleType("my_pipeline")
pipeline.plot_chart = plot_chart

f = Freshet(pipeline, cache_dir=".freshet")
result = f.tap("plot_chart")  # returns a File pointing to the cached file
```

## Auto-bridging

If a flow expects an in-memory type (e.g. `pl.DataFrame`) but receives a `File`, freshet auto-loads it based on the file extension:

```python
import polars as pl
from freshet import source, flow, File

@source
def raw_trades():
    return File("data/raw/trades.parquet")

@flow
def cleaned(raw_trades: pl.DataFrame) -> pl.DataFrame:
    # raw_trades is auto-bridged: File → pl.read_parquet
    return raw_trades.filter(pl.col("price") > 0)
```
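
To make the idea of extension-based loading concrete, here is a minimal sketch that uses only the standard library. It is not freshet's implementation: the `LOADERS` table and the `bridge` helper are hypothetical names, and JSON and CSV stand in for the Parquet case shown above.

```python
import csv
import json
import tempfile
from pathlib import Path


def load_json(path):
    """Read a JSON file into Python objects."""
    return json.loads(Path(path).read_text())


def load_csv(path):
    """Read a CSV file into a list of row dictionaries."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


# Hypothetical dispatch table: file extension -> loader function.
LOADERS = {".json": load_json, ".csv": load_csv}


def bridge(path):
    """Pick a loader from the file extension and return the loaded value."""
    suffix = Path(path).suffix.lower()
    try:
        loader = LOADERS[suffix]
    except KeyError:
        raise ValueError(f"no loader registered for {suffix!r}") from None
    return loader(path)


# Write a small file, then load it back through the dispatch table.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "trades.json"
    sample.write_text(json.dumps([{"price": 10}]))
    loaded = bridge(sample)
```

Keeping the mapping in a plain dictionary means an unsupported extension fails with a clear error instead of a silent fallback.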

## DAG introspection

`freshet` infers a dependency graph from function argument names. If a `@flow` function takes a parameter named `raw_data`, and there's a registered artifact called `raw_data`, freshet records that edge:

```python
artifacts = f.artifacts()  # all registered artifacts
edges = f.edges()          # list of (upstream, downstream) tuples
bases = f.bases()          # only @source artifacts
```
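
The name-matching rule can be sketched with `inspect.signature`. This is an illustration of the idea under simple assumptions, not freshet's actual code: `infer_edges` and the `registry` dictionary are hypothetical, and plain functions stand in for decorated ones.

```python
import inspect


def infer_edges(functions):
    """Return (upstream, downstream) pairs implied by parameter names."""
    edges = []
    for name, fn in functions.items():
        for param in inspect.signature(fn).parameters:
            # A parameter that matches a registered name is a dependency.
            if param in functions:
                edges.append((param, name))
    return edges


def raw_data():
    return [3, -1, 2]


def cleaned(raw_data):
    return [x for x in raw_data if x > 0]


def total(cleaned):
    return sum(cleaned)


# Hypothetical registry mapping artifact names to their functions.
registry = {"raw_data": raw_data, "cleaned": cleaned, "total": total}
edges = infer_edges(registry)
```

Here `edges` contains one pair for `raw_data` feeding `cleaned` and one for `cleaned` feeding `total`, in the same `(upstream, downstream)` order that `f.edges()` is described as returning.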

## Custom serializers

By default, Polars DataFrames are serialized as Parquet and everything else uses pickle. You can register your own:

```python
from freshet import Serializer, register_serializer

class MySerializer:
    key = "my_format"
    extension = "bin"

    def can_handle(self, value) -> bool: ...
    def save(self, value, path) -> None: ...
    def load(self, path): ...

register_serializer(MySerializer())
```
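
As a filled-in illustration of the same shape, the sketch below stores dictionaries and lists as JSON. It assumes that `path` is a path-like object and that `extension` is written without a leading dot, as in the skeleton above; `JsonSerializer` is a hypothetical name. The round trip runs without freshet, so the registration call is left as a comment.

```python
import json
import tempfile
from pathlib import Path


class JsonSerializer:
    key = "json"
    extension = "json"

    def can_handle(self, value) -> bool:
        # Only claim plain containers; other values fall through to other serializers.
        return isinstance(value, (dict, list))

    def save(self, value, path) -> None:
        Path(path).write_text(json.dumps(value))

    def load(self, path):
        return json.loads(Path(path).read_text())


serializer = JsonSerializer()

# Round-trip a value through a temporary file to check save and load agree.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / f"artifact.{serializer.extension}"
    serializer.save({"rows": 3}, target)
    restored = serializer.load(target)

# With freshet installed, registration would follow the pattern shown above:
# register_serializer(JsonSerializer())
```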

## Configuration

```python
f = Freshet(my_pipeline, cache_dir="/path/to/cache")
```

By default the cache lives at `.freshet/` in the current working directory.

## Cache management

```python
f.clear_cache("my_function")  # clear one function's cache
f.clear_cache()               # clear everything
```

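## Visualization

The package metadata declares `anywidget` under the optional `viz` extra, so install `freshet[viz]` to use the chart widget.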

```python
widget = f.chart()  # anywidget-compatible DAG visualization
```

## Security

`freshet` uses pickle as a general-purpose serializer fallback. Treat cache directories as trusted input only.
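
The reason for that caution is a property of pickle itself, not of freshet: loading a pickle can call any callable named in the byte stream. The harmless sketch below shows this with `print`; the `Payload` class is purely illustrative.

```python
import pickle


class Payload:
    # __reduce__ tells pickle which callable to invoke when the bytes are loaded.
    def __reduce__(self):
        return (print, ("this ran during pickle.loads",))


blob = pickle.dumps(Payload())

# Loading calls print(...) and returns its result (None), not a Payload instance.
result = pickle.loads(blob)
```

An attacker who can write to the cache directory could substitute a more dangerous callable in the same way.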

To reduce the risk of accidental code execution, `f.chart()` does not unpickle cached artifacts to show graph details or previews by default. To opt in, set this environment variable:

```bash
export FRESHET_UNSAFE_PICKLE_INSPECT=1
```
