Metadata-Version: 2.4
Name: wfvkit
Version: 0.1.5
Summary: Walk-forward validation utilities for time-series ML: splits, purge/embargo, and evaluation helpers.
Author: Mohsen Moghaddam
License-Expression: MIT
Project-URL: Homepage, https://github.com/Mohsentinal/wfv-toolkit
Project-URL: Repository, https://github.com/Mohsentinal/wfv-toolkit
Project-URL: Issues, https://github.com/Mohsentinal/wfv-toolkit/issues
Keywords: walk-forward,time-series,ml,cross-validation,purged,embargo,finance
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: ruff>=0.6; extra == "dev"
Requires-Dist: build>=1.3; extra == "dev"
Dynamic: license-file

# wfv-toolkit (wfvkit)

A tiny Python toolkit for **walk-forward validation** of time-ordered data with **purge + embargo** utilities to reduce label leakage (useful for trading/finance ML and any temporal prediction setup).

---

## What you get

* **Naive time split** (baseline): `naive_time_split`
* **Walk-forward splits** (rolling windows): `walk_forward_splits`
* **Leakage guards**

  * `purge_overlap(train_idx, test_idx)` — removes training indices that overlap the test window
  * `embargo_after(test_idx, embargo)` — blocks samples immediately after the test window
* A runnable example: `examples/demo_naive_vs_purged.py`
* Tests: `pytest`

> The core idea is common in financial ML: if labels use a forward horizon, nearby samples can “bleed” information between train and test. Purging and embargoing drop the training samples most exposed to that overlap.

---

## Install

### Option A: from PyPI (recommended)

```bash
pip install wfvkit
```

### Option B: editable install (development)

```powershell
python -m venv .venv
.\.venv\Scripts\python.exe -m pip install -U pip
.\.venv\Scripts\python.exe -m pip install -e ".[dev]"
```

### Option C: install from GitHub tag

```bash
# Pin whichever release tag you need; v0.1.3 is shown as an example.
pip install "git+https://github.com/Mohsentinal/wfv-toolkit.git@v0.1.3"
```

---

## Quickstart

### Minimal example

```python
import datetime as dt

from wfvkit import (
    naive_time_split,
    walk_forward_splits,
    purge_overlap,
    embargo_after,
)

# 10 timestamps (toy example)
times = [dt.datetime(2025, 1, 1, 0, 0) + dt.timedelta(minutes=i) for i in range(10)]

# 1) naive split: pass an index cutoff OR a datetime cutoff
train_idx, test_idx = naive_time_split(times, train_end=6)
train_idx2, test_idx2 = naive_time_split(times, train_end=times[6])

print("naive_idx:", train_idx, test_idx)
print("naive_dt :", train_idx2, test_idx2)

# 2) walk-forward splits (rolling windows)
splits = list(walk_forward_splits(times, train_size=5, test_size=2, step=2, embargo=1))
print("splits:", splits)

# 3) purge + embargo helpers
tr, te = splits[0]
print("purged:", purge_overlap(tr, te))
print("embargo:", sorted(embargo_after(te, embargo=1)))
```

### Run tests

Run from the repository root, using the virtual environment created in Option B:

```powershell
.\.venv\Scripts\python.exe -m pytest -q
```

### Run the demo

The demo script lives in the repository, so this also needs a checkout (see Option B):

```powershell
.\.venv\Scripts\python.exe examples\demo_naive_vs_purged.py
```

---

## Usage

### Import the public API

```python
from wfvkit import (
    naive_time_split,
    walk_forward_splits,
    purge_overlap,
    embargo_after,
)
```

### Naive split (baseline)

```python
import datetime as dt

from wfvkit import naive_time_split

times = [dt.datetime(2025, 1, 1) + dt.timedelta(minutes=i) for i in range(10)]

# Cut by index
train_idx, test_idx = naive_time_split(times, train_end=6)

# Or cut by datetime
train_idx2, test_idx2 = naive_time_split(times, train_end=times[6])
```

### Walk-forward splits + purge + embargo

```python
import datetime as dt

from wfvkit import embargo_after, purge_overlap, walk_forward_splits

times = [dt.datetime(2025, 1, 1) + dt.timedelta(minutes=i) for i in range(50)]

for train_idx, test_idx in walk_forward_splits(
    times,
    train_size=20,
    test_size=5,
    step=5,
    embargo=2,
):
    train_purged = purge_overlap(train_idx, test_idx)
    embargo_idx = embargo_after(test_idx, embargo=2)

    # Fit on `train_purged`, evaluate on `test_idx`,
    # and avoid using indices in `embargo_idx` for training.
```
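
To make the rolling-window mechanics concrete, here is a minimal standalone sketch of how fixed-size train/test windows can be generated. It is not wfvkit's implementation: `rolling_windows_sketch` is a hypothetical helper, it works on plain integer positions, and it ignores the `embargo` argument entirely. How `walk_forward_splits` handles edge cases (partial final windows, embargo inside the generator) is not assumed here.

```python
from typing import Iterator


def rolling_windows_sketch(
    n_samples: int, train_size: int, test_size: int, step: int
) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists for fixed-size rolling windows."""
    start = 0
    # Stop once a full train + test window no longer fits in the data.
    while start + train_size + test_size <= n_samples:
        train_end = start + train_size
        train = list(range(start, train_end))
        # The test window starts right where the training window ends.
        test = list(range(train_end, train_end + test_size))
        yield train, test
        # Slide both windows forward by `step` samples.
        start += step


windows = list(rolling_windows_sketch(n_samples=10, train_size=5, test_size=2, step=2))
for train, test in windows:
    print(train, test)
```

With 10 samples this sketch yields two windows: the third would need index 10, which does not exist, so it is dropped rather than truncated.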

---

## Concepts (plain English)

### Purge

If the label interval of a **train** sample overlaps the **test** interval, information from the test period leaks into training. Purging removes those overlapping training indices.
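
As an illustration of the idea, the sketch below purges by label span. It is an assumption-laden example, not the library's `purge_overlap`: `purge_sketch` and its `horizon` parameter are hypothetical, and each label is assumed to depend on the positions from `i` through `i + horizon`.

```python
def purge_sketch(train_idx, test_idx, horizon=0):
    """Drop training indices whose label span [i, i + horizon] touches the test window.

    `horizon` is a hypothetical parameter for this sketch only.
    """
    if not test_idx:
        return list(train_idx)
    test_start, test_end = min(test_idx), max(test_idx)
    kept = []
    for i in train_idx:
        label_end = i + horizon
        # Two closed intervals overlap when each one starts before the other ends.
        overlaps = i <= test_end and label_end >= test_start
        if not overlaps:
            kept.append(i)
    return kept


train = list(range(0, 8))
test = [8, 9]
print(purge_sketch(train, test, horizon=2))  # indices 6 and 7 are dropped
```

Indices 6 and 7 are removed because their labels would look at positions 8 and 9, which belong to the test window. With `horizon=0` nothing is removed, since the two index sets are already disjoint.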

### Embargo

Even after the test window ends, the samples **immediately after** it can still be contaminated when labels depend on a forward horizon. The embargo excludes a small number of samples after the test window from training.
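
A minimal sketch of the same idea on integer positions follows. `embargo_sketch` and its `n_samples` cap are hypothetical and only illustrate the concept; the return type and boundary behaviour of the real `embargo_after` are not assumed.

```python
def embargo_sketch(test_idx, embargo, n_samples=None):
    """Return the set of positions in the `embargo` slots right after the test window."""
    if not test_idx or embargo <= 0:
        return set()
    first_blocked = max(test_idx) + 1
    last_blocked = first_blocked + embargo  # exclusive upper bound
    if n_samples is not None:
        # Do not block positions past the end of the data.
        last_blocked = min(last_blocked, n_samples)
    return set(range(first_blocked, last_blocked))


blocked = embargo_sketch(test_idx=[5, 6], embargo=2, n_samples=10)
# Any later training set should skip the blocked positions.
later_train = [i for i in range(7, 10) if i not in blocked]
print(sorted(blocked), later_train)
```

Here positions 7 and 8 are embargoed, so only position 9 remains available for training after this test window.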

---

## Project layout

```
wfv-toolkit/
  src/wfvkit/
    __init__.py
    splits.py
    leakage.py
    metrics.py
    evaluate.py
  tests/
  examples/
```

---

## Roadmap (next nice upgrades)

* Add **purged k-fold** / **combinatorial purged CV**
* Add utilities for **event-based labels** (start/end times per sample)
* Add richer evaluation helpers (rolling metrics and robustness checks)
* Provide a small CLI (optional)

---

## License

MIT (see `LICENSE`).
