Metadata-Version: 2.4
Name: hedged-rf
Version: 1.0.0
Summary: Hedged Random Forest — optimized non-equal tree weights for time-series forecasting
Author: Ezequiel Grillo
License: MIT License
        
        Copyright (c) 2026 Ezequiel Grillo
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/Ezequiel025/hedged-rf
Project-URL: Repository, https://github.com/Ezequiel025/hedged-rf
Project-URL: Issues, https://github.com/Ezequiel025/hedged-rf/issues
Project-URL: Documentation, https://hedged-rf.readthedocs.io
Keywords: random forest,hedged random forest,machine learning,time series,forecasting,inflation,EWMA,shrinkage,econometrics
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Office/Business :: Financial
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.23
Requires-Dist: scipy>=1.9
Requires-Dist: cvxpy>=1.3
Requires-Dist: numba>=0.57
Requires-Dist: scikit-learn>=1.2
Provides-Extra: dev
Requires-Dist: pytest>=7.4; extra == "dev"
Requires-Dist: pytest-cov>=4.1; extra == "dev"
Requires-Dist: ruff>=0.3; extra == "dev"
Requires-Dist: mypy>=1.8; extra == "dev"
Requires-Dist: pre-commit>=3.6; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=7.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=2.0; extra == "docs"
Requires-Dist: myst-parser>=2.0; extra == "docs"
Requires-Dist: nbsphinx>=0.9; extra == "docs"
Provides-Extra: examples
Requires-Dist: pandas>=2.0; extra == "examples"
Requires-Dist: matplotlib>=3.7; extra == "examples"
Requires-Dist: jupyter>=1.0; extra == "examples"
Requires-Dist: fredapi>=0.5; extra == "examples"
Dynamic: license-file

# hedged-rf

**Hedged Random Forest (HRF)** — optimized non-equal tree weights for time-series forecasting.

[![PyPI version](https://badge.fury.io/py/hedged-rf.svg)](https://badge.fury.io/py/hedged-rf)
[![Python](https://img.shields.io/pypi/pyversions/hedged-rf.svg)](https://pypi.org/project/hedged-rf/)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![Tests](https://github.com/Ezequiel025/hedged-rf/actions/workflows/tests.yml/badge.svg)](https://github.com/Ezequiel025/hedged-rf/actions)

---

## Overview

`hedged-rf` is a Python implementation of the **Hedged Random Forest** methodology introduced in:

> Beck, E., & Wolf, M. (2025). *Forecasting Inflation with the Hedged Random Forest.*  
> SNB Working Papers 07/2025, Swiss National Bank.

The standard Random Forest averages all tree predictions with equal weights. The HRF instead **solves a constrained optimization problem** for weights that minimize the estimated mean-squared forecast error. Crucially, it **allows negative weights**, which Beck & Wolf (2025) report improves accuracy in volatile economic environments.

### Key features

- **EWMA estimation** — gives more weight to recent observations, suitable for non-stationary economic time series
- **Linear shrinkage** — regularizes the mean vector and covariance matrix toward structured targets (CVC), ensuring well-conditioned estimates even with hundreds of trees
- **Gross-exposure constraint** (`κ`) — controls how extreme the weights can be, preventing overfitting
- **Numba-accelerated** — parallelized HAC variance estimation for large forests (500–1000 trees)
- **CVXPY / OSQP solver** — reliable convex optimization backend

---

## Installation

```bash
pip install hedged-rf
```

### Optional extras

```bash
# Development tools (linting, testing)
pip install "hedged-rf[dev]"

# Example notebooks
pip install "hedged-rf[examples]"

# Documentation builder
pip install "hedged-rf[docs]"
```

### Requirements

| Package | Version |
|---------|---------|
| Python | ≥ 3.9 |
| numpy | ≥ 1.23 |
| scipy | ≥ 1.9 |
| cvxpy | ≥ 1.3 |
| numba | ≥ 0.57 |
| scikit-learn | ≥ 1.2 |

---

## Quick start

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from hedged_rf import fit_hrf, extract_tree_predictions

# X_train, y_train and X_test are your own time-ordered arrays (not defined here).

# 1. Train a standard Random Forest
rf = RandomForestRegressor(n_estimators=500, random_state=42)
rf.fit(X_train, y_train)

# 2. Build the residual matrix  R  (T × n_trees)
R = extract_tree_predictions(rf, X_train, y_train)

# 3. Estimate optimal HRF weights
w, mu, Sigma = fit_hrf(
    R,
    lambda_param=0.15,   # EWMA decay (0.15 recommended for monthly data)
    H=6,                 # bandwidth for autocovariance estimation
    kappa=2.0,           # gross-exposure constraint (allows moderate negative weights)
    verbose=True,
)

# 4. Generate out-of-sample forecasts
tree_preds = np.column_stack([tree.predict(X_test) for tree in rf.estimators_])
y_pred_hrf = tree_preds @ w
```

---

## API reference

### `fit_hrf(R, lambda_param, H, kappa, verbose, use_fast)`

Main entry point. Estimates optimal weights for the Hedged Random Forest.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `R` | `np.ndarray` (T × N) | — | Residual matrix from training data |
| `lambda_param` | `float` | `0.15` | EWMA decay parameter ∈ (0, 1) |
| `H` | `int` | `6` | Bandwidth for HAC autocovariance estimation |
| `kappa` | `float` | `2.0` | L₁ gross-exposure constraint |
| `verbose` | `bool` | `False` | Print estimation progress |
| `use_fast` | `bool` | `True` | Enable Numba parallel acceleration |

**Returns** `(w, mu, Sigma)`:
- `w` — optimal weight vector (N,)
- `mu` — estimated mean vector (N,)
- `Sigma` — estimated covariance matrix (N × N)

---

### `extract_tree_predictions(rf, X, y)`

Builds the (T × N) residual matrix from a fitted scikit-learn `RandomForestRegressor`.

```python
R = extract_tree_predictions(rf, X_train, y_train)
# R.shape → (n_samples, n_estimators)
```
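
For intuition, the residual matrix can be pictured as the per-tree training errors stacked column by column. The sketch below builds such a matrix from a plain array of tree predictions using NumPy only. The helper name `residual_matrix` and the sign convention (prediction minus target) are assumptions made for illustration; the package's own function may differ in both. The sign does not affect the HRF objective, since flipping it changes μ to −μ and leaves Σ unchanged.

```python
import numpy as np

def residual_matrix(tree_preds, y):
    """Return the (T x N) matrix of per-tree errors (hypothetical helper)."""
    tree_preds = np.asarray(tree_preds, dtype=float)
    y = np.asarray(y, dtype=float)
    if tree_preds.shape[0] != y.shape[0]:
        raise ValueError("tree_preds and y must have the same number of rows")
    # Broadcast y across columns: column j holds the errors of tree j.
    return tree_preds - y[:, None]

# Toy data: 4 time periods, 3 trees.
y = np.array([1.0, 2.0, 3.0, 4.0])
tree_preds = np.array([
    [1.5, 0.5, 1.0],
    [2.5, 1.5, 2.0],
    [3.5, 2.5, 3.0],
    [4.5, 3.5, 4.0],
])
R_toy = residual_matrix(tree_preds, y)
print(R_toy.shape)         # rows = time periods, columns = trees
print(R_toy.mean(axis=0))  # average error of each tree
```

In this toy example the first tree over-predicts by 0.5 on average, the second under-predicts by 0.5, and the third is unbiased.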

---

### `estimate_mu_complete(R, lambda_param, H, verbose, use_fast)`

Stand-alone EWMA + linear-shrinkage estimator for the **mean vector** μ.

---

### `estimate_sigma_complete(R, lambda_param, H, verbose, use_fast)`

Stand-alone EWMA + linear-shrinkage estimator for the **covariance matrix** Σ.

---

### `solve_hrf_optimization(mu, Sigma, kappa, verbose)`

Stand-alone convex solver. Solves:

```
min_w  (w'μ)² + w'Σw
s.t.   w'1 = 1
       ‖w‖₁ ≤ κ
```
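
To make the problem statement concrete, the following NumPy-only sketch evaluates the objective and checks both constraints for a given weight vector. It is not the package's solver: `hrf_objective` and `is_feasible` are hypothetical helpers, and the numbers for `mu`, `Sigma` and `w_candidate` are made up for illustration.

```python
import numpy as np

def hrf_objective(w, mu, Sigma):
    """Squared bias plus variance of the weighted forecast error."""
    return float((w @ mu) ** 2 + w @ Sigma @ w)

def is_feasible(w, kappa, tol=1e-9):
    """Check the sum-to-one and gross-exposure (L1) constraints."""
    sums_to_one = abs(w.sum() - 1.0) <= tol
    within_exposure = np.abs(w).sum() <= kappa + tol
    return bool(sums_to_one and within_exposure)

# Illustrative inputs for three trees (assumed values, not estimates).
mu = np.array([0.2, 0.4, -0.1])
Sigma = np.diag([0.04, 0.09, 0.01])

w_equal = np.full(3, 1.0 / 3.0)           # standard Random Forest weights
w_candidate = np.array([0.3, -0.1, 0.8])  # hand-picked, not claimed optimal

# The objective can be written as one quadratic form w'(Sigma + mu mu')w.
M = Sigma + np.outer(mu, mu)

print(hrf_objective(w_equal, mu, Sigma))
print(hrf_objective(w_candidate, mu, Sigma))
print(is_feasible(w_candidate, kappa=2.0), is_feasible(w_candidate, kappa=1.0))
```

Because Σ + μμ' is positive semidefinite whenever Σ is, the problem is a convex quadratic program, which is why a solver such as OSQP applies.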

---

## Methodology

### Why non-equal weights?

A Random Forest reduces variance by averaging many decorrelated trees with equal weights, but this ignores the **bias** of individual trees and the **correlations between their errors**. The HRF weights instead minimize the full MSE decomposition:

```
MSE(f̂_w) = (w'μ)² + w'Σw
```

where μ is the vector of tree bias terms and Σ is the tree-error covariance matrix.
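
The decomposition follows in one step once the weights sum to one. Writing $e_i = \hat f_i - y$ for the error of tree $i$ (the sign convention is an assumption here and does not change the result), the constraint $w'\mathbf{1} = 1$ lets the target be absorbed into the weighted sum:

```latex
\hat f_w - y
  = \sum_{i=1}^{N} w_i \hat f_i - y
  = \sum_{i=1}^{N} w_i \,(\hat f_i - y)
  = w'e,
\qquad
\mathrm{MSE}(\hat f_w)
  = \mathbb{E}\!\left[(w'e)^2\right]
  = \left(\mathbb{E}[w'e]\right)^2 + \mathrm{Var}(w'e)
  = (w'\mu)^2 + w'\Sigma w,
```

with $\mu = \mathbb{E}[e]$ and $\Sigma = \mathrm{Cov}(e)$, matching the definitions above.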

### Why negative weights?

When some trees are systematically biased in the same direction, a negative weight on those trees can **cancel out** that bias, reducing the overall forecast error. The gross-exposure constraint `κ` keeps negative positions bounded.
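
A two-tree case shows the effect in closed form. With `w = (a, 1 - a)` the objective is a one-dimensional quadratic in `a`, and the gross-exposure constraint reduces to an interval for `a`, so the constrained optimum is the clipped unconstrained minimizer. The helper `two_tree_weights` and all numbers below are illustrative assumptions, not part of the package.

```python
import numpy as np

def two_tree_weights(mu, Sigma, kappa):
    """Constrained optimum for N = 2, parametrized as w = (a, 1 - a)."""
    M = Sigma + np.outer(mu, mu)               # objective is w' M w
    denom = M[0, 0] + M[1, 1] - 2.0 * M[0, 1]  # assumed strictly positive
    a_free = (M[1, 1] - M[0, 1]) / denom       # unconstrained minimizer
    # For kappa >= 1: |a| + |1 - a| <= kappa  <=>  (1-kappa)/2 <= a <= (1+kappa)/2
    a = np.clip(a_free, (1.0 - kappa) / 2.0, (1.0 + kappa) / 2.0)
    return np.array([a, 1.0 - a])

def mse(w, mu, Sigma):
    """Squared bias plus variance of the combined forecast error."""
    return float((w @ mu) ** 2 + w @ Sigma @ w)

# Both trees are biased upward; the second one twice as much.
mu = np.array([0.5, 1.0])
Sigma = 0.01 * np.eye(2)

w_equal = np.array([0.5, 0.5])
w_k1 = two_tree_weights(mu, Sigma, kappa=1.0)  # no negative weights possible
w_k2 = two_tree_weights(mu, Sigma, kappa=2.0)  # moderate hedging

for name, w in [("equal", w_equal), ("kappa=1", w_k1), ("kappa=2", w_k2)]:
    print(f"{name:>8}: w = {w}, MSE = {mse(w, mu, Sigma):.4f}")
```

With these numbers, `kappa=1.0` puts all weight on the less biased tree, while `kappa=2.0` yields `w = (1.5, -0.5)`: the short position in the more biased tree offsets part of the shared upward bias, at the cost of some extra variance.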

### The EWMA + shrinkage pipeline

For time-series data (non-i.i.d.), equal-weighted sample estimators of μ and Σ are suboptimal. The HRF instead uses:

1. **EWMA estimation** — exponentially decaying weights give more importance to recent observations, adapting to structural breaks and ARCH/GARCH effects
2. **Linear shrinkage to CVC target** — regularizes Σ̂ toward a constant-variance-covariance matrix, using a data-driven shrinkage intensity α = ν/(ν+γ) derived from HAC variance estimates
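
The two steps above can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the package's estimator: the weights are taken proportional to `(1 - lam) ** age`, the CVC target is read as "one common variance, one common covariance", and the shrinkage intensity `alpha` is passed in as a fixed number rather than derived from HAC variance estimates. All function names are hypothetical.

```python
import numpy as np

def ewma_weights(T, lam):
    """Normalized weights, largest on the most recent row (index T - 1)."""
    ages = np.arange(T - 1, -1, -1)  # age 0 = most recent observation
    raw = (1.0 - lam) ** ages
    return raw / raw.sum()

def ewma_moments(R, lam):
    """EWMA mean vector and covariance matrix of the columns of R."""
    p = ewma_weights(R.shape[0], lam)
    mu = p @ R
    centered = R - mu
    Sigma = (centered * p[:, None]).T @ centered
    return mu, Sigma

def cvc_target(Sigma):
    """Target with a common variance and a common covariance (needs N >= 2)."""
    n = Sigma.shape[0]
    avg_var = np.trace(Sigma) / n
    avg_cov = (Sigma.sum() - np.trace(Sigma)) / (n * (n - 1))
    return np.full((n, n), avg_cov) + (avg_var - avg_cov) * np.eye(n)

def shrink(Sigma, alpha):
    """Convex combination of the estimate and its CVC target."""
    return (1.0 - alpha) * Sigma + alpha * cvc_target(Sigma)

# Synthetic residuals: 120 periods, 5 trees.
rng = np.random.default_rng(0)
R = rng.normal(size=(120, 5))

mu_hat, Sigma_hat = ewma_moments(R, lam=0.15)
Sigma_shrunk = shrink(Sigma_hat, alpha=0.5)  # alpha fixed here, not data-driven
print(mu_hat.shape, Sigma_shrunk.shape)
```

Shrinking toward this target leaves the average variance unchanged while pulling individual variances and covariances toward their respective averages, which is what keeps the estimate well-conditioned when the number of trees is large relative to the effective sample size.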

### Choosing `lambda_param` (λ)

| Data frequency | Recommended λ |
|---------------|---------------|
| Daily | 0.06 |
| Weekly | 0.10 |
| Monthly | **0.15** (default) |
| Quarterly | 0.25 |

Larger λ → faster decay → more weight on recent observations.
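
One way to read these values is through the implied half-life, i.e. how many periods it takes for an observation's weight to drop by half. The sketch below assumes weights proportional to `(1 - λ) ** age`, which is a common EWMA convention but is an assumption about this package; `ewma_half_life` is a hypothetical helper.

```python
import math

def ewma_half_life(lam):
    """Periods after which an observation's weight has halved."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(1.0 - lam)

recommended = [("daily", 0.06), ("weekly", 0.10), ("monthly", 0.15), ("quarterly", 0.25)]
for label, lam in recommended:
    print(f"{label:>9}: lambda = {lam:.2f}, half-life = {ewma_half_life(lam):.1f} periods")
```

Under this convention the monthly default of 0.15 corresponds to a half-life of a little over four months.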

### Choosing `kappa` (κ)

| Value | Effect |
|-------|--------|
| `1.0` | All weights non-negative (a convex combination of trees) |
| `2.0` | **Recommended default** — moderate negative weights allowed |
| `> 2.0` | More aggressive hedging; may overfit on small samples |
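
The table follows directly from the two constraints. If `P` is the sum of the positive weights and `S` the sum of the absolute negative weights, then `P - S = 1` and `P + S ≤ κ`, so `S ≤ (κ - 1) / 2` and `P ≤ (κ + 1) / 2`. A small sketch, with `exposure_limits` as a hypothetical helper:

```python
def exposure_limits(kappa):
    """Return (max total long weight, max total short weight) implied by kappa."""
    if kappa < 1.0:
        # Weights that sum to one always have an L1 norm of at least one.
        raise ValueError("kappa must be at least 1")
    max_long = (1.0 + kappa) / 2.0
    max_short = (kappa - 1.0) / 2.0
    return max_long, max_short

for kappa in (1.0, 2.0, 3.0):
    max_long, max_short = exposure_limits(kappa)
    print(f"kappa = {kappa}: long <= {max_long}, short <= {max_short}")
```

At `κ = 1.0` the short budget is zero, which is why all weights must be non-negative; at `κ = 2.0` the negative weights can total at most 0.5.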

---

## Empirical results

From Beck & Wolf (2025), using US and Swiss inflation data (1990–2023):

| Metric | Typical improvement |
|--------|-------------------|
| RMSE vs standard RF | **~4% reduction** (up to 7% for core inflation) |
| MAE vs standard RF | **~5% reduction** (up to 8% for core inflation) |

The HRF outperforms the standard RF consistently across all 6 inflation measures and all 12 forecast horizons tested.

---

## Examples

See the [`examples/`](examples/) directory for notebooks covering:

- **`basic_usage.ipynb`** — end-to-end walkthrough with synthetic data
- **`inflation_forecasting.ipynb`** — replication of the Beck & Wolf (2025) results using FRED-MD data
- **`hyperparameter_sensitivity.ipynb`** — sensitivity analysis for λ and κ

---

## Citation

If you use `hedged-rf` in your research, please cite the original paper:

```bibtex
@techreport{beck2025hrf,
  title   = {Forecasting Inflation with the Hedged Random Forest},
  author  = {Beck, Elliot and Wolf, Michael},
  year    = {2025},
  institution = {Swiss National Bank},
  type    = {SNB Working Papers},
  number  = {07/2025}
}
```

---

## Contributing

Contributions are welcome! Please open an issue or pull request on GitHub.

```bash
git clone https://github.com/Ezequiel025/hedged-rf.git
cd hedged-rf
pip install -e ".[dev]"
pre-commit install
pytest
```

---

## License

This project is licensed under the **MIT License** — see [LICENSE](LICENSE) for details.
