Metadata-Version: 2.4
Name: recursive-pietro
Version: 0.1.0
Summary: SHAP-based recursive feature elimination with cross-validation and early stopping
Author: Reinier Koops
License-Expression: MIT
Project-URL: Homepage, https://github.com/ReinierKoops/Recursive_pietro
Project-URL: Repository, https://github.com/ReinierKoops/Recursive_pietro
Project-URL: Issues, https://github.com/ReinierKoops/Recursive_pietro/issues
Keywords: shap,feature-selection,feature-elimination,scikit-learn,machine-learning
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Operating System :: OS Independent
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: scikit-learn>=1.3
Requires-Dist: shap>=0.43
Requires-Dist: numpy>=1.24
Requires-Dist: pandas>=2.0
Requires-Dist: joblib>=1.3
Provides-Extra: lightgbm
Requires-Dist: lightgbm>=4.0; extra == "lightgbm"
Provides-Extra: xgboost
Requires-Dist: xgboost>=2.0; extra == "xgboost"
Provides-Extra: catboost
Requires-Dist: catboost>=1.2; extra == "catboost"
Provides-Extra: plot
Requires-Dist: matplotlib>=3.7; extra == "plot"
Provides-Extra: progress
Requires-Dist: tqdm>=4.60; extra == "progress"
Provides-Extra: all
Requires-Dist: recursive-pietro[catboost,lightgbm,plot,progress,xgboost]; extra == "all"
Provides-Extra: dev
Requires-Dist: recursive-pietro[all]; extra == "dev"
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Dynamic: license-file

# recursive-pietro

SHAP-based recursive feature elimination with cross-validation and early stopping.

A drop-in, scikit-learn-compatible replacement for Probatus `ShapRFECV`, intended to be faster and cleaner and to work inside scikit-learn pipelines.

## Installation

```bash
pip install recursive-pietro
```

With optional boosting-library support:

```bash
# Quote the extras so shells such as zsh do not expand the brackets
pip install "recursive-pietro[lightgbm]"   # LightGBM early stopping
pip install "recursive-pietro[xgboost]"    # XGBoost early stopping
pip install "recursive-pietro[catboost]"   # CatBoost early stopping
pip install "recursive-pietro[plot]"       # matplotlib plotting
pip install "recursive-pietro[all]"        # everything
```

## Quick start

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from recursive_pietro import ShapFeatureElimination

X, y = make_classification(n_samples=200, n_features=15, n_informative=5, random_state=42)

selector = ShapFeatureElimination(
    RandomForestClassifier(n_estimators=50, random_state=42),
    step=0.2,
    cv=3,
    scoring="roc_auc",
    random_state=42,
)

selector.fit(X, y)

# Selected features
print(selector.selected_features_)

# Reduce X to the selected features
X_reduced = selector.transform(X)
```
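The `step=0.2` argument above controls how many features are dropped per round. As a rough illustration only, the sketch below assumes the Probatus `ShapRFECV` convention: an integer `step` removes that many features per round, a float removes that fraction of the remaining features (rounded down), and at least one feature is removed each round. The helper `elimination_schedule` and its `min_features` parameter are hypothetical and not part of recursive-pietro, which may round differently.

```python
def elimination_schedule(n_features, step, min_features=1):
    """Return the number of features remaining after each round.

    Hypothetical helper: assumes the Probatus-style meaning of `step`
    (int = count per round, float = fraction of remaining features).
    """
    remaining = n_features
    schedule = [remaining]
    while remaining > min_features:
        if isinstance(step, float):
            # Fraction of the features still in play, rounded down
            n_remove = int(remaining * step)
        else:
            n_remove = step
        # Always remove at least one feature, never go below the floor
        n_remove = max(1, n_remove)
        n_remove = min(n_remove, remaining - min_features)
        remaining -= n_remove
        schedule.append(remaining)
    return schedule


print(elimination_schedule(15, 0.2))
print(elimination_schedule(15, 5))
```

Under these assumptions, a fractional step removes many features early and slows down as the set shrinks, while an integer step removes a constant number per round.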

## Early stopping (LightGBM / XGBoost / CatBoost)

```python
from lightgbm import LGBMClassifier
from recursive_pietro import ShapFeatureElimination

# Reuses X and y from the quick start above

selector = ShapFeatureElimination(
    LGBMClassifier(n_estimators=500, random_state=42),
    step=0.2,
    cv=5,
    scoring="roc_auc",
    early_stopping_rounds=50,
    eval_metric="auc",
)

selector.fit(X, y)
```

## Feature set selection strategies

After fitting, choose different feature sets from the elimination report:

```python
selector.get_feature_set(method="best")              # highest validation score
selector.get_feature_set(method="best_parsimonious")  # fewest features within threshold
selector.get_feature_set(method="best_coherent")      # lowest std within threshold
selector.get_feature_set(method=10)                   # exactly 10 features
```
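To make the difference between these strategies concrete, here is a minimal sketch that applies the same selection rules to a made-up report. The column names (`n_features`, `val_mean`, `val_std`), the `pick_row` helper, and the absolute `threshold` of 0.01 are all assumptions for illustration; they are not taken from the recursive-pietro API, and the library may define the threshold differently.

```python
import pandas as pd

# Hypothetical elimination report; column names and values are illustrative only
report = pd.DataFrame({
    "n_features": [15, 12, 10, 8, 6, 4, 2],
    "val_mean":   [0.910, 0.915, 0.921, 0.918, 0.914, 0.890, 0.820],
    "val_std":    [0.020, 0.018, 0.025, 0.012, 0.015, 0.030, 0.040],
})


def pick_row(report, method, threshold=0.01):
    """Pick one row of the report according to a selection strategy."""
    if isinstance(method, int):
        # Exactly this many features
        return report.loc[report["n_features"] == method].iloc[0]
    if method == "best":
        # Highest mean validation score
        return report.loc[report["val_mean"].idxmax()]
    # Candidates whose score is within `threshold` of the best score
    best_score = report["val_mean"].max()
    within = report[report["val_mean"] >= best_score - threshold]
    if method == "best_parsimonious":
        # Fewest features among the candidates
        return within.loc[within["n_features"].idxmin()]
    if method == "best_coherent":
        # Lowest score standard deviation among the candidates
        return within.loc[within["val_std"].idxmin()]
    raise ValueError(f"Unknown method: {method!r}")


for method in ["best", "best_parsimonious", "best_coherent", 4]:
    print(method, int(pick_row(report, method)["n_features"]))
```

On this made-up report, the three named strategies pick different rows: the top score sits at 10 features, the smallest set within the threshold has 6, and the most stable one has 8.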

## sklearn pipeline support

```python
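from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from recursive_pietro import ShapFeatureElimination

# Reuses X and y from the quick start above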

pipe = Pipeline([
    ("feature_selection", ShapFeatureElimination(
        RandomForestClassifier(n_estimators=50, random_state=42),
        step=1, cv=3, scoring="roc_auc",
    )),
    ("classifier", LogisticRegression()),
])

pipe.fit(X, y)
```

## Plotting

```bash
pip install "recursive-pietro[plot]"
```

```python
selector.plot()
```

## License

MIT
