Metadata-Version: 2.1
Name: erasplit
Version: 1.0.5
Summary: Invariant Gradient Boosted Decision Tree Package - Era Splitting.
Home-page: https://arxiv.org/abs/2309.14496
Download-URL: https://github.com/jefferythewind/erasplit
Maintainer: Tim DeLise
Maintainer-email: tdelise@gmail.com
License: new BSD
Project-URL: Bug Tracker, https://github.com/jefferythewind/erasplit/issues
Project-URL: Documentation, https://github.com/jefferythewind/erasplit
Project-URL: Source Code, https://github.com/jefferythewind/erasplit
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: C
Classifier: Programming Language :: Python
Classifier: Topic :: Software Development
Classifier: Topic :: Scientific/Engineering
Classifier: Development Status :: 5 - Production/Stable
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX
Classifier: Operating System :: Unix
Classifier: Operating System :: MacOS
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: COPYING
Requires-Dist: scikit-learn>=1.3.0
Requires-Dist: numpy>=1.17.3
Requires-Dist: scipy>=1.3.2
Requires-Dist: joblib>=1.1.1
Requires-Dist: threadpoolctl>=2.0.0
Requires-Dist: numpy==1.26.4
Provides-Extra: examples
Requires-Dist: matplotlib>=3.1.3; extra == "examples"
Requires-Dist: scikit-image>=0.16.2; extra == "examples"
Requires-Dist: pandas>=1.0.5; extra == "examples"
Requires-Dist: seaborn>=0.9.0; extra == "examples"
Requires-Dist: pooch>=1.6.0; extra == "examples"
Requires-Dist: plotly>=5.10.0; extra == "examples"
Provides-Extra: docs
Requires-Dist: matplotlib>=3.1.3; extra == "docs"
Requires-Dist: scikit-image>=0.16.2; extra == "docs"
Requires-Dist: pandas>=1.0.5; extra == "docs"
Requires-Dist: seaborn>=0.9.0; extra == "docs"
Requires-Dist: memory_profiler>=0.57.0; extra == "docs"
Requires-Dist: sphinx>=4.0.1; extra == "docs"
Requires-Dist: sphinx-gallery>=0.7.0; extra == "docs"
Requires-Dist: numpydoc>=1.2.0; extra == "docs"
Requires-Dist: Pillow>=7.1.2; extra == "docs"
Requires-Dist: pooch>=1.6.0; extra == "docs"
Requires-Dist: sphinx-prompt>=1.3.0; extra == "docs"
Requires-Dist: sphinxext-opengraph>=0.4.2; extra == "docs"
Requires-Dist: plotly>=5.10.0; extra == "docs"
Provides-Extra: tests
Requires-Dist: matplotlib>=3.1.3; extra == "tests"
Requires-Dist: scikit-image>=0.16.2; extra == "tests"
Requires-Dist: pandas>=1.0.5; extra == "tests"
Requires-Dist: pytest>=5.3.1; extra == "tests"
Requires-Dist: pytest-cov>=2.9.0; extra == "tests"
Requires-Dist: flake8>=3.8.2; extra == "tests"
Requires-Dist: black>=22.3.0; extra == "tests"
Requires-Dist: mypy>=0.961; extra == "tests"
Requires-Dist: pyamg>=4.0.0; extra == "tests"
Requires-Dist: numpydoc>=1.2.0; extra == "tests"
Requires-Dist: pooch>=1.6.0; extra == "tests"
Provides-Extra: benchmark
Requires-Dist: matplotlib>=3.1.3; extra == "benchmark"
Requires-Dist: pandas>=1.0.5; extra == "benchmark"
Requires-Dist: memory_profiler>=0.57.0; extra == "benchmark"

This is the official code base for Era Splitting. Using this repository you can install and run the EraHistGradientBoostingRegressor with the new **era splitting**, **directional era splitting**, or original criterion implemented via simple arguments.

Era Splitting Paper: https://arxiv.org/abs/2309.14496

# Installation via Pip

```
pip install erasplit
```

# Example Implementation w/ Numerai Data

```python

from pathlib import Path
from numerapi import NumerAPI #pip install numerapi
import json

"""Era Split Model"""
from erasplit.ensemble import EraHistGradientBoostingRegressor

napi = NumerAPI()
Path("./v4").mkdir(parents=False, exist_ok=True)
napi.download_dataset("v4/train.parquet")
napi.download_dataset("v4/features.json")

with open("v4/features.json", "r") as f:
    feature_metadata = json.load(f)
features = feature_metadata["feature_sets"]['small']
TARGET_COL="target_cyrus_v4_20"

training_data = pd.read_parquet('v4/train.parquet')
training_data['era'] = training_data['era'].astype('int')

model = EraHistGradientBoostingRegressor( 
    early_stopping=False, 
    boltzmann_alpha=0, 
    max_iter=5000, 
    max_depth=5, 
    learning_rate=.01, 
    colsample_bytree=.1, 
    max_leaf_nodes=32, 
    gamma=1, #for era splitting
    #blama=1,  #for directional era splitting
    #vanna=1,  #for original splitting criterion
)
model.fit(training_data[ features ], training_data[ TARGET_COL ], training_data['era'].values)
```

## Explanation of Parameters
### Boltzmann Alpha
The Boltzmann alpha parameter varies from -infinity to +infinity. A value of zero recovers the mean, -infinity recovers the minumum and +infinity recovers the maximum. This smooth min/max function is applied to the era-wise impurity scores when evaluating a data split. Negative values here will build more invariant trees.

Read more: https://en.wikipedia.org/wiki/Smooth_maximum

### Gamma
Varies over the interval [0,1]. Indicates weight placed on the  era splitting criterion.

### Blama
Varies over the interval [0,1]. Indicates weight placed on the directional era splitting criterion.

### Vanna
Varies over the interval [0,1]. Indicates weight placed on the original splitting criterion.

Behind the scenes, this is for formula which creates a linear combination of the split criteria. Usually we just set one of these to 1 and leave the other at zero.
```python
gain = gamma * era_split_gain + blama * directional_era_split_gain + vanna * original_gain
```

# Complete (New Updated) Code Notebook Examples Available here:

https://github.com/jefferythewind/era-splitting-notebook-examples

# Citations:

````
@misc{delise2023era,
      title={Era Splitting}, 
      author={Timothy DeLise},
      year={2023},
      eprint={2309.14496},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
````

This code was forked from the official scikit-learn repository and is currently a stand-alone version. All community help is welcome for getting these ideas part of the official scikit learn code base or even better, incorporated in the LightGBM code base.

https://scikit-learn.org/stable/about.html#citing-scikit-learn
