Metadata-Version: 2.1
Name: forecastflowml
Version: 0.0.2
Summary: Scalable machine learning forecasting framework with Pyspark
Home-page: https://github.com/canerturkseven/forecastflowml
Author: Caner Turkseven
Author-email: canerturkseven@gmail.com
License: MIT
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Requires-Dist: pyspark[sql] (>=3.0)
Requires-Dist: python-dateutil (>=2.8)
Requires-Dist: scikit-learn (>=1.0)
Requires-Dist: pandas (<2.0)
Requires-Dist: pyspark[sql] (>=3.4) ; python_version >= "3.11"
Provides-Extra: dev
Requires-Dist: pytest ; extra == 'dev'
Requires-Dist: pytest-cov ; extra == 'dev'
Requires-Dist: black[jupyter] ; extra == 'dev'
Requires-Dist: flake8 ; extra == 'dev'
Requires-Dist: tox ; extra == 'dev'
Requires-Dist: lightgbm ; extra == 'dev'
Requires-Dist: xgboost ; extra == 'dev'
Provides-Extra: docs
Requires-Dist: sphinx (>=4.0.0) ; extra == 'docs'
Requires-Dist: pydata-sphinx-theme (==0.13.3) ; extra == 'docs'
Requires-Dist: sphinx-autobuild ; extra == 'docs'
Requires-Dist: myst-nb ; extra == 'docs'
Requires-Dist: plotly ; extra == 'docs'
Requires-Dist: lightgbm ; extra == 'docs'

## ForecastFlowML: Scalable Machine Learning Forecasting with PySpark

[![Python Versions](https://img.shields.io/badge/python-3.7%20|%203.8%20|%203.9%20|%203.10%20|%203.11%20-blue)](https://www.python.org/downloads/) ![Tests](https://github.com/canerturkseven/ForecastFlowML/actions/workflows/tests.yml/badge.svg) [![codecov](https://codecov.io/github/canerturkseven/ForecastFlowML/branch/master/graph/badge.svg?token=DKAE8VSQ1M)](https://codecov.io/github/canerturkseven/ForecastFlowML) [![Documentation Status](https://readthedocs.org/projects/forecastflowml/badge/?version=latest)](https://forecastflowml.readthedocs.io/en/latest/?badge=latest)

ForecastFlowML is a scalable machine learning forecasting framework built on PySpark. It trains scikit-learn-like models in parallel by distributing models rather than data.

With ForecastFlowML, you can use scikit-learn-like regressors as direct multi-step forecasters and train a separate model for each group in your dataset.
The package uses PySpark to handle large datasets and distributes model training across Spark workers for faster training.
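To illustrate what training "by distributing models rather than data" means, the sketch below fits one independent model per group using plain pandas and NumPy. It is not ForecastFlowML's API: `LeastSquaresRegressor` and `fit_per_group` are hypothetical stand-ins, and this loop runs sequentially, whereas ForecastFlowML runs the per-group fits in parallel with Spark.

```python
import numpy as np
import pandas as pd


class LeastSquaresRegressor:
    """Hypothetical stand-in exposing the scikit-learn fit/predict interface."""

    def fit(self, X, y):
        # Prepend a column of ones so the model learns an intercept.
        A = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
        self.coef_, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
        return self

    def predict(self, X):
        A = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
        return A @ self.coef_


def fit_per_group(df, group_col, feature_cols, target_col, make_model):
    # Each group's rows form a complete, self-contained training set, so the
    # fits are independent of each other and could run on different workers.
    return {
        key: make_model().fit(part[feature_cols], part[target_col])
        for key, part in df.groupby(group_col)
    }


df = pd.DataFrame({
    "store": ["a"] * 4 + ["b"] * 4,
    "x": [0, 1, 2, 3, 0, 1, 2, 3],
    "sales": [1, 3, 5, 7, 10, 8, 6, 4],
})
models = fit_per_group(df, "store", ["x"], "sales", LeastSquaresRegressor)
print(models["a"].predict(pd.DataFrame({"x": [4]})))  # approximately [9.]
print(models["b"].predict(pd.DataFrame({"x": [4]})))  # approximately [2.]
```

Any object with `fit` and `predict` methods can be passed as `make_model`; this is the same duck-typed contract that scikit-learn, LightGBM and XGBoost regressors follow.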

## Features

ForecastFlowML provides a range of features that make it a powerful and flexible tool for time-series forecasting, including:

- Works with pandas and PySpark DataFrames.
- Distributed model training per group in the DataFrame.
- Direct multi-step forecasting.
- Built-in time-based cross-validation.
- Extensive time-series feature engineering (lag, rolling mean/std, stockout, history length).
- Hyperparameter tuning for each group model with grid search.
- Supports `scikit-learn`-like libraries such as `LightGBM` or `XGBoost`.
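In the classic direct multi-step strategy, each forecast step gets its own model, and every model is trained on features known at the forecast origin, such as lags and rolling means. The pandas/NumPy sketch below illustrates that idea on a single series. It is not ForecastFlowML's implementation (its horizon and feature settings are described in the documentation); the helper names `make_features`, `fit_direct` and `predict_direct` are hypothetical, and a plain least-squares fit stands in for a LightGBM or XGBoost regressor.

```python
import numpy as np
import pandas as pd


def make_features(y, n_lags=3, window=3):
    # Features known at forecast origin t: y_t, y_{t-1}, ... and a trailing mean.
    feats = pd.DataFrame({f"lag_{k}": y.shift(k) for k in range(n_lags)})
    feats[f"rolling_mean_{window}"] = y.rolling(window).mean()
    return feats


def fit_direct(y, horizon, n_lags=3, window=3):
    # Direct strategy: one least-squares model per step h = 1..horizon,
    # each mapping origin-time features to the target value y_{t+h}.
    X = make_features(y, n_lags, window)
    models = {}
    for h in range(1, horizon + 1):
        data = X.assign(target=y.shift(-h)).dropna()
        A = np.column_stack([np.ones(len(data)), data[X.columns].to_numpy()])
        coef, *_ = np.linalg.lstsq(A, data["target"].to_numpy(), rcond=None)
        models[h] = coef
    return X, models


def predict_direct(X, models):
    # All horizons are forecast from the same, most recent feature row.
    last = np.concatenate([[1.0], X.iloc[-1].to_numpy()])
    return {h: float(last @ coef) for h, coef in models.items()}


# Toy series with a linear trend: 5, 7, 9, ..., 43
y = pd.Series(5.0 + 2.0 * np.arange(20))
X, models = fit_direct(y, horizon=3)
print(predict_direct(X, models))  # values close to 45, 47 and 49
```

Because each step has its own model, errors from step 1 are not fed back into step 2, which is the main difference from recursive forecasting.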

Whether you're new to time-series forecasting or an experienced data scientist, ForecastFlowML can help you build and deploy accurate forecasting models at scale.
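The time-based cross-validation listed under Features differs from ordinary k-fold splitting in that each validation window lies strictly after its training data. A generic expanding-window version can be sketched as follows; `time_based_splits` is a hypothetical helper, not ForecastFlowML's cross-validation API.

```python
import pandas as pd


def time_based_splits(df, date_col, n_splits, horizon):
    # Expanding-window splits in time order: each fold trains on every date
    # before its cutoff and validates on the next `horizon` dates.
    dates = pd.Index(df[date_col].unique()).sort_values()
    if n_splits * horizon >= len(dates):
        raise ValueError("not enough dates for the requested folds")
    for i in range(n_splits, 0, -1):
        start = len(dates) - i * horizon
        test_dates = dates[start:start + horizon]
        train_mask = (df[date_col] < test_dates[0]).to_numpy()
        test_mask = df[date_col].isin(test_dates).to_numpy()
        yield df.index[train_mask], df.index[test_mask]


# Two stores observed daily for ten days
df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=10, freq="D").repeat(2),
    "store": ["a", "b"] * 10,
})
for train_idx, test_idx in time_based_splits(df, "date", n_splits=3, horizon=2):
    train_end = df.loc[train_idx, "date"].max()
    test = df.loc[test_idx, "date"]
    print(f"train up to {train_end:%Y-%m-%d}, validate {test.min():%Y-%m-%d} to {test.max():%Y-%m-%d}")
```

Splitting on dates rather than rows keeps all groups aligned on the same cutoff, so no group's training data includes dates that another group is validated on.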

## Documentation

The latest documentation is available [here](https://forecastflowml.readthedocs.io/en/latest/).

### Get Started

[What is ForecastFlowML?](https://forecastflowml.readthedocs.io/en/latest/forecastflowml.html)

[Quick Start](https://forecastflowml.readthedocs.io/en/latest/notebooks/quick_start.html)

### User Guide

[Feature Engineering](https://forecastflowml.readthedocs.io/en/latest/notebooks/feature_engineering.html)

[Time Series Cross Validation](https://forecastflowml.readthedocs.io/en/latest/notebooks/cross_validation.html)

[Grid Search](https://forecastflowml.readthedocs.io/en/latest/notebooks/grid_search.html)

[Feature Importance](https://forecastflowml.readthedocs.io/en/latest/notebooks/feature_importance.html)

[Save/Load ForecastFlowML](https://forecastflowml.readthedocs.io/en/latest/notebooks/save_load.html)

## Benchmarks

[Kaggle Walmart M5 Forecasting Competition](https://www.kaggle.com/code/canerturkseven/forecastflowml-m5-forecasting-accuracy)

- Ranks 18th on the leaderboard as a late submission, with minimal effort.

## Installation

### ForecastFlowML installation

You can install the package using the following command:

```
pip install forecastflowml
```

#### Check Java

Make sure Java 11 is installed. You can check which Java version you have with the following command:

```
java -version
```
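If you prefer to check from Python (for example, in a notebook before starting Spark), the sketch below parses the output of `java -version`. The helper names `parse_java_major` and `installed_java_major` are hypothetical; note that Java 8 and earlier report their version in the old `1.x` form.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    # `java -version` prints e.g. 'openjdk version "11.0.20" ...' or, for
    # Java 8 and earlier, 'java version "1.8.0_381"'.
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Old-style "1.x" strings encode the major version in the second field.
    return int(match.group(2)) if major == 1 and match.group(2) else major


def installed_java_major():
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` writes its report to stderr, not stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr)


print(installed_java_major())  # e.g. 11, or None if Java is not on PATH
```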

#### Set PYSPARK_PYTHON

In your Python script, set the `PYSPARK_PYTHON` environment variable to your Python executable path before creating the Spark session:

```
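import sys
import os
from pyspark.sql import SparkSession

# Make Spark's Python workers use the same interpreter as this script
os.environ["PYSPARK_PYTHON"] = sys.executable

# Start a local Spark session that uses all available CPU cores
spark = SparkSession.builder.master("local[*]").getOrCreate()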
```
