Metadata-Version: 2.4
Name: dragon-ml-toolbox
Version: 20.9.0
Summary: Complete pipelines and helper tools for data science and machine learning projects.
Author-email: Karl Luigi Loza Vidaurre <luigiloza@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/DrAg0n-BoRn/ML_tools
Project-URL: Changelog, https://github.com/DrAg0n-BoRn/ML_tools/blob/master/CHANGELOG.md
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: ==3.12.*
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: LICENSE-THIRD-PARTY.md
Provides-Extra: ml
Requires-Dist: torch; extra == "ml"
Requires-Dist: torchvision; extra == "ml"
Requires-Dist: numpy<2.0; extra == "ml"
Requires-Dist: pandas; extra == "ml"
Requires-Dist: polars>=1.0; extra == "ml"
Requires-Dist: pyarrow; extra == "ml"
Requires-Dist: joblib; extra == "ml"
Requires-Dist: tqdm; extra == "ml"
Requires-Dist: colorlog; extra == "ml"
Requires-Dist: scikit-learn; extra == "ml"
Requires-Dist: shap; extra == "ml"
Requires-Dist: captum; extra == "ml"
Requires-Dist: evotorch; extra == "ml"
Requires-Dist: torchmetrics; extra == "ml"
Requires-Dist: matplotlib; extra == "ml"
Requires-Dist: seaborn; extra == "ml"
Requires-Dist: plotly; extra == "ml"
Requires-Dist: Pillow; extra == "ml"
Requires-Dist: ipython; extra == "ml"
Requires-Dist: ipykernel; extra == "ml"
Requires-Dist: notebook; extra == "ml"
Requires-Dist: jupyterlab; extra == "ml"
Requires-Dist: ipywidgets; extra == "ml"
Provides-Extra: ensemble
Requires-Dist: numpy; extra == "ensemble"
Requires-Dist: numba>=0.60; extra == "ensemble"
Requires-Dist: shap>=0.46; extra == "ensemble"
Requires-Dist: pandas; extra == "ensemble"
Requires-Dist: polars>=1.0; extra == "ensemble"
Requires-Dist: pyarrow; extra == "ensemble"
Requires-Dist: joblib; extra == "ensemble"
Requires-Dist: tqdm; extra == "ensemble"
Requires-Dist: colorlog; extra == "ensemble"
Requires-Dist: scikit-learn<1.8,>=1.5; extra == "ensemble"
Requires-Dist: imbalanced-learn; extra == "ensemble"
Requires-Dist: xgboost; extra == "ensemble"
Requires-Dist: lightgbm; extra == "ensemble"
Requires-Dist: matplotlib; extra == "ensemble"
Requires-Dist: seaborn; extra == "ensemble"
Requires-Dist: ipython; extra == "ensemble"
Requires-Dist: ipykernel; extra == "ensemble"
Requires-Dist: notebook; extra == "ensemble"
Requires-Dist: jupyterlab; extra == "ensemble"
Requires-Dist: ipywidgets; extra == "ensemble"
Provides-Extra: mice
Requires-Dist: numpy<2.0; extra == "mice"
Requires-Dist: pandas; extra == "mice"
Requires-Dist: polars>=1.0; extra == "mice"
Requires-Dist: joblib; extra == "mice"
Requires-Dist: miceforest>=6.0.0; extra == "mice"
Requires-Dist: plotnine>=0.12; extra == "mice"
Requires-Dist: matplotlib; extra == "mice"
Requires-Dist: statsmodels; extra == "mice"
Requires-Dist: lightgbm<=4.5.0; extra == "mice"
Requires-Dist: shap; extra == "mice"
Requires-Dist: colorlog; extra == "mice"
Requires-Dist: pyarrow; extra == "mice"
Provides-Extra: excel
Requires-Dist: pandas; extra == "excel"
Requires-Dist: openpyxl; extra == "excel"
Requires-Dist: ipython; extra == "excel"
Requires-Dist: ipykernel; extra == "excel"
Requires-Dist: notebook; extra == "excel"
Requires-Dist: jupyterlab; extra == "excel"
Requires-Dist: ipywidgets; extra == "excel"
Requires-Dist: colorlog; extra == "excel"
Provides-Extra: gui-boost
Requires-Dist: numpy; extra == "gui-boost"
Requires-Dist: joblib; extra == "gui-boost"
Requires-Dist: FreeSimpleGUI>=5.2; extra == "gui-boost"
Requires-Dist: xgboost; extra == "gui-boost"
Requires-Dist: lightgbm; extra == "gui-boost"
Provides-Extra: gui-torch
Requires-Dist: numpy<2.0; extra == "gui-torch"
Requires-Dist: torch; extra == "gui-torch"
Requires-Dist: FreeSimpleGUI>=5.2; extra == "gui-torch"
Dynamic: license-file

# dragon-ml-toolbox

A collection of machine learning pipelines and utilities, structured as modular packages for easy reuse and installation. The package has no base dependencies: each optional extra installs only what its modules need, which keeps virtual environments lightweight and customizable.

### Features:

- Modular scripts for data science workflows, including data exploration, ETL, model training, evaluation, and inference.
- Support for PyTorch-based models, ensemble learning (XGBoost, LightGBM), and MICE imputation.

## Installation

**Requires Python 3.12** (any 3.12.x release).
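
A minimal sketch for checking the interpreter before installing. The helper name `is_supported_python` is hypothetical and not part of the toolbox; the version pin comes from the package metadata (`Requires-Python: ==3.12.*`).

```python
import sys

# Supported interpreter series, taken from the package metadata
# (Requires-Python: ==3.12.*).
SUPPORTED = (3, 12)


def is_supported_python(version_info=sys.version_info):
    """Return True when the major.minor version matches the supported series."""
    return tuple(version_info[:2]) == SUPPORTED


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if is_supported_python() else "not supported"
    print(f"Python {current} is {status} by dragon-ml-toolbox")
```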

### Via PyPI

Install the latest stable release from PyPI:

Using pip:

```bash
pip install dragon-ml-toolbox
```

Using uv:

```bash
uv add dragon-ml-toolbox
```

### Via conda-forge

Install from the conda-forge channel:

```bash
conda install -c conda-forge dragon-ml-toolbox
```

## Modular Installation

The toolbox is split into optional extras whose core dependencies conflict with one another, so the environments they produce are mutually exclusive.

- Rule: Create a fresh virtual environment for each extra you install.
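
Before deciding which environment to create, it can help to see what a given extra pulls in. The sketch below is illustrative only: `requirements_for_extra` and `installed_requirements` are hypothetical helpers, not part of the toolbox. It relies on the standard library's `importlib.metadata.requires`, and it assumes the requirement strings carry markers of the form `extra == "name"`.

```python
from importlib import metadata


def requirements_for_extra(requirement_lines, extra):
    """Return the requirement specifiers whose marker selects the given extra."""
    marker = f'extra == "{extra}"'
    selected = []
    for line in requirement_lines:
        spec, _, condition = line.partition(";")
        # Normalize quote style so both 'name' and "name" markers match.
        if marker in condition.replace("'", '"'):
            selected.append(spec.strip())
    return selected


def installed_requirements(extra, distribution="dragon-ml-toolbox"):
    """Read the installed distribution's metadata; return [] if it is not installed."""
    try:
        lines = metadata.requires(distribution) or []
    except metadata.PackageNotFoundError:
        return []
    return requirements_for_extra(lines, extra)


if __name__ == "__main__":
    # Sample lines copied from this package's metadata.
    sample = [
        'torch; extra == "gui-torch"',
        'numpy<2.0; extra == "gui-torch"',
        'xgboost; extra == "gui-boost"',
    ]
    print(requirements_for_extra(sample, "gui-torch"))
```

Comparing the lists for two extras (for example `ml` and `ensemble`) shows where their version constraints differ, which is why each extra gets its own environment.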

### 📦 Core Machine Learning Toolbox [ML]

Installs a comprehensive set of tools for typical data science workflows, including data manipulation, modeling, and evaluation using PyTorch.

➡️ On Windows, the default installation includes the CPU version of PyTorch. Follow the official instructions to install the CUDA version: [PyTorch website](https://pytorch.org/get-started/locally/)

```Bash
pip install "dragon-ml-toolbox[ML]"
```
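
After installing, a quick check like the following can confirm whether PyTorch sees a CUDA device. This is a sketch, not part of the toolbox: `describe_torch_backend` is a hypothetical helper, and it uses only `torch.cuda.is_available()` and `torch.__version__`.

```python
def describe_torch_backend():
    """Report whether PyTorch is installed and whether it can see a CUDA device."""
    try:
        import torch  # present only in environments that installed PyTorch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if torch.cuda.is_available():
        return f"PyTorch {torch.__version__}: CUDA device available"
    # Either a CPU-only build, or a CUDA build with no visible GPU.
    return f"PyTorch {torch.__version__}: no CUDA device available"


if __name__ == "__main__":
    print(describe_torch_backend())
```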

#### Modules:

```Bash
data_exploration
ETL_cleaning
ETL_engineering
IO_tools
keys
math_utilities
ML_callbacks
ML_chain
ML_configuration
ML_datasetmaster
ML_evaluation
ML_evaluation_captum
ML_finalize_handler
ML_inference
ML_inference_sequence
ML_inference_vision
ML_models
ML_models_sequence
ML_models_vision
ML_optimization
ML_scaler
ML_trainer
ML_utilities
ML_vision_transformers
optimization_tools
path_manager
plot_fonts
resampling
schema
serde
SQL
utilities
constants
```

---

### 🌳 Ensemble Learning [ensemble]

Installs a comprehensive set of tools for typical data science workflows, focused on **XGBoost** and **LightGBM**.

```Bash
pip install "dragon-ml-toolbox[ensemble]"
```

#### Modules:

```Bash
data_exploration
ensemble_evaluation
ensemble_inference
ensemble_learning
ETL_cleaning
ETL_engineering
IO_tools
math_utilities
optimization_tools
path_manager
plot_fonts
PSO_optimization
resampling
schema
serde
SQL
utilities
constants
```

---

### 🔬 MICE Imputation and Variance Inflation Factor [mice]

Utilities for advanced data cleaning and statistical checks. Features **Multiple Imputation by Chained Equations (MICE)** for handling missing data and **Variance Inflation Factor (VIF)** analysis to detect multicollinearity in features.

```Bash
pip install "dragon-ml-toolbox[mice]"
```

#### Modules:

```Bash
IO_tools
math_utilities
MICE
path_manager
plot_fonts
serde
utilities
VIF
```

---

### 📋 Excel File Handling [excel]

Installs dependencies required to process and handle .xlsx or .xls files.

```Bash
pip install "dragon-ml-toolbox[excel]"
```

#### Modules:

```Bash
IO_tools
excel_handler
path_manager
```

---

### 🎰 GUI for Boosting Algorithms (XGBoost, LightGBM) [gui-boost]

GUI tools compatible with XGBoost and LightGBM models used for inference.

```Bash
pip install "dragon-ml-toolbox[gui-boost]"
```

#### Modules:

```Bash
ensemble_inference
GUI_tools
IO_tools
path_manager
schema
serde
constants
```

---

### 🤖 GUI for PyTorch Models [gui-torch]

GUI tools compatible with PyTorch models used for inference.

```Bash
pip install "dragon-ml-toolbox[gui-torch]"
```

#### Modules:

```Bash
GUI_tools
IO_tools
keys
ML_models
ML_models_sequence
ML_models_vision # Requires: torchvision and Pillow
ML_inference
ML_inference_sequence
ML_inference_vision # Requires: torchvision and Pillow
ML_vision_transformers # Requires: torchvision and Pillow
ML_scaler
path_manager
schema
constants
```

---

## Usage

After installation, import modules from the `ml_tools` package:

```python
from ml_tools.serde import serialize_object, deserialize_object
from ml_tools.IO_tools import train_logger
```
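
Because each extra ships a different set of modules, a script shared across environments may want to check that a module is present before importing it. The following is a minimal sketch using the standard library's `importlib.util.find_spec`; `module_available` is a hypothetical helper, not part of the toolbox. It only confirms that the module can be located, not that every third-party dependency it imports is installed.

```python
import importlib.util


def module_available(dotted_name):
    """Return True if the module can be located.

    Parent packages are imported during the lookup; the module itself is not.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # ImportError: a parent package is missing.
        # ValueError: the module is loaded but has no usable __spec__.
        return False


if __name__ == "__main__":
    for name in ("ml_tools.serde", "ml_tools.IO_tools"):
        print(name, "available" if module_available(name) else "missing")
```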
