Metadata-Version: 2.2
Name: tabensemb
Version: 0.3
Summary: A framework to ensemble model bases and evaluate various models for tabular predictions.
Home-page: https://github.com/Luwen-Zhang/tabular_ensemble
Author: Luwen-Zhang's Group at SJTU
Requires-Python: >=3.8.0
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: autogluon.tabular[all]<1.0.0,>=0.8.2
Requires-Dist: autogluon.common<1.0.0,>=0.8.2
Requires-Dist: autogluon.core<1.0.0,>=0.8.2
Requires-Dist: autogluon.features<1.0.0,>=0.8.2
Requires-Dist: captum>=0.6.0
Requires-Dist: gensim<4.3.3,>=4.3.0
Requires-Dist: matplotlib>=3.8.2
Requires-Dist: numba<0.58.2,>=0.58.0
Requires-Dist: numpy>=1.26.2
Requires-Dist: opencv-contrib-python<4.8.2,>=4.8.0
Requires-Dist: openpyxl>=3.1.2
Requires-Dist: pandas>=1.5.3
Requires-Dist: scikit-learn<1.4,>=1.2.2
Requires-Dist: scikit-optimize>=0.9.0
Requires-Dist: scipy>=1.11.4
Requires-Dist: seaborn>=0.13.0
Requires-Dist: statsmodels<0.14.1,>=0.14.0
Requires-Dist: tqdm>=4.66.1
Requires-Dist: torchmetrics>=0.11.4
Requires-Dist: pytorch-widedeep<1.6.2,>=1.3.2
Requires-Dist: pytorch-tabnet>=4.0
Requires-Dist: pytorch-tabular>=1.0.2
Requires-Dist: ray>=2.3.1
Requires-Dist: miceforest>=5.7.0
Requires-Dist: shap>=0.43.0
Requires-Dist: einops>=0.6.1
Requires-Dist: pytorch-lightning<2.0.0,>=1.9.5
Requires-Dist: traitlets>=5.9.0
Provides-Extra: torch
Requires-Dist: torch>=1.12.0; extra == "torch"
Provides-Extra: test
Requires-Dist: torch>=1.12.0; extra == "test"
Requires-Dist: pytest; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Requires-Dist: pytest-order; extra == "test"
Requires-Dist: pytest-mock; extra == "test"
Requires-Dist: black; extra == "test"
Provides-Extra: doc
Requires-Dist: sphinx==7.2.5; extra == "doc"
Requires-Dist: sphinx_rtd_theme==1.3.0; extra == "doc"
Requires-Dist: nbsphinx==0.9.3; extra == "doc"
Requires-Dist: pandoc==2.3; extra == "doc"
Requires-Dist: myst-parser==2.0.0; extra == "doc"
Requires-Dist: sphinx_copybutton==0.5.2; extra == "doc"
Requires-Dist: sphinx_paramlinks==0.6.0; extra == "doc"
Requires-Dist: numpydoc==1.5.0; extra == "doc"
Requires-Dist: pydata_sphinx_theme==0.13.3; extra == "doc"
Provides-Extra: notebook
Requires-Dist: jupyter; extra == "notebook"
Requires-Dist: notebook<7.0.0; extra == "notebook"
Dynamic: author
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# tabular_ensemble
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![codecov](https://codecov.io/gh/Luwen-Zhang/tabular_ensemble/graph/badge.svg?token=APnN7LFtv9)](https://codecov.io/gh/Luwen-Zhang/tabular_ensemble)
[![Test](https://github.com/Luwen-Zhang/tabular_ensemble/actions/workflows/python-package.yml/badge.svg)](https://github.com/Luwen-Zhang/tabular_ensemble/actions/workflows/python-package.yml)
[![](https://img.shields.io/badge/Python-3.10-blue)](https://github.com/Luwen-Zhang/tabular_ensemble)
[![Documentation Status](https://readthedocs.org/projects/tabular-ensemble/badge/?version=latest)](https://tabular-ensemble.readthedocs.io/en/latest/?badge=latest)

A framework to evaluate various models for tabular regression and classification tasks. The package integrates 25 machine learning (including deep learning) models for tabular prediction 
tasks from the following well-established model bases:

* [`autogluon`](https://github.com/autogluon/autogluon)
  * `"LightGBM"`, `"CatBoost"`, `"XGBoost"`, `"Random Forest"`, `"Extremely Randomized Trees"`, `"K-Nearest Neighbors"`, `"Linear Regression"`, `"Neural Network with MXNet"`, `"Neural Network with PyTorch"`, `"Neural Network with FastAI"`.
* [`pytorch_widedeep`](https://github.com/jrzaurin/pytorch-widedeep)
  * `"TabMlp"`, `"TabResnet"`, `"TabTransformer"`, `"TabNet"`, `"SAINT"`, `"ContextAttentionMLP"`, `"SelfAttentionMLP"`, `"FTTransformer"`, `"TabPerceiver"`, `"TabFastFormer"`.
* [`pytorch_tabular`](https://github.com/manujosephv/pytorch_tabular)
  * `"Category Embedding"`, `"NODE"`, `"TabNet"`, `"TabTransformer"`, `"AutoInt"`, `"FTTransformer"`.

You can implement your own models, data processing pipelines, and datasets within this flexible, well-tested 
framework and compare them consistently with baseline models. This is even easier when your own model is 
based on `pytorch`. 

<img width="600" alt="image" src="https://github.com/user-attachments/assets/0fe47266-ae58-4e6b-bcf6-1108ebd762bc">

Supported features for all model bases:

* Data processing
  * Data splitting (training/validation/testing sets)
  * Data imputation
  * Data filtering
  * Data scaling
  * Data augmentation
  * Feature augmentation
  * Feature selection
  * etc.
* Multi-modal data
* Loading [UCI datasets](https://archive.ics.uci.edu/datasets)
* Data/result analysis
  * Leaderboard
  * Box plot
  * Pair plot
  * Pearson correlation
  * Partial dependency plot (with bootstrapping)
  * Feature importance (Permutation and SHAP)
  * etc.
* Building models upon other trained models
* `pytorch_lightning`-based training for `pytorch` models
* Gaussian-process-based Bayesian hyperparameter optimization
* Cross-validation (including continuing from a cross-validation checkpoint)
* Saving, loading, and migrating models

The package stands on the shoulders of giants:

* [scikit-learn](https://scikit-learn.org/)
* [PyTorch](https://pytorch.org/)
* [PyTorch Lightning](https://lightning.ai/)
* etc. (See `requirements.txt`)


## Installation/Usage

Full documentation is available [here](https://tabular-ensemble.readthedocs.io/en/latest/index.html). For a quick start:

1. `tabular_ensemble` can be installed from PyPI by running the following command:

```shell
pip install tabensemb[torch]
```

Please use `pip install tabensemb` instead if you already have `torch>=1.12.0` installed. Use `pip install tabensemb[test]` if you want to run unit tests. 

To install from source, run the following from the root of the cloned repository:

```shell
pip install -e .[torch]
```

2. (Optional) Run the unit tests after installing `tabensemb[test]`:

```shell
cd test
pytest .
```

3. Place your `.csv` or `.xlsx` file in a `data` subfolder (e.g., `data/sample.csv`), and create a configuration file in a `configs` subfolder (e.g., `configs/sample.py`) with the following content:
```python
cfg = {
    "database": "sample",
    "continuous_feature_names": ["cont_0", "cont_1", "cont_2", "cont_3", "cont_4"],
    "categorical_feature_names": ["cat_0", "cat_1", "cat_2"],
    "label_name": ["target"],
}
```
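
To try the workflow without your own data, a synthetic table matching the configuration above can be generated first. The following is a minimal sketch using only the standard library. The column names come from the configuration above; the `write_sample_csv` helper, the random value ranges, the category labels, and the way the target is computed are illustrative assumptions and not part of `tabensemb`.

```python
import csv
import random
from pathlib import Path

# Column names copied from the example configuration above.
cfg = {
    "database": "sample",
    "continuous_feature_names": ["cont_0", "cont_1", "cont_2", "cont_3", "cont_4"],
    "categorical_feature_names": ["cat_0", "cat_1", "cat_2"],
    "label_name": ["target"],
}


def write_sample_csv(cfg, folder="data", n_rows=100, seed=0):
    """Hypothetical helper: write <folder>/<database>.csv filled with random values."""
    rng = random.Random(seed)
    columns = (
        cfg["continuous_feature_names"]
        + cfg["categorical_feature_names"]
        + cfg["label_name"]
    )
    path = Path(folder) / f"{cfg['database']}.csv"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns)  # header row must match the names in cfg
        for _ in range(n_rows):
            cont = [round(rng.gauss(0.0, 1.0), 4) for _ in cfg["continuous_feature_names"]]
            cat = [rng.choice(["a", "b", "c"]) for _ in cfg["categorical_feature_names"]]
            # Arbitrary synthetic target: sum of continuous features plus small noise.
            target = [round(sum(cont) + rng.gauss(0.0, 0.1), 4)]
            writer.writerow(cont + cat + target)
    return path


if __name__ == "__main__":
    print(write_sample_csv(cfg))
```

Run as a script, this writes `data/sample.csv` relative to the current working directory, so the file name matches the `"database"` entry of the configuration.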

4. Run the experiment with the configuration and the data:
```shell
python main.py --base sample --epoch 10
```
where `--base` refers to the configuration file, and additional arguments (such as `--epoch` here) refer to those in `config/default.py`.
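
One way to picture the relationship between `--base` and the additional arguments is as command-line values overriding a set of defaults. The sketch below only illustrates that general pattern and is not the package's actual parser: the `DEFAULTS` dictionary, its values, the `batch_size` key, and the `parse_overrides` function are all hypothetical.

```python
import argparse

# Hypothetical defaults standing in for the ones defined by the package.
DEFAULTS = {"epoch": 300, "batch_size": 1024}


def parse_overrides(argv):
    """Return (base, config), where command-line values override DEFAULTS."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--base", required=True, help="name of the configuration file")
    for key, value in DEFAULTS.items():
        # default=None distinguishes "not given" from an explicitly passed value.
        parser.add_argument(f"--{key}", type=type(value), default=None)
    args = vars(parser.parse_args(argv))
    base = args.pop("base")
    config = dict(DEFAULTS)
    config.update({k: v for k, v in args.items() if v is not None})
    return base, config


if __name__ == "__main__":
    # Mirrors the command above: only `epoch` is overridden.
    print(parse_overrides(["--base", "sample", "--epoch", "10"]))
```

In this sketch, arguments that are not passed keep their default values, so only the settings that differ from the defaults need to appear on the command line.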

See the [documentation pages](https://tabular-ensemble.readthedocs.io/en/latest/index.html) for details.

## Citation

If you use this repository, please cite us as:

```text
(Will be updated after the paper is released on arXiv or published)
```
