Metadata-Version: 2.1
Name: distvae_tabular
Version: 0.1.0
Summary: DistVAE Implementation Package for Synthetic Data Generation
Home-page: https://github.com/an-seunghwan/DistVAE-Tabular
Author: Seunghwan An
Author-email: dpeltms79@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: filelock ==3.15.4
Requires-Dist: fsspec ==2024.6.1
Requires-Dist: Jinja2 ==3.1.4
Requires-Dist: joblib ==1.4.2
Requires-Dist: MarkupSafe ==2.1.5
Requires-Dist: mpmath ==1.3.0
Requires-Dist: networkx ==3.3
Requires-Dist: numpy ==1.26.4
Requires-Dist: pandas ==2.2.2
Requires-Dist: python-dateutil ==2.9.0.post0
Requires-Dist: pytz ==2024.1
Requires-Dist: scikit-learn ==1.5.1
Requires-Dist: scipy ==1.14.0
Requires-Dist: six ==1.16.0
Requires-Dist: sympy ==1.13.1
Requires-Dist: threadpoolctl ==3.5.0
Requires-Dist: torch ==2.2.2
Requires-Dist: tqdm ==4.66.4
Requires-Dist: typing-extensions ==4.12.2
Requires-Dist: tzdata ==2024.1

# DistVAE-Tabular

**DistVAE** is a novel approach to distributional learning in the VAE framework. It focuses on accurately capturing the underlying distribution of the observed dataset through nonparametric CDF estimation.
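
To make "nonparametric CDF estimation" more concrete, here is one generic way to represent a flexible univariate distribution: a monotone piecewise-linear quantile function (a linear spline with non-negative slopes). The CDF is recovered by inverting it, and samples are drawn by inverse-transform sampling. This is a minimal sketch under stated assumptions. The function names, knot placement, and toy parameters are hypothetical, and it is not the `distvae_tabular` implementation.

```python
import random
from typing import Sequence


def quantile(alpha: float, gamma: float, betas: Sequence[float], knots: Sequence[float]) -> float:
    """Piecewise-linear quantile function Q(alpha) = gamma + sum_l beta_l * max(alpha - d_l, 0).

    Non-negative slopes beta_l make Q non-decreasing, so it is a valid quantile function.
    (Hypothetical parameterization chosen for illustration.)
    """
    if any(b < 0 for b in betas):
        raise ValueError("betas must be non-negative to keep Q monotone")
    return gamma + sum(b * max(alpha - d, 0.0) for b, d in zip(betas, knots))


def cdf(y: float, gamma: float, betas: Sequence[float], knots: Sequence[float], iters: int = 100) -> float:
    """Recover F(y) by inverting the monotone Q with bisection over alpha in [0, 1]."""
    if y < quantile(0.0, gamma, betas, knots):
        return 0.0
    if y >= quantile(1.0, gamma, betas, knots):
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if quantile(mid, gamma, betas, knots) <= y:
            lo = mid  # Q(mid) <= y, so F(y) >= mid
        else:
            hi = mid  # Q(mid) > y, so F(y) < mid
    return 0.5 * (lo + hi)


def sample(n: int, gamma: float, betas: Sequence[float], knots: Sequence[float], seed: int = 0) -> list[float]:
    """Inverse-transform sampling: if U ~ Uniform(0, 1), then Q(U) follows F."""
    rng = random.Random(seed)
    return [quantile(rng.random(), gamma, betas, knots) for _ in range(n)]


# Toy (hypothetical) parameters: slope 1 on [0, 0.5], slope 2 on [0.5, 1].
gamma, betas, knots = 0.0, [1.0, 1.0], [0.0, 0.5]
print(quantile(0.75, gamma, betas, knots))      # 1.0
print(round(cdf(1.0, gamma, betas, knots), 6))  # 0.75
```

Adding knots lets the spline approximate increasingly irregular shapes, and the non-negativity constraint on the slopes is what guarantees a valid (monotone) quantile function without assuming a parametric family.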

We use the continuous ranked probability score (CRPS), a strictly proper scoring rule, as the reconstruction loss while preserving the mathematical derivation of the lower bound on the data log-likelihood. We also introduce a synthetic data generation mechanism that effectively preserves differential privacy.
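
Since CRPS serves as the reconstruction loss, it may help to see how it can be evaluated directly from a quantile function `Q`. A standard identity writes the CRPS as twice the pinball (check) loss integrated over all quantile levels: `CRPS(F, y) = 2 ∫₀¹ ρ_α(y − Q(α)) dα`, where `ρ_α(u) = u(α − 1{u < 0})`. The minimal sketch below approximates that integral with a midpoint rule. The helper names and grid size are assumptions for illustration, and the package may compute CRPS differently (for example, in closed form).

```python
from statistics import NormalDist
from typing import Callable


def pinball(u: float, alpha: float) -> float:
    """Check (pinball) loss rho_alpha(u) = u * (alpha - 1{u < 0})."""
    return u * (alpha - (1.0 if u < 0 else 0.0))


def crps_from_quantile(y: float, q: Callable[[float], float], n_grid: int = 10_000) -> float:
    """Approximate CRPS(F, y) = 2 * integral_0^1 rho_alpha(y - Q(alpha)) d alpha
    with the midpoint rule on n_grid quantile levels (hypothetical helper)."""
    total = 0.0
    for k in range(n_grid):
        alpha = (k + 0.5) / n_grid  # midpoints avoid alpha = 0 and alpha = 1
        total += pinball(y - q(alpha), alpha)
    return 2.0 * total / n_grid


# Uniform(0, 1): Q(alpha) = alpha; closed form at y = 0.5 is 1/12.
print(crps_from_quantile(0.5, lambda a: a))
# Standard normal via its inverse CDF; closed form at y = 0 is sqrt(2/pi) - 1/sqrt(pi).
print(crps_from_quantile(0.0, NormalDist().inv_cdf))
```

For a point mass at `c` the CRPS reduces to `|y − c|`, so `crps_from_quantile(3.0, lambda a: 1.0)` returns approximately `2.0`. In this sense CRPS generalizes absolute error from point predictions to full predictive distributions.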

### 1. Installation
Install using pip:
```
pip install distvae-tabular
```

### 2. Usage
```python
from distvae_tabular import distvae

distvae.DistVAE        # DistVAE model
distvae.generate_data  # generate synthetic data
```
- See [example.ipynb](example.ipynb) for a detailed example using the `loan` dataset.
  - Download the `loan` dataset from: [https://www.kaggle.com/datasets/teertha/personal-loan-modeling](https://www.kaggle.com/datasets/teertha/personal-loan-modeling)

### 3. Citation
If you use this code or package, please cite our associated paper:
```
@article{an2024distributional,
  title={Distributional learning of variational AutoEncoder: application to synthetic data generation},
  author={An, Seunghwan and Jeon, Jong-June},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}
```
