Metadata-Version: 2.1
Name: deepecho
Version: 0.1.3.dev1
Summary: Mixed-type multivariate time series modeling with generative adversarial networks.
Home-page: https://github.com/sdv-dev/DeepEcho
Author: MIT Data To AI Lab
Author-email: dailabmit@gmail.com
License: MIT license
Keywords: deepecho deepecho DeepEcho
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.6,<3.9
Description-Content-Type: text/markdown
Requires-Dist: pandas (<2,>=0.22)
Requires-Dist: numpy (<2,>=1.15.4)
Requires-Dist: torch (<2,>=1)
Requires-Dist: tqdm (<5,>=4)
Provides-Extra: dev
Requires-Dist: setuptools (<49.2) ; extra == 'dev'
Requires-Dist: bumpversion (<0.6,>=0.5.3) ; extra == 'dev'
Requires-Dist: pip (>=9.0.1) ; extra == 'dev'
Requires-Dist: watchdog (<0.11,>=0.8.3) ; extra == 'dev'
Requires-Dist: m2r (<0.3,>=0.2.0) ; extra == 'dev'
Requires-Dist: nbsphinx (<0.7,>=0.5.0) ; extra == 'dev'
Requires-Dist: Sphinx (<3,>=1.7.1) ; extra == 'dev'
Requires-Dist: sphinx-rtd-theme (<0.5,>=0.2.4) ; extra == 'dev'
Requires-Dist: autodocsumm (>=0.1.10) ; extra == 'dev'
Requires-Dist: flake8 (<4,>=3.7.7) ; extra == 'dev'
Requires-Dist: flake8-absolute-import (<2,>=1.0) ; extra == 'dev'
Requires-Dist: flake8-docstrings (<2,>=1.5.0) ; extra == 'dev'
Requires-Dist: flake8-sfs (<0.1,>=0.0.3) ; extra == 'dev'
Requires-Dist: isort (<5,>=4.3.4) ; extra == 'dev'
Requires-Dist: pylint (<3,>=2.5.3) ; extra == 'dev'
Requires-Dist: autoflake (<2,>=1.1) ; extra == 'dev'
Requires-Dist: autopep8 (<2,>=1.4.3) ; extra == 'dev'
Requires-Dist: twine (<4,>=1.10.0) ; extra == 'dev'
Requires-Dist: wheel (>=0.30.0) ; extra == 'dev'
Requires-Dist: coverage (<6,>=4.5.1) ; extra == 'dev'
Requires-Dist: tox (<4,>=2.9.1) ; extra == 'dev'
Requires-Dist: pytest (>=3.4.2) ; extra == 'dev'
Requires-Dist: pytest-cov (>=2.6.0) ; extra == 'dev'
Requires-Dist: jupyter (<2,>=1.0.0) ; extra == 'dev'
Requires-Dist: rundoc (<0.5,>=0.4.3) ; extra == 'dev'
Provides-Extra: test
Requires-Dist: pytest (>=3.4.2) ; extra == 'test'
Requires-Dist: pytest-cov (>=2.6.0) ; extra == 'test'
Requires-Dist: jupyter (<2,>=1.0.0) ; extra == 'test'
Requires-Dist: rundoc (<0.5,>=0.4.3) ; extra == 'test'

<p align="left">
<img width=20% src="https://dai.lids.mit.edu/wp-content/uploads/2018/06/Logo_DAI_highres.png" alt="sdv-dev" />
<i>An open source project from Data to AI Lab at MIT.</i>
</p>

<p>
  <img width=65% src="docs/images/DeepEcho-Logo.png">
</p>

[![Development Status](https://img.shields.io/badge/Development%20Status-2%20--%20Pre--Alpha-yellow)](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
[![PyPi Shield](https://img.shields.io/pypi/v/deepecho.svg)](https://pypi.python.org/pypi/deepecho)
[![Travis CI Shield](https://travis-ci.org/sdv-dev/DeepEcho.svg?branch=master)](https://travis-ci.org/sdv-dev/DeepEcho)
[![Coverage Status](https://codecov.io/gh/sdv-dev/DeepEcho/branch/master/graph/badge.svg)](https://codecov.io/gh/sdv-dev/DeepEcho)
[![Downloads](https://pepy.tech/badge/deepecho)](https://pepy.tech/project/deepecho)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/sdv-dev/DeepEcho/master?filepath=tutorials)
[![Slack](https://img.shields.io/badge/Slack%20Workspace-Join%20now!-36C5F0?logo=slack)](https://join.slack.com/t/sdv-space/shared_invite/zt-gdsfcb5w-0QQpFMVoyB2Yd6SRiMplcw)

# DeepEcho

* License: [MIT](https://github.com/sdv-dev/DeepEcho/blob/master/LICENSE)
* Development Status: [Pre-Alpha](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
* Homepage: https://github.com/sdv-dev/DeepEcho

# Overview

**DeepEcho** is a **Synthetic Data Generation** Python library for **mixed-type**, **multivariate
time series**. It provides:

1. Multiple models based both on **classical statistical modeling** of time series and the latest
   in **Deep Learning** techniques.
2. A robust [benchmarking framework](benchmark) for evaluating these methods on multiple datasets
   and with multiple metrics.
3. The ability for **Machine Learning researchers** to submit new methods that follow our `model`
   and `sample` API and have them evaluated.

## Try it out now!

If you want to quickly discover **DeepEcho**, simply click the button below and follow the tutorials!

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/sdv-dev/DeepEcho/master?filepath=tutorials)

## Join our Slack Workspace

If you want to be part of the SDV community to receive announcements of the latest releases,
ask questions, suggest new features or participate in the development meetings, please join
our Slack Workspace!

[![Slack](https://img.shields.io/badge/Slack%20Workspace-Join%20now!-36C5F0?logo=slack)](https://join.slack.com/t/sdv-space/shared_invite/zt-gdsfcb5w-0QQpFMVoyB2Yd6SRiMplcw)

# Install

## Requirements

**DeepEcho** has been developed and tested on [Python 3.6, 3.7 and 3.8](https://www.python.org/downloads/).

Although it is not strictly required, using a [virtualenv](https://virtualenv.pypa.io/en/latest/)
is highly recommended to avoid interfering with other software installed on the system where
**DeepEcho** is run.

## Install with pip

The easiest and recommended way to install **DeepEcho** is using [pip](https://pip.pypa.io/en/stable/):

```bash
pip install deepecho
```

This will pull and install the latest stable release from [PyPI](https://pypi.org/).

If you want to install from source or contribute to the project, please read the
[Contributing Guide](CONTRIBUTING.rst).


# Quickstart

In this short quickstart, we show how to learn a mixed-type multivariate time series
dataset and then generate synthetic data that resembles it.

We will start by loading the data and preparing the instance of our model.

```python3
from deepecho import PARModel
from deepecho.demo import load_demo

# Load demo data
data = load_demo()

# Define data types for all the columns
data_types = {
    'region': 'categorical',
    'day_of_week': 'categorical',
    'total_sales': 'continuous',
    'nb_customers': 'count',
}

model = PARModel(cuda=False)
```

If we want to use different settings for our model, like increasing the number
of epochs or enabling CUDA, we can pass the arguments when creating the model:

```python  # keep this as python (without the 3) to avoid using it in test-readme
model = PARModel(epochs=1024, cuda=True)
```

Note that for small datasets like the one used in this demo, the overhead that CUDA
introduces outweighs the gains from parallelization, so in this case the process is
more efficient without CUDA, even if it is available.
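
As a rough illustration of that trade-off, the sketch below enables CUDA only when a GPU
is available and the dataset is reasonably large. The helper name `choose_cuda` and the
`min_rows` cut-off are hypothetical and not part of DeepEcho; the right threshold depends on
your hardware and data, so treat the value as a placeholder.

```python
def choose_cuda(num_rows, cuda_available, min_rows=100_000):
    """Decide whether to enable CUDA for a dataset with ``num_rows`` rows.

    ``min_rows`` is an arbitrary, hypothetical cut-off, not a DeepEcho setting.
    """
    return bool(cuda_available) and num_rows >= min_rows


# A small dataset stays on the CPU even when a GPU is present.
print(choose_cuda(num_rows=500, cuda_available=True))        # False
print(choose_cuda(num_rows=250_000, cuda_available=True))    # True
print(choose_cuda(num_rows=250_000, cuda_available=False))   # False
```

With PyTorch installed, the second argument could come from `torch.cuda.is_available()`,
and the result could then be passed as the `cuda` argument of `PARModel`.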

Once we have created our instance, we are ready to learn the data and generate
new synthetic data that resembles it:

```python3
# Learn a model from the data
model.fit(
    data=data,
    entity_columns=['store_id'],
    context_columns=['region'],
    data_types=data_types,
    sequence_index='date'
)

# Sample new data
model.sample(num_entities=5)
```

The output will be a table of synthetic time series data with the same properties as
the demo data that we used as input.
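
When adapting the quickstart to your own table, it may help to check the `data_types`
dictionary before calling `fit`. The sketch below is a hypothetical helper, not part of
DeepEcho. It assumes, as in the quickstart, that the entity columns and the sequence index
are not listed in `data_types`, and it only knows the three type names used above; the
library may accept others.

```python
# Type names that appear in the quickstart; DeepEcho may support more.
KNOWN_TYPES = {'categorical', 'continuous', 'count'}


def check_data_types(columns, data_types, entity_columns, sequence_index=None):
    """Return (columns lacking a type, entries whose type is not recognized)."""
    skipped = set(entity_columns)
    if sequence_index is not None:
        skipped.add(sequence_index)

    # Columns that are neither skipped nor described in data_types.
    missing = [col for col in columns if col not in skipped and col not in data_types]
    # Entries whose declared type is outside the known set.
    unknown = {col: kind for col, kind in data_types.items() if kind not in KNOWN_TYPES}
    return missing, unknown


# Column names assumed from the ones referenced in the quickstart.
columns = ['store_id', 'date', 'region', 'day_of_week', 'total_sales', 'nb_customers']
data_types = {
    'region': 'categorical',
    'day_of_week': 'categorical',
    'total_sales': 'continuous',
    'nb_customers': 'count',
}

missing, unknown = check_data_types(
    columns, data_types, entity_columns=['store_id'], sequence_index='date'
)
print(missing)  # []
print(unknown)  # {}
```

With a real dataframe, `columns` could simply be `data.columns`.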

# What's next?

For more details about **DeepEcho** and all its possibilities and features, please check and
run the [tutorials](tutorials).

If you want to see how we evaluate the performance and quality of our models, please have a
look at the [DeepEcho Benchmarking framework](benchmark) or
[explore the obtained results](https://docs.google.com/spreadsheets/d/1Fbwj5ZjuYjvPmgUbXQR1HiXs5UZ1K3VulItbqrzH8rA/).

Also, please feel welcome to visit [our contributing guide](CONTRIBUTING.rst) to help
us develop new features or cool ideas!

# Related Projects

## SDV

[SDV](https://github.com/HDI-Project/SDV), for Synthetic Data Vault, is the end-user library for
synthesizing data in development under the [HDI Project](https://hdi-dai.lids.mit.edu/).
SDV allows you to easily model and sample relational datasets using DeepEcho through a simple API.
Other features include anonymization of Personally Identifiable Information (PII) and preserving
relational integrity on sampled records.

## CTGAN

[CTGAN](https://github.com/sdv-dev/CTGAN) is a GAN-based model for synthesizing tabular data.
It is also developed by [MIT's Data to AI Lab](https://sdv-dev.github.io/) and is under
active development.


# History

## 0.1.2 (2020-09-15)

Add BasicGAN Model and additional benchmarking results.

## 0.1.1 (2020-08-15)

This release includes a few new features to make DeepEcho work on more types of datasets
as well as to make it easier to add new datasets to the benchmarking framework.

* Add `segment_size` and `sequence_index` arguments to `fit` method.
* Add `sequence_length` as an optional argument to `sample` and `sample_sequence` methods.
* Update the Dataset storage format to add `sequence_index` and versioning.
* Separate the sequence assembling process in its own `deepecho.sequences` module.
* Add function `make_dataset` to create a dataset from a dataframe and just a few column names.
* Add notebook tutorial to show how to create datasets and use them.

## 0.1.0 (2020-08-11)

First release.

Included Features:

* PARModel
* Demo dataset and tutorials
* Benchmarking Framework
* Support and instructions for benchmarking on a Kubernetes cluster.


