Metadata-Version: 2.1
Name: zookeeper
Version: 0.2.0
Summary: A small library for managing deep learning models, hyper parameters and datasets
Home-page: https://github.com/plumerai/zookeeper
Author: Plumerai
Author-email: lukas@plumerai.co.uk
License: Apache 2.0
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development
Description-Content-Type: text/markdown
Requires-Dist: click (>=7.0)
Requires-Dist: click-completion (>=0.5.1)
Requires-Dist: matplotlib (>=3.0.3)
Requires-Dist: tensorflow-datasets (>=1.0.2)
Provides-Extra: tensorflow
Requires-Dist: tensorflow (>=1.13.1) ; extra == 'tensorflow'
Provides-Extra: tensorflow_gpu
Requires-Dist: tensorflow-gpu (>=1.13.1) ; extra == 'tensorflow_gpu'
Provides-Extra: test
Requires-Dist: pytest (>=4.3.1) ; extra == 'test'
Requires-Dist: pytest-cov (>=2.6.1) ; extra == 'test'

# Zookeeper

[![Azure DevOps builds](https://img.shields.io/azure-devops/build/plumerai/zookeeper/11.svg?logo=azure-devops)](https://plumerai.visualstudio.com/zookeeper/_build/latest?definitionId=11&branchName=master) [![Azure DevOps coverage](https://img.shields.io/azure-devops/coverage/plumerai/zookeeper/11.svg?logo=azure-devops)](https://plumerai.visualstudio.com/zookeeper/_build/latest?definitionId=11&branchName=master) [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/zookeeper.svg)](https://pypi.org/project/zookeeper/) [![PyPI](https://img.shields.io/pypi/v/zookeeper.svg)](https://pypi.org/project/zookeeper/) [![PyPI - License](https://img.shields.io/pypi/l/zookeeper.svg)](https://github.com/plumerai/zookeeper/blob/master/LICENSE) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black) [![Join the community on Spectrum](https://withspectrum.github.io/badge/badge.svg)](https://spectrum.chat/larq)

A small library for managing deep learning models, hyperparameters and datasets, designed to make training deep learning models easy and reproducible.

## Getting Started

Zookeeper allows you to build command line interfaces for training deep learning models with very little boilerplate using [click](https://click.palletsprojects.com/) and [TensorFlow Datasets](https://www.tensorflow.org/datasets/). It helps you structure your machine learning projects in a framework-agnostic and effective way.
Zookeeper is heavily inspired by [Tensor2Tensor](https://github.com/tensorflow/tensor2tensor) and [Fairseq](https://github.com/pytorch/fairseq/) but is designed to be used as a library, which keeps it lightweight and flexible. Currently, Zookeeper is limited to image classification tasks, but we are working on making it useful for other tasks as well.

### Installation

```console
pip install zookeeper
pip install colorama  # optional for colored console output
```

### Registry

Zookeeper keeps track of data preprocessing, models and hyperparameters so that you can reference them by name from the command line.
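
The idea of a name-based registry can be illustrated with a few lines of plain Python. This is a minimal sketch of the general decorator pattern, not Zookeeper's actual implementation; `register_model`, `get_model`, `_MODELS` and the `tiny` model are hypothetical names used only for illustration.

```python
# Illustrative sketch only: this is NOT zookeeper's registry implementation.
_MODELS = {}


def register_model(fn):
    """Store `fn` under its own function name and return it unchanged."""
    _MODELS[fn.__name__] = fn
    return fn


def get_model(name):
    """Look up a registered model function by the name given on the command line."""
    try:
        return _MODELS[name]
    except KeyError:
        known = ", ".join(sorted(_MODELS))
        raise KeyError(f"Unknown model {name!r}; registered: {known}") from None


@register_model
def tiny(hp, dataset):
    # Stand-in for a real model constructor (hypothetical).
    return f"tiny(units={hp['units']}, classes={dataset['num_classes']})"


model = get_model("tiny")({"units": 8}, {"num_classes": 10})
print(model)
```

Because the decorator runs at import time, a function only becomes available by name once the module that defines it has been imported.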

#### Datasets and Preprocessing

TensorFlow Datasets provides [many popular datasets](https://www.tensorflow.org/datasets/datasets) that can be downloaded automatically.
In the following we will use [MNIST](http://yann.lecun.com/exdb/mnist) and define a `default` preprocessing function that scales the pixel values of each image to `[0, 1]`:

```python
import tensorflow as tf

from zookeeper import cli, build_train, HParams, registry

@registry.register_preprocess("mnist")
def default(image, training=False):
    return tf.cast(image, dtype=tf.float32) / 255
```

#### Models

Next we will register a model called `cnn`. We will use the [Keras API](https://keras.io) for this:

```python
@registry.register_model
def cnn(hp, dataset):
    return tf.keras.models.Sequential(
        [
            tf.keras.layers.Conv2D(
                hp.filters[0],
                (3, 3),
                activation=hp.activation,
                input_shape=dataset.input_shape,
            ),
            tf.keras.layers.MaxPooling2D((2, 2)),
            tf.keras.layers.Conv2D(hp.filters[1], (3, 3), activation=hp.activation),
            tf.keras.layers.MaxPooling2D((2, 2)),
            tf.keras.layers.Conv2D(hp.filters[2], (3, 3), activation=hp.activation),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(hp.filters[3], activation=hp.activation),
            tf.keras.layers.Dense(dataset.num_classes, activation="softmax"),
        ]
    )
```

#### Hyperparameters

For each model we can register one or more hyperparameter sets that will be passed to the model function when it is called:

```python
@registry.register_hparams(cnn)
class basic(HParams):
    activation = "relu"
    batch_size = 32
    filters = [64, 64, 64, 64]
    learning_rate = 1e-3

    @property
    def optimizer(self):
        return tf.keras.optimizers.Adam(self.learning_rate)
```
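
One reason to expose derived values such as the optimizer as a property is that they are computed from whatever values are in effect when they are read. The following is a minimal sketch of that pattern in plain Python; `SketchHParams`, `Basic` and `optimizer_config` are hypothetical stand-ins and do not reflect how Zookeeper's `HParams` class is implemented.

```python
class SketchHParams:
    """Hypothetical stand-in for a hyperparameter set (not zookeeper's HParams)."""

    def __init__(self, **overrides):
        for key, value in overrides.items():
            # Only allow overriding hyperparameters the class already declares.
            if not hasattr(type(self), key):
                raise AttributeError(f"Unknown hyperparameter: {key}")
            setattr(self, key, value)


class Basic(SketchHParams):
    batch_size = 32
    learning_rate = 1e-3

    @property
    def optimizer_config(self):
        # Derived value: built from the learning rate currently in effect.
        return {"name": "adam", "learning_rate": self.learning_rate}


default = Basic()
tuned = Basic(learning_rate=1e-2)

print(default.optimizer_config)
print(tuned.optimizer_config)
```

Overrides are stored on the instance, so the class-level defaults stay untouched and each run starts from the same baseline.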

### Training loop

To train the models registered above we will need to write a custom training loop. Zookeeper will then tie everything together:

```python
@cli.command()
@build_train()
def train(build_model, dataset, hparams, output_dir, epochs):
    """Start model training."""
    model = build_model(hparams, dataset)
    model.compile(
        optimizer=hparams.optimizer,
        loss="categorical_crossentropy",
        metrics=["categorical_accuracy", "top_k_categorical_accuracy"],
    )

    model.fit(
        dataset.train_data(hparams.batch_size),
        epochs=epochs,
        steps_per_epoch=dataset.train_examples // hparams.batch_size,
        validation_data=dataset.validation_data(hparams.batch_size),
        validation_steps=dataset.validation_examples // hparams.batch_size,
    )
```

This will register a Click command called `train`, which can be executed from the command line.
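
The `steps_per_epoch` argument in the training loop uses floor division, so only full batches are counted. As a small worked example, assuming the 60,000 images of the standard MNIST training split (the helper name `steps_per_epoch` below is only for illustration):

```python
def steps_per_epoch(num_examples, batch_size):
    """Number of full batches that fit into one pass over the data."""
    return num_examples // batch_size


MNIST_TRAIN_EXAMPLES = 60_000  # size of the standard MNIST training split

for batch_size in (32, 64):
    steps = steps_per_epoch(MNIST_TRAIN_EXAMPLES, batch_size)
    leftover = MNIST_TRAIN_EXAMPLES - steps * batch_size
    print(f"batch_size={batch_size}: {steps} steps, {leftover} examples left over")
```

With a batch size of 32 this gives 1875 steps with nothing left over; with 64 it gives 937 steps, and the remaining 32 examples do not fill a complete batch.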

### Command Line Interface

To make the file we just created executable, we will add the following lines at the bottom:

```python
if __name__ == "__main__":
    cli()
```

If you want to register your models in separate files, make sure to import them before calling `cli` so that Zookeeper can register them. To install your CLI as an executable command, check out the [`setuptools` integration](http://click.palletsprojects.com/en/7.x/setuptools/) of Click.

#### Usage

Zookeeper already ships with `prepare`, `plot`, and `tensorboard` commands. The CLI now also includes the `train` command we created above:

```console
python examples/train.py --help
```

```console
Usage: train.py [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  install-completion  Install shell completion.
  plot                Plot data examples.
  prepare             Downloads and prepares datasets for reading.
  tensorboard         Start TensorBoard to monitor model training.
  train               Start model training.
```

To train the model we just registered, run:

```console
python examples/train.py train cnn --dataset mnist --epochs 10 --hparams-set basic --hparams batch_size=64
```

Multiple hyperparameters are separated by commas, and strings should be passed without quotation marks:

```console
python examples/train.py train cnn --dataset mnist --epochs 10 --hparams-set basic --hparams batch_size=32,activation=relu
```
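
To make the `--hparams` format concrete, here is a minimal sketch of how such a comma-separated `key=value` string could be turned into typed overrides. How Zookeeper parses the option internally is not described here; `parse_hparams` is a hypothetical helper, and the use of `ast.literal_eval` for type inference is an assumption made for illustration.

```python
import ast


def parse_hparams(text):
    """Parse 'key=value,key=value' into a dict, inferring simple literal types."""
    overrides = {}
    for item in text.split(","):
        if not item.strip():
            continue  # tolerate empty segments such as a trailing comma
        key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"Expected key=value, got {item!r}")
        raw = raw.strip()
        try:
            # Numbers and booleans become Python values: "32" -> 32.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Unquoted words such as relu are kept as plain strings.
            value = raw
        overrides[key.strip()] = value
    return overrides


print(parse_hparams("batch_size=32,activation=relu"))
```

Splitting on every comma keeps the sketch short, but it also means that values which themselves contain commas, such as a list of filters, would need a more careful parser.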


