Metadata-Version: 2.1
Name: whisk
Version: 0.1.21
Summary: Machine Learning Project Framework Generator
Home-page: https://github.com/bookletai/whisk
Author: Derek Haynes
Author-email: derek@dlite.cc
License: MIT license
Keywords: whisk
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: Click (>=7.0)
Requires-Dist: cookiecutter (>=1.7.2)

# Whisk ML Project Framework

[![pypi](https://img.shields.io/pypi/v/whisk.svg)](https://pypi.python.org/pypi/whisk)

[![docs](https://readthedocs.org/projects/whisk/badge/?version=latest)](https://whisk.readthedocs.io/en/latest/?badge=latest)

Tying together the tools required to release a machine learning model can be daunting. Whisk makes building and releasing ML models easy and fun. It generates a logical, flexible project structure that produces reproducible results and lets you release your model to the world without becoming a software engineer.

Whisk doesn't lock you into a particular ML framework or require you to learn yet another ML packaging API. Instead, it leverages the magic of Python's ecosystem that's available to projects structured in a Pythonic way. Whisk does the structuring while you focus on the data science.

Read more about our [beliefs](#beliefs).

## Quickstart

_Replace `demo` with the name of your ML project in the examples below._

Create the project:

```
pip install whisk
echo "Generate the directory structure, set up a venv, initialize a Git repo, and more."
whisk create demo
cd demo
source venv/bin/activate
```

Take a quick tour of the project you just created:

```
jupyter-notebook notebooks/getting_started.ipynb
```

The notebook shows how to save your trained model to disk, generate predictions from the saved model, load Python functions and classes from the project's `src` directory for a cleaner notebook, and more. It provides the guide rails for your own ML project.
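The save-then-predict workflow described above can be sketched in plain Python. This is a minimal illustration, not the notebook's actual code: it assumes `pickle` as the serializer, and `trained_params`, `predict`, and `model.pkl` are hypothetical names standing in for a real fitted model and its artifact.

```python
import pickle
import tempfile
from pathlib import Path


def predict(params, rows):
    """Apply a tiny linear model: dot(weights, row) + bias for each row."""
    return [
        sum(w * x for w, x in zip(params["weights"], row)) + params["bias"]
        for row in rows
    ]


# Hypothetical "trained" parameters standing in for a real fitted model.
trained_params = {"weights": [1.0, 2.0], "bias": 0.5}

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"

    # Save the trained artifact to disk.
    with open(model_path, "wb") as f:
        pickle.dump(trained_params, f)

    # Later (for example, in a web service), load it back from disk.
    with open(model_path, "rb") as f:
        restored_params = pickle.load(f)

# Generate predictions from the restored artifact.
predictions = predict(restored_params, [[0, 1], [2, 3]])
print(predictions)
```

Keeping the prediction logic in a function rather than in a notebook cell is what makes it reusable from a CLI or web service later.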

There's a placeholder model you can invoke immediately from the command line:

```
$ demo predict [[0,1],[2,3]]
[2, 2]
```

...and a ready-to-go Flask web service:

```
whisk app start
curl --location --request POST 'http://localhost:5000/predict' \
--header 'Content-Type: application/json' \
--data-raw '{"data":[[0, 1], [2, 3]]}'
```
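The same request can also be sent from Python. The sketch below uses only the standard library and mirrors the curl command above; `build_predict_request` is a hypothetical helper name, and the response format is not assumed, so the response body is printed as raw text.

```py
import json
import urllib.request


def build_predict_request(rows, url="http://localhost:5000/predict"):
    """Build a JSON POST request equivalent to the curl command above."""
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request([[0, 1], [2, 3]])

# Sending requires the web service to be running (see `whisk app start`):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```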

Deploy the web service to Heroku (a free account is fine):

```
whisk app create demo-[INSERT YOUR NAME]
```

Create a Python package containing your model and share it with the world:

```
$ whisk package dist
Python Package created in /Users/dlite/projects/whisk_examples/demo/dist:
demo-0.1.0.tar.gz

$ pip install dist/demo-0.1.0.tar.gz
```

Invoke the model via the CLI:

```
demo predict [[0,1],[2,3]]
```

...and within Python code:

```py
from demo import model
model.predict([[0,1],[2,3]])
```
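Code that consumes the model only depends on a `predict` call that accepts a list of rows, so it can be exercised without installing the package. The sketch below uses a hypothetical `StubModel` with the same call shape; its logic (returning each row's length) is an assumption chosen only to mirror the `[2, 2]` output shown in the Quickstart, not the real placeholder model's implementation. `summarize` is likewise an illustrative helper.

```py
class StubModel:
    """Hypothetical stand-in with the same call shape as `demo.model`."""

    def predict(self, rows):
        # Assumption: return the length of each row. This only mirrors the
        # `[2, 2]` output shown earlier; it is not the real model's logic.
        return [len(row) for row in rows]


def summarize(model, rows):
    """Pair each input row with its prediction."""
    return list(zip(rows, model.predict(rows)))


model = StubModel()
summary = summarize(model, [[0, 1], [2, 3]])
print(summary)
```

Swapping `StubModel()` for the installed package's `model` should leave `summarize` unchanged, provided the real `predict` accepts the same input shape.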

## Whisk CLI Commands

To see a list of available whisk commands and command groups:

    whisk --help

You can view help on specific command groups like this:

    whisk app --help

## Beliefs

* _A notebook isn't enough_ - A data science notebook is where experimentation starts, but you can't create a reproducible, collaborative ML project with just a `*.ipynb` file.
* _A reproducible, collaborative project is a solved problem for classical software_ - We don't need to re-invent the wheel for machine learning projects. Instead, we need guide rails to help data scientists structure projects without forcing them to also become software engineers.
* _Python already has a good package manager_ - We don't need overly abstracted solutions to package a trained ML model. A properly structured ML project makes it easy to use _pip_ for packaging a model, making it easy for _anyone_ to benefit from your work.
* _Version control is a requirement_ - You can't have a reproducible project if the code and training data aren't in version control.
* _Docker is a heavyweight and fragile option for solving reproducibility_ - When we [explicitly declare and isolate dependencies](https://12factor.net/dependencies), we don't need to rely on the implicit existence of packages installed in a Docker container. Docker also slows down development: repeatedly restarting containers takes far longer than rerunning code in a plain Python environment. Python already has solid native tools for this problem.
* _Optimize for debugging_ - 90% of writing software is fixing bugs. It should be easy to debug your model logic locally.


## Credits

This package was created with Cookiecutter and the `audreyr/cookiecutter-pypackage` project template. The project template is heavily inspired by [Cookiecutter Data Science](https://github.com/drivendata/cookiecutter-data-science).


