Metadata-Version: 2.1
Name: sneks-sync
Version: 0.6.0
Summary: Launch a Dask cluster from a virtual environment
License: MIT
Author: Gabe Joseph
Author-email: gjoseph92@gmail.com
Requires-Python: >=3.8.4,<4.0.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: bokeh (>=2.4.2,<3)
Requires-Dist: cloudpickle (>=2.1.0,<3.0.0)
Requires-Dist: coiled (>=0.5.2,<0.6.0)
Requires-Dist: rich (>=12.4.4,<13.0.0)
Requires-Dist: setuptools (>=65.3.0)
Requires-Dist: tomli (>=2.0.1,<3.0.0)
Description-Content-Type: text/markdown

# sneks

Get your snakes in a row.

`sneks` lets you launch a [Dask cluster in the cloud](https://coiled.io/), matched to your local software environment\*, in a single line of code. No more dependency mismatches or Docker image building.

```python
from sneks import get_client

client = get_client()
```

\*your local [Poetry](https://python-poetry.org/) or [PDM](https://pdm.fming.dev/latest/) environment. You must use Poetry or PDM. Locking package managers are what sensible people use, and you are sensible.
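
To make the "locking" part concrete, here is a minimal sketch of why a lock file is enough to reproduce an environment elsewhere: every package in it is pinned to an exact version. This is an illustration only, not necessarily how `sneks` reads your environment. The helper name `pinned_requirements` and the sample lock contents are made up for the example, and it assumes the `[[package]]` table layout that Poetry writes to `poetry.lock`.

```python
from typing import List

try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # same API, for older Pythons

# A tiny, hand-written stand-in for a real poetry.lock (illustrative only).
SAMPLE_LOCK = """
[[package]]
name = "distributed"
version = "2022.5.2"

[[package]]
name = "dask"
version = "2022.5.2"
"""


def pinned_requirements(lock_text: str) -> List[str]:
    """Turn lock-file text into sorted, exact `name==version` pins."""
    lock = tomllib.loads(lock_text)
    # Each [[package]] entry carries the exact version that was resolved.
    return sorted(
        f"{pkg['name']}=={pkg['version']}" for pkg in lock.get("package", [])
    )


print(pinned_requirements(SAMPLE_LOCK))
```

Because the pins are exact, installing them on another machine should give the same package versions you have locally, which is the whole point of matching the cluster to your environment.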

*Neat! Sneks also supports ARM clusters! Just pass ARM instance types in `scheduler_instance_types=` and `worker_instance_types=`, and cross your fingers that all your dependencies have cross-arch wheels!*

## Installation

```shell
poetry add -G dev sneks-sync
```

## A full example

```shell
mkdir example && cd example
poetry init -n
poetry add -G dev sneks-sync
poetry add distributed==2022.5.2 dask==2022.5.2 bokeh pandas pyarrow  # and whatever else you want
```
```python
from sneks import get_client
import dask.dataframe as dd

client = get_client(name="on-a-plane")
ddf = dd.read_parquet(
    "s3://nyc-tlc/trip data/yellow_tripdata_2012-*.parquet",
)
print(ddf.groupby('passenger_count').trip_distance.mean().compute())
```

Oh wait, we forgot to install a dependency!
```shell
poetry add foobar
```

When we reconnect to the cluster (using the same name), the dependencies on the cluster update automatically.
```python
from sneks import get_client
import dask.dataframe as dd
import foobar  # ah, how could we forget this critical tool

client = get_client(name="on-a-plane")
ddf = dd.read_csv(
    "s3://nyc-tlc/csv_backup/yellow_tripdata_2012-*.csv",
)
means = ddf.groupby('passenger_count').trip_distance.mean()
means.apply(foobar.optimize).compute()
```
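
One way to picture the automatic update on reconnect: compare a fingerprint of the lock file from the last launch with the current one, and re-sync only when they differ. The sketch below is an assumption about how such a check could work, not a description of what `sneks` actually does. The function name `lock_fingerprint`, the sample lock text, and the `foobar` version are all hypothetical.

```python
import hashlib


def lock_fingerprint(lock_text: str) -> str:
    """Return a short, stable digest of a lock file's contents."""
    return hashlib.sha256(lock_text.encode("utf-8")).hexdigest()[:12]


# Lock contents before and after `poetry add foobar` (hand-written, illustrative).
before = '[[package]]\nname = "dask"\nversion = "2022.5.2"\n'
after = before + '\n[[package]]\nname = "foobar"\nversion = "1.0.0"\n'

# Differing fingerprints signal that the cluster's environment is stale.
needs_update = lock_fingerprint(before) != lock_fingerprint(after)
print(needs_update)
```

Hashing the lock file rather than the installed packages keeps the check cheap, and it only fires when the resolved dependency set has really changed.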

## Caveats

This is still a proof-of-concept-level package. It has seen quite a bit of personal use and has proven reliable, but use it at your own risk.
