Metadata-Version: 2.3
Name: torchgmm
Version: 0.1.2
Summary: Run Gaussian Mixture Models on single or multiple CPUs/GPUs
Project-URL: Documentation, https://torchgmm.readthedocs.io/
Project-URL: Source, https://github.com/CSOgroup/torchgmm
Project-URL: Home-page, https://github.com/CSOgroup/torchgmm
Author: Marco Varrone
Maintainer-email: Marco Varrone <marco.varrone@unil.ch>
License: MIT License
        
        Copyright (c) 2024, Marco Varrone
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Requires-Python: >=3.8
Requires-Dist: anndata
Requires-Dist: numpy<2.0.0,>=1.20.3
Requires-Dist: pytorch-lightning>=1.6.0
Requires-Dist: session-info
Requires-Dist: torch>1.11.0
Requires-Dist: torchmetrics>=0.6
Provides-Extra: dev
Requires-Dist: pre-commit; extra == 'dev'
Requires-Dist: twine>=4.0.2; extra == 'dev'
Provides-Extra: doc
Requires-Dist: docutils!=0.18.*,!=0.19.*,>=0.8; extra == 'doc'
Requires-Dist: ipykernel; extra == 'doc'
Requires-Dist: ipython; extra == 'doc'
Requires-Dist: myst-nb>=1.1.0; extra == 'doc'
Requires-Dist: pandas; extra == 'doc'
Requires-Dist: setuptools; extra == 'doc'
Requires-Dist: sphinx-autodoc-typehints; extra == 'doc'
Requires-Dist: sphinx-book-theme>=1.0.0; extra == 'doc'
Requires-Dist: sphinx-copybutton; extra == 'doc'
Requires-Dist: sphinx>=4; extra == 'doc'
Requires-Dist: sphinxcontrib-bibtex>=1.0.0; extra == 'doc'
Requires-Dist: sphinxext-opengraph; extra == 'doc'
Provides-Extra: test
Requires-Dist: coverage; extra == 'test'
Requires-Dist: flaky; extra == 'test'
Requires-Dist: pytest; extra == 'test'
Requires-Dist: pytest-benchmark; extra == 'test'
Requires-Dist: scikit-learn; extra == 'test'
Description-Content-Type: text/markdown

# TorchGMM

[![Tests][badge-tests]][link-tests]
[![Documentation][badge-docs]][link-docs]

[badge-tests]: https://img.shields.io/github/actions/workflow/status/CSOgroup/torchgmm/test.yaml?branch=main
[link-tests]: https://github.com/CSOgroup/torchgmm/actions/workflows/test.yml
[badge-docs]: https://img.shields.io/readthedocs/torchgmm
[link-docs]: https://torchgmm.readthedocs.io/

TorchGMM lets you run Gaussian Mixture Models on single or multiple CPUs/GPUs.
The repository is a fork of [PyCave](https://github.com/borchero/pycave) and [LightKit](https://github.com/borchero/lightkit), two amazing packages developed by [Oliver Borchert](https://github.com/borchero) that are no longer maintained.
While PyCave implements additional models such as Markov Chains, TorchGMM focuses only on Gaussian Mixture Models.

The models are implemented in [PyTorch](https://pytorch.org/) and [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/), and provide an `Estimator` API
that is fully compatible with [scikit-learn](https://scikit-learn.org/stable/).

For Gaussian mixture models, TorchGMM allows for 100x speed-ups when using a GPU and enables training
on markedly larger datasets via mini-batch training. The full suite of benchmarks comparing
TorchGMM models against scikit-learn models is available on the
[documentation website](https://pycave.borchero.com/sites/benchmark.html).

## Features

-   Support for GPU and multi-node training by implementing models in PyTorch and relying on
    [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/)
-   Mini-batch training for all models such that they can be used on huge datasets
-   Well-structured implementation of models

    -   High-level `Estimator` API allows for easy usage such that models feel and behave like in
        scikit-learn
    -   Medium-level `LightningModule` implements the training algorithm
    -   Low-level PyTorch `Module` manages the model parameters
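
To make the mini-batch idea concrete, here is a minimal pure-Python sketch of one K-Means centroid update that accumulates per-cluster sums and counts batch by batch. It is an illustration only, not TorchGMM's implementation; the helper names `nearest` and `update_centroids` are hypothetical. Because sums and counts are additive, processing the data in batches yields the same centroids as processing it all at once.

```python
from typing import List, Sequence

Point = Sequence[float]


def nearest(point: Point, centroids: List[List[float]]) -> int:
    """Return the index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def update_centroids(batches: List[List[Point]], centroids: List[List[float]]) -> List[List[float]]:
    """Run one centroid update, visiting the data one batch at a time."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for batch in batches:
        for point in batch:
            k = nearest(point, centroids)
            counts[k] += 1
            for d in range(dim):
                sums[k][d] += point[d]
    # Clusters that received no points keep their previous centroid.
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else list(centroids[k])
        for k in range(len(centroids))
    ]


data = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
start = [[1.0, 1.0], [9.0, 9.0]]

full = update_centroids([data], start)  # everything in a single batch
batched = update_centroids([data[:1], data[1:3], data[3:]], start)  # three small batches
print(full == batched)
```

Only one batch has to be held in memory at a time, which is what makes this style of update usable on datasets that are too large to process in one pass.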

## Getting started

Please refer to the documentation, in particular the [API documentation](https://pycave.borchero.com/sites/api.html).

### Requirements

TorchGMM requires PyTorch to be installed. The installation instructions can be found on the
[PyTorch website](https://pytorch.org/get-started/locally/).

TorchGMM is available via `pip`:

```bash
pip install torchgmm
```

## Usage

If you've ever used scikit-learn, you'll feel right at home when using TorchGMM. First, let's create
some artificial data to work with:

```python
import torch

X = torch.cat([
    torch.randn(10000, 8) - 5,
    torch.randn(10000, 8),
    torch.randn(10000, 8) + 5,
])
```

This dataset consists of three clusters of 8-dimensional datapoints. If you want to fit a K-Means
model to find the clusters' centroids, it's as easy as:

```python
from torchgmm.clustering import KMeans

estimator = KMeans(3)
estimator.fit(X)

# Once the estimator is fitted, it provides various properties. One of them is
# the `model_` property which yields the PyTorch module with the fitted parameters.
print("Centroids are:")
print(estimator.model_.centroids)
```

Due to the high-level estimator API, the usage for all machine learning models is similar. The API
documentation provides more detailed information about parameters that can be passed to estimators
and which methods are available.

### GPU and Multi-Node training

For GPU- and multi-node training, TorchGMM leverages PyTorch Lightning. The hardware that training
runs on is determined by the
[Trainer](https://pytorch-lightning.readthedocs.io/en/latest/api/pytorch_lightning.trainer.trainer.html#pytorch_lightning.trainer.trainer.Trainer)
class. Its
[`__init__`](https://pytorch-lightning.readthedocs.io/en/latest/api/pytorch_lightning.trainer.trainer.html#pytorch_lightning.trainer.trainer.Trainer.__init__)
method provides various configuration options.

If you want to run K-Means with a GPU, you can pass the options `accelerator='gpu'` and `devices=1`
to the estimator's initializer:

```python
estimator = KMeans(3, trainer_params=dict(accelerator='gpu', devices=1))
```

Similarly, if you want to train on 4 nodes simultaneously where each node has one GPU available,
you can specify this as follows:

```python
estimator = KMeans(3, trainer_params=dict(num_nodes=4, accelerator='gpu', devices=1))
```

In fact, **you do not need to change anything else in your code**.
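
If the same script has to run on machines with and without GPUs, the options can be assembled in one place. The helper below is a hypothetical sketch (`make_trainer_params` is not part of TorchGMM); it only builds a plain dictionary from the `accelerator`, `devices` and `num_nodes` options shown above, falling back to the CPU when no GPU is requested.

```python
def make_trainer_params(num_gpus_per_node: int = 0, num_nodes: int = 1) -> dict:
    """Build a `trainer_params` dict for the given hardware (hypothetical helper)."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    if num_gpus_per_node > 0:
        params = {"accelerator": "gpu", "devices": num_gpus_per_node}
    else:
        # No GPU requested: fall back to a single CPU process.
        params = {"accelerator": "cpu", "devices": 1}
    if num_nodes > 1:
        params["num_nodes"] = num_nodes
    return params


print(make_trainer_params())                                  # CPU only
print(make_trainer_params(num_gpus_per_node=1))               # one GPU on one node
print(make_trainer_params(num_gpus_per_node=1, num_nodes=4))  # four nodes, one GPU each
```

The resulting dictionary can be passed as `trainer_params=` to an estimator in the same way as in the examples above; the number of available GPUs could, for instance, be taken from `torch.cuda.device_count()`.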

### Implemented Models

Currently, TorchGMM implements two different models:

-   [GaussianMixture](https://pycave.borchero.com/sites/generated/bayes/gmm/pycave.bayes.GaussianMixture.html)
-   [K-Means](https://pycave.borchero.com/sites/generated/clustering/kmeans/pycave.clustering.KMeans.html)

## Contribution

If you found a bug or you want to propose a new feature, please use the [issue tracker](https://github.com/CSOgroup/cellcharter/issues).

## License

TorchGMM is licensed under the [MIT License](https://github.com/marcovarrone/torchgmm/blob/main/LICENSE).
