Metadata-Version: 2.1
Name: flwr-datasets
Version: 0.0.2
Summary: Flower Datasets
Home-page: https://flower.dev
License: Apache-2.0
Keywords: flower,fl,federated learning,federated analytics,federated evaluation,machine learning,dataset
Author: The Flower Authors
Author-email: hello@flower.dev
Requires-Python: >=3.8,<4.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Provides-Extra: audio
Provides-Extra: vision
Requires-Dist: datasets (>=2.14.3,<3.0.0)
Requires-Dist: librosa (>=0.10.0.post2) ; extra == "audio"
Requires-Dist: numpy (>=1.21.0,<2.0.0)
Requires-Dist: pillow (>=6.2.1) ; extra == "vision"
Requires-Dist: soundfile (>=0.12.1) ; extra == "audio"
Project-URL: Documentation, https://flower.dev/docs/datasets
Project-URL: Repository, https://github.com/adap/flower
Description-Content-Type: text/markdown

# Flower Datasets

[![GitHub license](https://img.shields.io/github/license/adap/flower)](https://github.com/adap/flower/blob/main/LICENSE)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/adap/flower/blob/main/CONTRIBUTING.md)
![Build](https://github.com/adap/flower/actions/workflows/framework.yml/badge.svg)
![Downloads](https://pepy.tech/badge/flwr-datasets)
[![Slack](https://img.shields.io/badge/Chat-Slack-red)](https://flower.dev/join-slack)

Flower Datasets (`flwr-datasets`) is a library for quickly and easily creating datasets for federated learning, federated evaluation, and federated analytics. It was created by the Flower Labs team, which also created Flower: A Friendly Federated Learning Framework.

The Flower Datasets library supports:
* **downloading datasets** - choose the dataset from Hugging Face's `datasets`,
* **partitioning datasets** - customize the partitioning scheme,
* **creating centralized datasets** - leave parts of the dataset unpartitioned (e.g. for centralized evaluation).

Because it uses Hugging Face's `datasets` under the hood, Flower Datasets integrates with the following popular formats/frameworks:
* Hugging Face,
* PyTorch,
* TensorFlow,
* NumPy,
* Pandas,
* JAX,
* Arrow.

Create **custom partitioning schemes** or choose from the **implemented partitioning schemes**:
* Partitioner (the abstract base class) `Partitioner`
* IID partitioning `IidPartitioner(num_partitions)`
* Natural ID partitioner `NaturalIdPartitioner`
* Size partitioner (the abstract base class for partitioners that divide the data based on the number of samples) `SizePartitioner`
* Linear partitioner `LinearPartitioner`
* Square partitioner `SquarePartitioner`
* Exponential partitioner `ExponentialPartitioner`
* more to come in future releases.
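
To make the difference between the size-based schemes concrete, here is a minimal, library-independent sketch of how partition sizes could be derived from a partition ID. It is not the implementation used by `SizePartitioner` or its subclasses: the helper `partition_sizes`, its `weight_fn` parameter, and the weight functions chosen for "linear", "square", and "exponential" are hypothetical and only inferred from the partitioner names.

```python
import math
from typing import Callable, List


def partition_sizes(
    num_samples: int,
    num_partitions: int,
    weight_fn: Callable[[int], float],
) -> List[int]:
    """Split num_samples across partitions in proportion to weight_fn(partition_id).

    Hypothetical helper for illustration; not part of flwr-datasets.
    """
    weights = [weight_fn(pid) for pid in range(num_partitions)]
    total = sum(weights)
    # Round each share down, then give the leftover samples to the last partition
    # so that the sizes always add up to num_samples.
    sizes = [int(num_samples * w / total) for w in weights]
    sizes[-1] += num_samples - sum(sizes)
    return sizes


# Assumed weight functions, one per scheme (partition IDs start at 0).
iid = partition_sizes(1000, 4, lambda pid: 1.0)
linear = partition_sizes(1000, 4, lambda pid: pid + 1)
square = partition_sizes(1000, 4, lambda pid: (pid + 1) ** 2)
exponential = partition_sizes(1000, 4, lambda pid: math.exp(pid))

print(iid)     # [250, 250, 250, 250]
print(linear)  # [100, 200, 300, 400]
print(square)
print(exponential)
```

Under these assumptions, the IID case gives every partition the same number of samples, while the other three give higher partition IDs progressively larger shares.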

# Installation

## With pip

Flower Datasets can be installed from PyPI:

```bash
pip install flwr-datasets
```

Install with an extension:

* for image datasets:

```bash
pip install flwr-datasets[vision]
```

* for audio datasets:

```bash
pip install flwr-datasets[audio]
```

If you plan to convert the dataset to the format used by your ML framework (e.g. PyTorch or TensorFlow), make sure that framework is installed too.

# Usage

Flower Datasets exposes the `FederatedDataset` abstraction to represent the dataset needed for federated learning/evaluation/analytics. It has two main methods for accessing the data: `load_partition(node_id, split)`, which loads a single partition of a split, and `load_full(split)`, which loads a whole split without partitioning.

Here's a basic quickstart example of how to partition the MNIST dataset:

```python
from flwr_datasets import FederatedDataset

# The train split of the MNIST dataset will be partitioned into 100 partitions
mnist_fds = FederatedDataset("mnist", partitioners={"train": 100})

mnist_partition_0 = mnist_fds.load_partition(0, "train")

centralized_data = mnist_fds.load_full("test")
```

For more details, please refer to the how-to guides or the tutorial in the documentation. They showcase customization and more advanced features.

# Future releases

Here are a few of the things that we will work on in future releases:

* ✅ Support for more datasets (especially the ones that have user id present).
* ✅ Creation of custom `Partitioner`s.
* ✅ More out-of-the-box `Partitioner`s.
* ✅ Passing `Partitioner`s via `FederatedDataset`'s `partitioners` argument. 
* ✅ Customization of the dataset splitting before the partitioning.
* Simplification of the dataset transformation to the popular frameworks/types.
* Creation of synthetic data.
* Support for Vertical FL.

