Metadata-Version: 2.1
Name: mads-datasets
Version: 0.3.1
Summary: Datasets for the master applied data science
Author-Email: Raoul Grouls <Raoul.Grouls@han.nl>
License: MIT
Project-URL: Github, https://github.com/raoulg/mads_datasets
Requires-Python: >=3.8
Requires-Dist: tqdm>=4.65.0
Requires-Dist: requests>=2.31.0
Requires-Dist: loguru>=0.7.0
Requires-Dist: numpy>=1.24.3
Requires-Dist: pydantic>=1.10.8
Requires-Dist: pillow>=9.5.0
Requires-Dist: pandas>=2.0.3
Requires-Dist: keyring>=24.2.0
Requires-Dist: polars>=0.18.15
Requires-Dist: torch>=2.0.1; extra == "torch"
Provides-Extra: torch
Description-Content-Type: text/markdown

# MADS Datasets Library

This library provides the functionality to download, process, and stream several datasets.

## Installation
This library is published on PyPI and can be installed with pip or poetry.

```bash
# Install with pip
pip install mads_datasets

# Install with poetry
poetry add mads_datasets
```

## Data Types
Currently, it supports the following datasets:
* SUNSPOTS Time-Series data
* IMDB Text data
* FLOWERS Image data
* FASHION MNIST Image data
* GESTURES Time-Series data
* IRIS dataset

## Usage

After installation, import the necessary components:

```python
from mads_datasets import DatasetFactoryProvider, DatasetType
```

You can create a specific dataset factory using the `DatasetFactoryProvider`.

For instance, to create a factory for the Fashion MNIST dataset:

```python
fashion_factory = DatasetFactoryProvider.create_factory(DatasetType.FASHION)
```
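
To illustrate the general idea of looking up a factory by an enum member, here is a minimal, self-contained sketch. It is not the library's implementation: `ToyDatasetType`, `ToyFactory`, and `ToyFactoryProvider` are hypothetical names invented for this example, and the real `DatasetFactoryProvider` may work differently internally.

```python
from enum import Enum
from typing import Callable, Dict


class ToyDatasetType(Enum):
    """Hypothetical enum, standing in for a dataset type selector."""

    FASHION = "fashion"
    IRIS = "iris"


class ToyFactory:
    """Hypothetical factory that only remembers which dataset it is for."""

    def __init__(self, name: str) -> None:
        self.name = name


class ToyFactoryProvider:
    # Maps each enum member to a callable that builds the matching factory.
    _registry: Dict[ToyDatasetType, Callable[[], ToyFactory]] = {
        ToyDatasetType.FASHION: lambda: ToyFactory("fashion"),
        ToyDatasetType.IRIS: lambda: ToyFactory("iris"),
    }

    @staticmethod
    def create_factory(dataset_type: ToyDatasetType) -> ToyFactory:
        try:
            return ToyFactoryProvider._registry[dataset_type]()
        except KeyError:
            raise ValueError(f"Unsupported dataset type: {dataset_type}") from None


toy_factory = ToyFactoryProvider.create_factory(ToyDatasetType.FASHION)
print(toy_factory.name)  # fashion
```

Keeping the mapping in one place means that supporting a new dataset in such a design only requires a new enum member and a new registry entry.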

With the factory, a single call downloads the data, creates the datasets, and returns them wrapped in datastreamers:

```python
streamers = fashion_factory.create_datastreamer(batchsize=32)
train = streamers["train"]
X, y = next(train.stream())
```

`train.stream()` returns a generator that yields batches of data.
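
For readers unfamiliar with generators that yield batches, the following sketch shows the pattern in plain Python. It is an illustration under assumptions, not the library's code: `stream_batches` is a hypothetical function, and whether the real streamer shuffles, drops a trailing partial batch, or loops indefinitely is assumed here rather than taken from the library.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def stream_batches(
    X: Sequence, y: Sequence, batchsize: int, seed: int = 0
) -> Iterator[Tuple[List, List]]:
    """Hypothetical stand-in for a streamer: yields (X, y) batches forever."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    while True:
        # Reshuffle at the start of every pass over the data (an assumption).
        rng.shuffle(indices)
        # Step through full batches only; a trailing partial batch is dropped.
        for start in range(0, len(indices) - batchsize + 1, batchsize):
            idx = indices[start : start + batchsize]
            yield [X[i] for i in idx], [y[i] for i in idx]


# Toy data: 100 observations whose label is the observation modulo 10.
X_all = list(range(100))
y_all = [x % 10 for x in X_all]

batches = stream_batches(X_all, y_all, batchsize=32)
X_batch, y_batch = next(batches)
print(len(X_batch), len(y_batch))  # 32 32
```

Because the generator is lazy, each call to `next` builds only one batch, so the caller decides how many batches to draw.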

You could also create a dataset directly:

```python
dataset = fashion_factory.create_dataset()
```

Or download the data:

```python
fashion_factory.download_data()
```
