Metadata-Version: 2.1
Name: dataset-manager
Version: 0.0.4
Summary: Manage and automatize datasets for data science projects.
Home-page: UNKNOWN
Author: Diogo Munaro Vieira
Author-email: diogo.mvieira@gmail.com
License: Apache 2
Platform: UNKNOWN
Description-Content-Type: text/markdown
Requires-Dist: PyYAML (==5.1)
Requires-Dist: pandas (>=0.19.2)

# Dataset Manager

Manage and automate your project's datasets with YAML files.

Create a file named *name.yaml* in your dataset directory with the following content:

```yaml
name: your_dataset_name

src: https://raw.githubusercontent.com/pcsanwald/kaggle-titanic/master/train.csv

description: this dataset is a test dataset

format: csv
```

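*name*: the name used to refer to the dataset.

*src*: the location of the dataset, such as the URL in the example above.

*description*: a short description to remind you later what the dataset contains.

*format*: the pandas reader to use, following the `read_<format>` naming convention (for example, `csv` selects `pandas.read_csv`). See https://pandas.pydata.org/pandas-docs/stable/reference/io.html for the available readers.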
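
To show how these fields fit together, here is a minimal sketch (not this package's implementation) that takes a spec already parsed from YAML into a dictionary and reads it with the matching `read_<format>` function. The function names `missing_fields` and `load_from_spec` are hypothetical, and treating `name`, `src` and `format` as required while `description` is optional is an assumption. pandas is imported only when no `io_module` is passed in, which keeps the sketch easy to exercise with a stand-in object.

```python
import importlib
from typing import Any, List, Mapping, Optional

# Assumption: "name", "src" and "format" are required; "description" is
# treated as optional because it is only a note for humans.
REQUIRED_FIELDS = ("name", "src", "format")


def missing_fields(spec: Mapping[str, Any]) -> List[str]:
    """Return the required fields that are absent from a parsed spec."""
    return [field for field in REQUIRED_FIELDS if field not in spec]


def load_from_spec(spec: Mapping[str, Any], io_module: Optional[Any] = None) -> Any:
    """Read the dataset described by a parsed YAML spec (illustrative sketch).

    The reader is looked up by the read_<format> naming convention,
    e.g. format "csv" -> read_csv.
    """
    missing = missing_fields(spec)
    if missing:
        raise ValueError(f"dataset spec is missing fields: {missing}")
    if io_module is None:
        # Import pandas lazily so a stand-in module can be used instead.
        io_module = importlib.import_module("pandas")
    reader_name = f"read_{spec['format']}"
    reader = getattr(io_module, reader_name, None)
    if reader is None:
        raise ValueError(f"{reader_name} is not available")
    return reader(spec["src"])
```

For instance, passing the dictionary that `yaml.safe_load` produces from the example file above would make this sketch call `pandas.read_csv` with the `src` URL.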

Each dataset is described by its own YAML file inside the dataset directory.
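
Because every dataset lives in its own file, discovering datasets amounts to scanning that directory. The sketch below uses only `pathlib`; the function name `find_dataset_files` is hypothetical, and matching only the `.yaml` extension (not `.yml`, not subdirectories) is an assumption based on the *name.yaml* example, not a description of how `DatasetManager` actually scans the directory.

```python
from pathlib import Path
from typing import List


def find_dataset_files(dataset_path: str) -> List[Path]:
    """Return the dataset YAML files directly inside dataset_path, sorted by path."""
    directory = Path(dataset_path)
    if not directory.is_dir():
        # Hypothetical choice: a missing directory simply has no datasets.
        return []
    # Assumption: only the ".yaml" extension used in this README is matched;
    # ".yml" files and subdirectories are ignored.
    return sorted(path for path in directory.glob("*.yaml") if path.is_file())
```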

## List all Datasets

```python
from dataset_manager import DatasetManager

dataset_path = "path/to/datasets"  # hypothetical: directory holding your dataset YAML files

manager = DatasetManager(dataset_path)

manager.list_datasets()  # returns a list of all datasets in dataset_path
```

## Get one Dataset

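```python
from dataset_manager import DatasetManager

dataset_path = "path/to/datasets"  # hypothetical: directory holding your dataset YAML files
name = "your_dataset_name"  # the dataset's name, as set in its YAML file

manager = DatasetManager(dataset_path)

df = manager.get_dataset(name)  # returns the dataset as a pandas DataFrame
```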
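
Assuming `get_dataset(name)` matches against the *name* field of each YAML file rather than the filename, the lookup step can be pictured as a search over parsed specs. This is an illustrative sketch only; `find_spec` is a hypothetical helper, not part of `dataset_manager`, and raising `KeyError` for an unknown name is an assumed behavior.

```python
from typing import Any, Iterable, Mapping


def find_spec(specs: Iterable[Mapping[str, Any]], name: str) -> Mapping[str, Any]:
    """Return the first parsed spec whose "name" field equals name."""
    for spec in specs:
        if spec.get("name") == name:
            return spec
    # Assumed behavior for an unknown dataset name.
    raise KeyError(f"no dataset named {name!r}")
```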

