Metadata-Version: 2.1
Name: hf-fastup
Version: 0.0.6
Summary: Fast parallel upload of large datasets to the Hugging Face Datasets Hub.
Home-page: https://github.com/kkoutini/hf-fastup
Author: Khaled Koutini
Author-email: first.last@jku.at
License: Apache-2.0
Project-URL: Bug Tracker, https://github.com/kkoutini/hf-fastup/issues
Project-URL: Source Code, https://github.com/kkoutini/hf-fastup
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Requires-Dist: datasets>=2.15.0
Requires-Dist: hf_transfer>=0.1.4

# HF-fastup

Pushes a Hugging Face (HF) dataset to the HF Hub as a Parquet dataset, which allows it to be streamed.
The dataset is split into shards that are uploaded in parallel. This is useful for large datasets, for example those with embedded data.
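
To illustrate the shard-and-upload-in-parallel idea, here is a minimal, standard-library-only sketch. It is not the implementation used by this package: `shard_bounds`, `fake_upload`, and `upload_all` are hypothetical names, and `fake_upload` only stands in for "write one shard to Parquet and upload it".

```python
from concurrent.futures import ThreadPoolExecutor


def shard_bounds(num_rows, num_shards):
    """Split range(num_rows) into contiguous (start, stop) row ranges."""
    base, extra = divmod(num_rows, num_shards)
    bounds = []
    start = 0
    for i in range(num_shards):
        # The first `extra` shards get one additional row each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def fake_upload(bounds):
    """Hypothetical stand-in for writing and uploading one shard.

    Returns the number of rows the shard would contain.
    """
    start, stop = bounds
    return stop - start


def upload_all(num_rows, num_shards, max_workers=4):
    """Process all shards concurrently and return the per-shard row counts."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_upload, shard_bounds(num_rows, num_shards)))


if __name__ == "__main__":
    print(shard_bounds(10, 3))
    print(upload_all(10, 3))
```

Threads are a reasonable choice in a sketch like this because uploading is I/O-bound. The shard count and worker count here are arbitrary.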

## Usage

Make sure the `hf_transfer` package is installed and the environment variable `HF_HUB_ENABLE_HF_TRANSFER` is set to `1`.
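
One way to satisfy this prerequisite from inside a script is sketched below. It assumes the variable should be set before `huggingface_hub` (or anything that imports it, such as `datasets`) is imported, since the flag appears to be read at import time. `hf_transfer_ready` is a hypothetical helper, not part of this package.

```python
import importlib.util
import os

# Assumption: set this before importing datasets / huggingface_hub,
# because the flag appears to be read when the library is first imported.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"


def hf_transfer_ready():
    """Return True if hf_transfer is importable and the flag is set to "1"."""
    installed = importlib.util.find_spec("hf_transfer") is not None
    enabled = os.environ.get("HF_HUB_ENABLE_HF_TRANSFER") == "1"
    return installed and enabled


if __name__ == "__main__":
    if not hf_transfer_ready():
        print("hf_transfer is not installed; try: pip install hf_transfer")
```

Alternatively, export the variable in the shell before starting Python.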

```python
import hffastup
import datasets
datasets.logging.set_verbosity_info()

# load any HF dataset
dataset = datasets.load_dataset("my_large_dataset.py")

hffastup.upload_to_hf_hub(dataset, "Org/repo") # upload to HF Hub
hffastup.push_dataset_card(dataset, "Org/repo") # Makes a dataset card and pushes it to HF Hub

```
