Metadata-Version: 2.1
Name: dare-datasets
Version: 1.0.1
Summary: A quick and easy way to download datasets from the DARE lab.
License: MIT
Author: MikeXydas
Author-email: mikexydas@gmail.com
Requires-Python: >=3.8,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: huggingface-hub (>=0.17.1,<0.18.0)
Requires-Dist: mo-sql-parsing (==5.58.21262)
Requires-Dist: numpy (>=1.24.3,<2.0.0)
Requires-Dist: pandas (>=2.0.2,<3.0.0)
Requires-Dist: requests (>=2.31.0,<3.0.0)
Requires-Dist: sql-metadata (>=2.9.0,<3.0.0)
Requires-Dist: sqlparse (>=0.4.4,<0.5.0)
Project-URL: Documentation, https://darelab.athenarc.gr/datasets-docs/
Description-Content-Type: text/markdown

# Darelab Datasets Docs

A quick-access library for the datasets used at [Darelab](https://darelab.imsi.athenarc.gr/).

## Installation

**Install:** `pip install dare-datasets`

**Documentation:** https://darelab.athenarc.gr/datasets-docs/add_dataset/

**Datasets included:**

* **QR2T Benchmark** from [MikeXydas](https://github.com/MikeXydas)
* **Iris** from [MikeXydas](https://github.com/MikeXydas)
* **Spider** from [George Katsogiannis](https://github.com/geokats) & [Anna Mitsopoulou](https://github.com/AnnaMitsopoulou)
* **ToTTo** from [MikeXydas](https://github.com/MikeXydas)
* **Wikitable** from [MikeXydas](https://github.com/MikeXydas)


## Usage

```python
from dare_datasets import QR2TBenchmark

qr2t_benchmark = QR2TBenchmark()
qr2t_data = qr2t_benchmark.get()
```

Individual datasets may provide additional methods. See each dataset's documentation for
details.
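
This README does not describe the structure returned by `get()`, so it can help to inspect the object before working with it. The helper below is a minimal sketch: `summarize` is a hypothetical function and not part of `dare-datasets`, and the dictionary is only a stand-in value.

```python
def summarize(obj, max_keys=5):
    """Return a short description of an arbitrary object."""
    kind = type(obj).__name__
    if isinstance(obj, dict):
        # Show the number of keys and the first few of them.
        keys = list(obj)[:max_keys]
        return f"{kind} with {len(obj)} keys, e.g. {keys}"
    if hasattr(obj, "shape"):
        # Covers array-like objects such as pandas DataFrames or numpy arrays.
        return f"{kind} with shape {obj.shape}"
    if hasattr(obj, "__len__"):
        return f"{kind} with {len(obj)} items"
    return kind


# Stand-in value; in practice, pass the result of `qr2t_benchmark.get()`.
example = {"a": [1, 2, 3], "b": [4]}
print(summarize(example))
```

In practice, `summarize(qr2t_data)` gives a quick overview before you consult the dataset's documentation.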

## Dev Installation

Development requires additional libraries, such as `pytest` and `mkdocs`.

Prerequisites:

* Python >=3.8
* [Poetry](https://python-poetry.org/docs/#installation)
* [PreCommit](https://pre-commit.com/#install) (Optional)

```bash
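# 1. Clone the repository and change into its directory
# 2. Install the dependencies
poetry install
# 3. (Optional) Install the pre-commit hooks
pre-commit install
# 4. Create a branch for your changes
git branch new_dataset_name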
```

Once your contribution is ready, open a pull request.

## Testing

This project uses `pytest` as its testing framework. Commands:

* `pytest` - Run all tests. **Warning: This will download all datasets into a temporary directory (~4GB).**
* `pytest -m "not download"` - Run all tests except those that download datasets (recommended during development).
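
The `-m "not download"` filter relies on pytest markers. As a rough sketch of how a test can opt in to the `download` marker (the test names and bodies below are hypothetical and not taken from this project's test suite):

```python
import pytest


@pytest.mark.download  # deselected by: pytest -m "not download"
def test_dataset_can_be_downloaded():
    # Hypothetical placeholder: a real test would fetch a dataset here.
    pass


def test_without_network():
    # Unmarked tests still run under: pytest -m "not download"
    assert 1 + 1 == 2
```

Custom markers are usually registered in the pytest configuration (for example under `markers` in `pytest.ini` or `[tool.pytest.ini_options]` in `pyproject.toml`) so that pytest does not warn about unknown markers.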

