Metadata-Version: 2.1
Name: neoval-py-utils
Version: 0.3.0
Summary: 
Author: Neoval
Author-email: data@neoval.io
Requires-Python: >=3.10,<=3.12
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: duckdb (>=0.9.2,<0.10.0)
Requires-Dist: google-cloud-bigquery
Requires-Dist: google-cloud-storage
Requires-Dist: jinja2 (>=3.1.2,<4.0.0)
Requires-Dist: polars
Requires-Dist: pyarrow
Requires-Dist: pyyaml (>=6.0.1,<7.0.0)
Requires-Dist: taskipy (>=1.12.2,<2.0.0)
Requires-Dist: typer (>=0.9.0,<0.10.0)
Description-Content-Type: text/markdown

# neoval-py-utils

Python Utilities

# Development

All development must take place on a feature branch, and a pull request is required; committing directly to `main` is not allowed. The automated workflow in this repo (using `python-semantic-release`) requires [angular style](https://github.com/angular/angular.js/blob/master/DEVELOPERS.md#commits) commit messages to update the package version and `CHANGELOG`. All commits must follow this format before a PR can be merged; if you prefer to develop without using this format for every commit, squash the non-angular commits prior to merge. A PR may only be merged with the `rebase and merge` method, which ensures that only angular style commits end up on `main`.
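
As a rough illustration of the header format, the sketch below checks whether the first line of a commit message has the `type(scope): subject` shape. It is not the check that `python-semantic-release` performs; the function name `is_angular_commit` and the list of accepted types are assumptions made for this example.

```python
import re

# Commit types commonly used with the Angular convention. This list is an
# assumption; the release configuration of this repo may accept a different set.
ANGULAR_TYPES = (
    "build", "chore", "ci", "docs", "feat", "fix",
    "perf", "refactor", "style", "test",
)

# Header shape: type(optional-scope): subject
HEADER_RE = re.compile(
    r"^(?P<type>" + "|".join(ANGULAR_TYPES) + r")"
    r"(\((?P<scope>[^()\s]+)\))?"
    r": (?P<subject>\S.*)$"
)


def is_angular_commit(message: str) -> bool:
    """Return True if the first line of `message` looks like an angular style header."""
    lines = message.splitlines()
    header = lines[0] if lines else ""
    return HEADER_RE.match(header) is not None


print(is_angular_commit("feat(exporter): add cache directory option"))
print(is_angular_commit("updated stuff"))
```

A check like this can be run locally before pushing to spot commits that would need squashing.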

Upon merge to `main`, the `deploy` workflow will facilitate the following:

- bump the version in `pyproject.toml`
- update the `CHANGELOG` using all commits added
- tag and release, if required
- publish to PyPI
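
The version bump step can be pictured with the following sketch. It assumes the common mapping for angular style commits (`fix` and `perf` give a patch bump, `feat` a minor bump, a `BREAKING CHANGE` footer a major bump); the actual behaviour of `python-semantic-release` depends on its configuration, and the helper names `bump_for` and `next_version` are hypothetical.

```python
from typing import Iterable, Optional

# Ranking used to pick the highest bump among several commits.
_ORDER = {None: 0, "patch": 1, "minor": 2, "major": 3}


def bump_for(message: str) -> Optional[str]:
    """Map one commit message to a bump level, or None if it triggers no release."""
    if "BREAKING CHANGE" in message:
        return "major"
    header = message.split("\n", 1)[0]
    # Strip the subject and the optional scope to get the bare commit type.
    commit_type = header.split(":", 1)[0].split("(", 1)[0].strip()
    if commit_type == "feat":
        return "minor"
    if commit_type in ("fix", "perf"):
        return "patch"
    return None


def next_version(current: str, messages: Iterable[str]) -> str:
    """Apply the highest bump found in `messages` to a MAJOR.MINOR.PATCH string."""
    level = max((bump_for(m) for m in messages), key=_ORDER.__getitem__, default=None)
    major, minor, patch = (int(part) for part in current.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return current  # no release-worthy commits, so no new tag


print(next_version("0.3.0", ["docs: fix typo", "feat(exporter): add cache"]))
```

Under these assumptions, a merge containing only `docs` or `chore` commits leaves the version unchanged, which matches the "if required" wording of the release step.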


## Getting Started

### Prerequisites
TODO

### Tests

For the integration tests to pass, you need to be authenticated with a Google project and have storage admin
and BigQuery job permissions.

You can authenticate by setting `GOOGLE_APPLICATION_CREDENTIALS` as an environment variable or by
running `gcloud auth application-default login`.
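
Before running the integration tests, it can help to confirm that a credentials file is present. The sketch below is a minimal check, assuming the default location that `gcloud auth application-default login` writes to on Linux and macOS; the function name `find_adc_file` is hypothetical, and the check does not verify that the credentials are valid or carry the required permissions.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_adc_file(env: Mapping[str, str], home: Path) -> Optional[Path]:
    """Return the credentials file that would likely be picked up, or None."""
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        # An explicit path takes priority; a missing file is treated as "not found".
        path = Path(explicit)
        return path if path.is_file() else None
    # Default location on Linux/macOS (assumption: not running on Windows).
    default = home / ".config" / "gcloud" / "application_default_credentials.json"
    return default if default.is_file() else None


if __name__ == "__main__":
    found = find_adc_file(os.environ, Path.home())
    print(f"Credentials file: {found}" if found else "No credentials file found")
```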

Specify the GCP project with `gcloud config set project <project-id>`.

Run unit and integration tests with `poetry run task test`.

To run the tests with coverage, use `poetry run task test-with-coverage`.

# Usage

## TODO installation with PyPI

The examples below assume that `neoval-py-utils` is installed successfully as a dependency and that you have permissions for GCP storage and BigQuery.

## Examples of usage

### Export BQ datasets or Queries >> Dataframe or GCS #######

```python
from neoval_py_utils.exporter import Exporter
# To query a BigQuery table and return a Polars DataFrame. Cached results are kept for 12 hours by default.
exporter = Exporter()  # To use the cache, pass a path to the constructor, e.g. Exporter(cache_dir="./cache")
pl_df = exporter.export("SELECT word FROM `bigquery-public-data.samples.shakespeare` GROUP BY word ORDER BY word DESC LIMIT 3")

# `export` is aliased by `<` operator. Will give same results as above.
pl_df = exporter < "SELECT word FROM `bigquery-public-data.samples.shakespeare` GROUP BY word ORDER BY word DESC LIMIT 3"


# To export a whole table
all_pl_df = exporter.export("bigquery-public-data.samples.shakespeare")


# To export a BigQuery table to a parquet file in a GCP storage bucket. Returns a list of blobs.
blobs = exporter.bq_to_gcs("my-dataset.my-table")
```
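
The `<` alias shown above relies on Python operator overloading. The sketch below illustrates the mechanism with a toy class; `ToyExporter` is a hypothetical stand-in written for this example and says nothing about how `Exporter` is actually implemented.

```python
class ToyExporter:
    """Hypothetical stand-in for illustration; not the neoval_py_utils Exporter."""

    def export(self, query: str) -> str:
        # A real exporter would run the query; this one only echoes it.
        return f"result of: {query}"

    def __lt__(self, query: str) -> str:
        # `exporter < query` dispatches here, so delegate to `export`.
        return self.export(query)


exporter = ToyExporter()
print(exporter < "SELECT 1")
print((exporter < "SELECT 1") == exporter.export("SELECT 1"))
```

Because `<` is a comparison operator, it is safest to keep such expressions simple and to wrap them in parentheses when combining them with other operators.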
### Create In-process(Embedded) Databases #######

```shell
# Python CLI example to build the in-process db.
# The upload bucket is optional; if given, the in-process db is uploaded to the specified bucket.
poetry run python ipdb build <DBT_DATASET> <GCLOUD_PROJECT_ID> <DB_PATH> <CONFIG_PATH> --upload-bucket <UPLOAD_BUCKET>
# To run it locally in this repo, you can run
poetry run python neoval_py_utils/ipdb.py build samples bigquery-public-data tests/artifacts/in_process_db tests/resources/good.config.yaml

# To apply SQL templates after the in-process db is built
poetry run python ipdb prepare <DBT_DATASET> <GCLOUD_PROJECT_ID> <DB_PATH> <TEMPLATES_PATH>
# To run it locally in this repo, you can run
poetry run python neoval_py_utils/ipdb.py prepare samples bigquery-public-data tests/artifacts/in_process_db tests/resources/templates
# For more info you can run the following, which prints
poetry run python neoval_py_utils/ipdb.py --help
                                                                                                                                     
 Usage: ipdb.py [OPTIONS] COMMAND [ARGS]...                                                                                                                                                               
                                                                                                                                                                                                          
╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --install-completion          Install completion for the current shell.                                                                                                                                │
│ --show-completion             Show completion for the current shell, to copy it or customize the installation.                                                                                         │
│ --help                        Show this message and exit.                                                                                                                                              │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ build                           Build the in process database(s).                                                                                                                                      │
│ make-config                     Prints a default configuration to be used with the build command.                                                                                                      │
│ prepare                         Run scripts to add views/virtual tables/etc. to the database(s).                                                                                                       │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
```
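
When the `build` command needs to be scripted from Python, the argument list can be assembled as in the sketch below. It only uses the positional arguments and the `--upload-bucket` option shown above; the helper name `ipdb_build_command` is hypothetical, and the script path assumes the command is run from the root of this repo.

```python
import shlex
from typing import List, Optional


def ipdb_build_command(
    dataset: str,
    project: str,
    db_path: str,
    config_path: str,
    upload_bucket: Optional[str] = None,
) -> List[str]:
    """Assemble the argv for the `build` command shown in the shell example."""
    cmd = [
        "poetry", "run", "python", "neoval_py_utils/ipdb.py", "build",
        dataset, project, db_path, config_path,
    ]
    if upload_bucket is not None:
        # The upload bucket is optional, so the flag is added only when given.
        cmd += ["--upload-bucket", upload_bucket]
    return cmd


cmd = ipdb_build_command(
    "samples",
    "bigquery-public-data",
    "tests/artifacts/in_process_db",
    "tests/resources/good.config.yaml",
)
print(shlex.join(cmd))
# Passing `cmd` to subprocess.run(cmd, check=True) would execute it;
# that requires poetry and GCP access, so it is not done here.
```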


