Metadata-Version: 2.1
Name: hydrotools.caches
Version: 0.1.5
Summary: Variety of object caching utilities for OWPHydroTools.
Author-email: "Jason A. Regina" <jason.regina@noaa.gov>
License: “Software code created by U.S. Government employees is not subject to copyright
        in the United States (17 U.S.C. §105). The United States/Department of Commerce
        reserve all rights to seek and obtain copyright protection in countries other
        than the United States for Software authored in its entirety by the Department
        of Commerce. To this end, the Department of Commerce hereby grants to Recipient
        a royalty-free, nonexclusive license to use, copy, and create derivative works
        of the Software outside of the United States.”
        
Project-URL: Homepage, https://github.com/NOAA-OWP/hydrotools
Project-URL: Documentation, https://noaa-owp.github.io/hydrotools/hydrotools.caches.html
Project-URL: Repository, https://github.com/NOAA-OWP/hydrotools/tree/main/python/caches
Project-URL: Bug Tracker, https://github.com/NOAA-OWP/hydrotools/issues
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: Free To Use But Restricted
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Hydrology
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy<2; python_version < "3.10"
Requires-Dist: pandas
Requires-Dist: tables
Provides-Extra: develop
Requires-Dist: pytest; extra == "develop"

# OWPHydroTools :: Caches

This subpackage implements different methods to cache objects generated by `hydrotools` methods. See the [Caches Documentation](https://noaa-owp.github.io/hydrotools/hydrotools.caches.html) for a complete list and description of the currently available caches. To report bugs or request additional features, submit an issue through the [OWPHydroTools Issue Tracker](https://github.com/NOAA-OWP/hydrotools/issues) on GitHub.

## Installation

Following common practice in the Python community, we recommend using a virtual environment in any Python workflow. The installation guide below uses Python's built-in `venv` module to create the virtual environment in which the tool will be installed. This is only a preference; any Python virtual environment manager (`conda`, `pipenv`, etc.) should work.

```bash
# Create and activate python environment, requires python >= 3.8
$ python3 -m venv venv
$ source venv/bin/activate
$ python3 -m pip install --upgrade pip

# Install caches
$ python3 -m pip install hydrotools.caches
```
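
To confirm that the install landed in the active environment, a minimal check along the following lines may help. It relies only on the standard library's `importlib.metadata` (available in Python >= 3.8); the helper name `installed_version` is hypothetical and not part of `hydrotools`.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not part of hydrotools.
    """
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("hydrotools.caches")
if version is None:
    print("hydrotools.caches is not installed in this environment")
else:
    print(f"hydrotools.caches {version} is installed")
```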

### Apple Silicon (Mac) Note

If installing `hydrotools.caches` fails while installing the `tables` library,
you likely need to install `HDF5`, set the `HDF5_DIR` environment variable, or
both before installation. Homebrew users should install `HDF5` using
`brew install hdf5`, then reinstall `hydrotools.caches` using
`env HDF5_DIR=$(brew --prefix hdf5) pip install hydrotools.caches`. If you have
already installed `HDF5` via Homebrew, skip the first step.

## Usage

The following example demonstrates how one might use `hydrotools.caches.hdf` to cache a `pandas.DataFrame` generated by a long-running process.

### Code
```python
from hydrotools.caches.hdf import HDFCache

import pandas as pd
from time import sleep

# Some long-running process that returns a pandas.DataFrame
def long_process(cols, rows):
    sleep(1.0)
    data = {f'col_{i}': list(range(rows)) for i in range(cols)}
    return pd.DataFrame(data)

# Set up the cache with a context manager,
#  similar to setting up a pandas.HDFStore
with HDFCache(
    path='mycache.h5',
    complevel=1,
    complib='zlib',
    fletcher32=True
    ) as cache:
    # The first call runs long_process and stores the result
    df = cache.get(long_process, 'data/results', cols=10, rows=1000000)

    # The second call retrieves the result from the cache without
    #  running long_process
    df = cache.get(long_process, 'data/results', cols=10, rows=1000000)
```
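
The behavior of the two calls above can be pictured as a "get or compute" pattern keyed on a string. The following is only a rough sketch of that pattern, not the `HDFCache` implementation: the function `get_or_compute` is hypothetical, a plain `dict` stands in for the HDF5 file, and lookup is assumed to depend on the key alone.

```python
from typing import Any, Callable, MutableMapping


def get_or_compute(
    store: MutableMapping[str, Any],
    key: str,
    function: Callable[..., Any],
    *args: Any,
    **kwargs: Any,
) -> Any:
    """Return store[key] if present; otherwise run function and store its result.

    Hypothetical helper illustrating the pattern; not part of hydrotools.
    """
    if key in store:
        # Cache hit: skip the expensive call entirely
        return store[key]
    # Cache miss: compute, save under the key, then return
    result = function(*args, **kwargs)
    store[key] = result
    return result


calls = []  # Records each time the expensive function actually runs


def expensive(cols: int, rows: int) -> list:
    calls.append((cols, rows))
    return [list(range(rows)) for _ in range(cols)]


store: dict = {}  # Stand-in for the on-disk HDF5 store (an assumption)
first = get_or_compute(store, "data/results", expensive, cols=2, rows=3)
second = get_or_compute(store, "data/results", expensive, cols=2, rows=3)
print(len(calls))  # prints 1: the second call was served from the store
```

In this sketch, reusing a key with different arguments would still return the stored result, so it is safest to give each distinct result its own key.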
