Metadata-Version: 2.1
Name: mara-storage
Version: 1.0.0
Summary: Configuration of storage connections for mara
Home-page: https://github.com/mara/mara-storage
Author: Mara contributors
License: MIT
Project-URL: Source Code, https://github.com/mara/mara-storage
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Provides-Extra: test
Provides-Extra: google-cloud-storage
License-File: LICENSE

Mara Storage
============

[![PyPI - License](https://img.shields.io/pypi/l/mara-storage.svg)](https://github.com/mara/mara-storage/blob/master/LICENSE)
[![PyPI version](https://badge.fury.io/py/mara-storage.svg)](https://badge.fury.io/py/mara-storage)
[![Slack Status](https://img.shields.io/badge/slack-join_chat-white.svg?logo=slack&style=social)](https://communityinviter.com/apps/mara-users/public-invite)

Mini package for configuring and accessing multiple storages in a single project. Decouples the use of storages and their configuration by using "aliases" for storages.

The file [mara_storage/storages.py](mara_storage/storages.py) contains abstract storage configurations for local disk and cloud storages. The storage connections of a project are configured by overriding the `storages` function in [mara_storage/config.py](mara_storage/config.py):

``` python
import pathlib
import mara_storage.config
import mara_storage.storages

## configure storage connections for different aliases
mara_storage.config.storages = lambda: {
    'data': mara_storage.storages.LocalStorage(base_path=pathlib.Path('data')),
    'gcs-bucket-1': mara_storage.storages.GoogleCloudStorage(bucket_name='my_data_lake_bucket_1', project_id='my_awesome_project')
}

## access individual storage configurations with `storages.storage`:
print(mara_storage.storages.storage('data'))
# -> <LocalStorage: base_path=data>
```
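
To illustrate why `storages` is a function rather than a plain dictionary, here is a minimal stand-alone sketch of the alias pattern. It does not use mara_storage itself: `LocalStorageStub`, `CloudStorageStub`, `storages_config` and `lookup` are hypothetical names made up for this illustration, and the actual implementation in the library may differ.

```python
import pathlib
from dataclasses import dataclass


@dataclass
class LocalStorageStub:
    """Hypothetical stand-in for a local disk storage configuration"""
    base_path: pathlib.Path


@dataclass
class CloudStorageStub:
    """Hypothetical stand-in for a cloud bucket configuration"""
    bucket_name: str


# Default configuration: a function, so that it is evaluated on each lookup
storages_config = lambda: {}


def lookup(alias: str):
    """Resolves an alias against whatever configuration is currently active"""
    configured = storages_config()
    if alias not in configured:
        raise KeyError(f'storage alias "{alias}" not configured')
    return configured[alias]


# Development setup: the alias 'data' points to a local directory
storages_config = lambda: {'data': LocalStorageStub(base_path=pathlib.Path('data'))}
dev_storage = lookup('data')

# Production setup: same alias, different storage; callers of lookup('data') stay unchanged
storages_config = lambda: {'data': CloudStorageStub(bucket_name='my_data_lake_bucket_1')}
prod_storage = lookup('data')

print(dev_storage)
print(prod_storage)
```

Because the configuration is only evaluated when an alias is looked up, code that refers to `'data'` does not need to know which kind of storage is behind it.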

This package makes it possible to configure, manage and access multiple storages in mara.

&nbsp;


## Batch processing: Accessing storages with shell commands

The file [mara_storage/shell.py](mara_storage/shell.py) contains functions that create commands for accessing storage files via their command line clients.
   
For example, the `read_file_command` function creates a shell command that reads a file from a storage and returns its content to stdout:

```python
import mara_storage.shell

file = 'my_domain.com/logs/2020/11/15/nginx.node-1.error.log'

print(mara_storage.shell.read_file_command('data', file_name=file))
# -> cat /mara/data/my_domain.com/logs/2020/11/15/nginx.node-1.error.log

print(mara_storage.shell.read_file_command('gcs-bucket-1', file_name=file))
# -> gsutil cat gs://my_data_lake_bucket_1/my_domain.com/logs/2020/11/15/nginx.node-1.error.log
```

The function `write_file_command` creates a shell command that receives data on stdin and writes it to the storage:

```python
import mara_storage.shell

command = 'echo "Hello World!"'
command += ' | '
command += mara_storage.shell.write_file_command('data', file_name='hello-world.txt')

print(command)
# -> echo "Hello World!" | cat - > /mara/data/hello-world.txt
```
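
Since these functions only return strings, running them is left to the caller. The following is a rough sketch of one way to do that with `subprocess` on a unix-like system. To stay self-contained it does not import mara_storage: `write_command` and `read_command` are hand-written stand-ins that mimic the strings shown above for a local storage, and `run_shell` is a hypothetical helper.

```python
import pathlib
import shlex
import subprocess
import tempfile


def run_shell(command: str) -> str:
    """Runs a shell command line, fails on a non-zero exit code and returns stdout"""
    result = subprocess.run(command, shell=True, check=True,
                            stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout


with tempfile.TemporaryDirectory() as directory:
    path = shlex.quote(str(pathlib.Path(directory) / 'hello-world.txt'))

    # Hand-written stand-ins (assumption) for the strings that write_file_command
    # and read_file_command are shown to return for a local storage
    write_command = f'cat - > {path}'
    read_command = f'cat {path}'

    # Write the file through a pipe, then read it back
    run_shell('echo "Hello World!" | ' + write_command)
    content = run_shell(read_command)

print(content)
```

Note that in a plain `sh` pipeline only the exit code of the last command is reported, so a failure of an earlier command in the pipe would go unnoticed by `check=True`.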

Finally, `delete_file_command` creates a shell command that deletes a file from the local storage:

```python
import mara_storage.shell

print(mara_storage.shell.delete_file_command('data', file_name='hello-world.txt'))
# -> rm -f /mara/data/hello-world.txt
```

&nbsp;


The following **command line clients** are used to access the various storages:

| Storage | Client binary | Comments |
| --- | --- | --- |
| Local storage | unix shell | Included in standard distributions. |
| Google Cloud Storage | `gsutil` | From [https://cloud.google.com/storage/docs/gsutil_install](https://cloud.google.com/storage/docs/gsutil_install). |
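
Before running generated commands, it can be useful to check that the required client binaries are installed. A small sketch using `shutil.which` from the standard library follows; `CLIENT_BINARIES` and `missing_clients` are hypothetical names for this example, and `sh` is used as a proxy for "unix shell".

```python
import shutil
from typing import Dict, List

# Client binaries per storage type, taken from the table above
# ('sh' is an assumption standing in for "unix shell")
CLIENT_BINARIES = {'Local storage': 'sh', 'Google Cloud Storage': 'gsutil'}


def missing_clients(binaries: Dict[str, str]) -> List[str]:
    """Returns the storage types whose client binary is not found on the PATH"""
    return [storage_type for storage_type, binary in binaries.items()
            if shutil.which(binary) is None]


for storage_type in missing_clients(CLIENT_BINARIES):
    print(f'{storage_type}: client "{CLIENT_BINARIES[storage_type]}" not found on PATH')
```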

&nbsp;


Installation
------------
To use the library directly, install it with pip:

```bash
pip install mara-storage
```

or install the latest version directly from the GitHub repository:

```bash
pip install git+https://github.com/mara/mara-storage.git
```


