Metadata-Version: 2.1
Name: pyreadstore
Version: 1.0.0
Summary: PyReadStore is the Python client (SDK) for the ReadStore API
Home-page: https://github.com/EvobyteDigitalBiology/pyreadstore
Author: Jonathan Alles
Author-email: Jonathan.Alles@evo-byte.com
License: Apache-2.0 license
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: Unix
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests >=2.32.3
Requires-Dist: pydantic >=2.9
Requires-Dist: pandas >=2.2

# PyReadStore SDK

This README describes PyReadStore, the Python client (SDK) for the ReadStore API. 

PyReadStore can be used to access projects, datasets, metadata and attachment files in the ReadStore Database from Python code. 
The package enables you to automate your bioinformatics pipelines, Python scripts and notebooks.

Check the [ReadStore GitHub repository](https://github.com/EvobyteDigitalBiology/readstore) for more information on how to get started with ReadStore and set up your server.

More information is available on the [ReadStore website](https://evo-byte.com/readstore/).

Tutorials and Intro Videos: https://www.youtube.com/@evobytedigitalbio

Blog posts and How-Tos: https://evo-byte.com/blog/

For general questions reach out to info@evo-byte.com

Happy analysis :)


## Table of Contents
- [Description](#description)
- [Installation](#installation)
- [Usage](#usage)
- [Contributing](#contributing)
- [License](#license)
- [Credits and Acknowledgments](#acknowledgments)

## The Lean Solution for Managing FASTQ and NGS Data

ReadStore is a platform for storing, managing, and integrating genomic data. It speeds up analysis and offers a simple way of managing and sharing FASTQ and NGS datasets.
Built-in project and metadata management structures your workflows and a collaborative user interface enhances teamwork — so you can focus on generating insights.

The integrated Webservice enables you to retrieve data directly from ReadStore via the terminal Command-Line-Interface (CLI) or the Python/R SDKs.

The ReadStore Basic version provides a local webserver with simple user management. If you need an organization-wide deployment, advanced user and group management, or cloud integration, please check the ReadStore Advanced versions and reach out to info@evo-byte.com.

## Description

PyReadStore is a Python client (SDK) that lets you easily connect to your ReadStore server and interact with the ReadStore API.
By importing the pyreadstore package in Python, you can quickly retrieve data from a ReadStore server.

This tool provides streamlined and standardized access to NGS datasets and metadata, helping you run analyses more efficiently and with fewer errors.
You can easily scale your pipelines, and if you need to migrate or move NGS data, updating the ReadStore database ensures all your workflows stay up-to-date.


## Security and Permissions<a id="backup"></a>

**PLEASE READ AND FOLLOW THESE INSTRUCTIONS CAREFULLY!**

### User Accounts and Token<a id="token"></a>

Using PyReadStore requires an active user account and a token (and a running ReadStore server). 

You should **never enter your user account password** when working with PyReadStore.

To retrieve your token:

1. Log in to the ReadStore app via your browser
2. Navigate to the `Settings` page and click on `Token`
3. You can regenerate your token anytime (`Reset`). This will invalidate the previous token

For uploading FASTQ files your user account needs to have `Staging Permission`.
You can check this in the `Settings` page of your account.
If you do not have `Staging Permission`, ask your ReadStore server admin to grant you permission.

### Setting Your Credentials

You need to provide the PyReadStore client with valid ReadStore credentials.

There are three options:

1. Load credentials from the ReadStore `config` file. 
The file is generated by the [ReadStore CLI](https://github.com/EvobyteDigitalBiology/readstore-cli),
by default in your home directory (`~/.readstore/`). Keep read permissions on this file restrictive.

2. Directly enter your username and token when instantiating a PyReadStore client within your Python code

3. Set username and token via environment variables (`READSTORE_USERNAME`, `READSTORE_TOKEN`). This is useful in container or cloud environments.
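
Option 1 asks for restrictive permissions on the config file. A minimal sketch of how this could be done on a Unix system, assuming the default location `~/.readstore/config`; the helper `restrict_config_permissions` is hypothetical and not part of PyReadStore:

```python
import os
import stat
from pathlib import Path


def restrict_config_permissions(config_path: str) -> int:
    """Make the file readable and writable by its owner only (mode 0o600).

    Returns the resulting permission bits.
    """
    path = Path(config_path).expanduser()
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return stat.S_IMODE(path.stat().st_mode)


if __name__ == "__main__":
    # Assumed default location of the ReadStore config file
    config_file = Path("~/.readstore/config").expanduser()
    if config_file.exists():
        mode = restrict_config_permissions(str(config_file))
        print(f"{config_file} now has mode {oct(mode)}")
```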


## Installation

`pip3 install pyreadstore`

You can perform the install in a conda or venv virtual environment to simplify package management.

A local install is also possible

`pip3 install --user pyreadstore`

Validate the install with a module import

```python 
import pyreadstore
```

## Usage

Detailed tutorials, videos and explanations are found on [YouTube](https://www.youtube.com/playlist?list=PLk-WMGySW9ySUfZU25NyA5YgzmHQ7yquv) or on the [**EVO**BYTE blog](https://evo-byte.com/blog).

### Quickstart

Let's access some dataset and project data from the ReadStore database!

Make sure a ReadStore server is running and reachable (by default under `127.0.0.1:8000`).
Open `http://127.0.0.1:8000/api_v1/` in your browser; you should get a response from the API.
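
The same reachability check can be scripted, for example at the start of a pipeline. The sketch below uses only the standard library and assumes the default URL given above; `api_is_reachable` is a hypothetical helper, and it treats any HTTP response (including error statuses) as a sign that the server is up:

```python
import urllib.error
import urllib.request


def api_is_reachable(url: str = "http://127.0.0.1:8000/api_v1/",
                     timeout: float = 3.0) -> bool:
    """Return True if any HTTP response comes back from `url`."""
    # Skip configured HTTP proxies, since the default server runs on localhost
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status
        return True
    except OSError:
        # Connection refused, timeout, DNS failure, ...
        return False


if __name__ == "__main__":
    print("ReadStore API reachable:", api_is_reachable())
```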

We assume you have already run `readstore configure` to create a config file for your user.
If not, consult the [ReadStore CLI](https://github.com/EvobyteDigitalBiology/readstore-cli) README on how to set this up.

We will create a client instance and perform some operations to retrieve data from the ReadStore database.
More information on all available methods can be found below.


```python 
import pyreadstore

rs_client = pyreadstore.Client() # Create an instance of the ReadStore client

datasets = rs_client.list()      # List all datasets and return pandas dataframe

datasets_project_1 = rs_client.list(project_id = 1) # List all datasets for project 1

datasets_id_25 = rs_client.get(dataset_id = 25)     # Get detailed data for dataset 25

projects = rs_client.list_projects()                # List all projects

project = rs_client.get_project(project_name = 'MyProject')  # Get details for MyProject

fastq_data_id_25 = rs_client.get_fastq(dataset_id = 25)     # Get fastq file data for dataset 25

rs_client.download_attachment(dataset_id = 25,              # Download files attached to dataset 25
                              attachment_name = 'gene_table.tsv') 

rs_client.upload_fastq(fastq = 'path/to_fastq_r1.fq')       # Upload a FASTQ file
```




### Configure the Client

The Client is the central object and provides authentication against the ReadStore API.
By default, the client will try to read the `~/.readstore/config` credentials file.
You can change the directory if your config file is located in another folder.

If you set the `username` and `token` arguments, the client will use these credentials instead.

If your ReadStore server is not running under localhost (`127.0.0.1`) port `8000`, you can adapt the default settings.

```python 
pyreadstore.Client(config_dir: str = '~/.readstore',  # Directory containing ReadStore credentials
                  username: str | None = None,        # Username
                  token : str | None = None,          # Token
                  host: str = 'http://localhost',     # Hostname / IP of ReadStore server
                  return_type: str = 'pandas',        # Default return types, can be pandas or json
                  port: int = 8000,                   # Server Port Number
                  fastq_extensions: List[str] = ['.fastq','.fastq.gz','.fq','.fq.gz']) 
                  # Accepted FASTQ file extensions for upload validation 
```

It is possible to set username, token, server endpoint and FASTQ extensions using the environment variables listed below. 
Environment variables take precedence over other client configurations.

- `READSTORE_USERNAME` (username)
- `READSTORE_TOKEN` (token)
- `READSTORE_ENDPOINT_URL` (`http://host:port`, e.g. `http://localhost:8000`)
- `READSTORE_FASTQ_EXTENSIONS` (fastq_extensions, `'.fastq,.fastq.gz,.fq,.fq.gz'`)
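
As an illustration, the variables can be set for the current Python process before the client is created. In containers or cloud environments they would more typically be injected from outside. The helper `set_readstore_env` and the credential values below are placeholders, not part of PyReadStore:

```python
import os


def set_readstore_env(username: str, token: str,
                      host: str = "http://localhost", port: int = 8000) -> None:
    """Export ReadStore credentials and endpoint for the current process."""
    os.environ["READSTORE_USERNAME"] = username
    os.environ["READSTORE_TOKEN"] = token
    os.environ["READSTORE_ENDPOINT_URL"] = f"{host}:{port}"


# Placeholder values; never hard-code a real token in source files
set_readstore_env("testuser", "my-token")

# A client created afterwards, e.g. pyreadstore.Client(), is expected
# to pick up these values, as described above.
```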


### Access Datasets

```python 
# List ReadStore Datasets

rs_client.list(project_id: int | None = None,   # Filter datasets for project with id `project_id`
              project_name: str | None = None,  # Filter datasets for project with name `project_name`
               return_type: str | None = None   # Return pd.DataFrame or JSON type
               ) -> pd.DataFrame | List[dict]

# Get ReadStore Dataset Details
# Provide dataset_id OR dataset_name

rs_client.get(dataset_id: int| None = None,     # Get dataset with id `dataset_id`
              dataset_name: str | None = None,  # Get dataset with name `dataset_name`
              return_type: str | None = None    # Return pd.Series or json(dict)
              ) -> pd.Series | dict

# Get FASTQ file data for a dataset
# Provide dataset_id OR dataset_name

rs_client.get_fastq(dataset_id: int| None = None,    # Get fastq data for dataset with id `dataset_id`
                  dataset_name: str | None = None,   # Get fastq data for dataset `dataset_name`
                  return_type: str | None = None     # Return pd.DataFrame or JSON type
                  ) -> pd.DataFrame | List[dict]
```


### Access Projects

```python 
# List ReadStore Projects

rs_client.list_projects(return_type: str | None = None   # Return pd.DataFrame or JSON type
                        ) -> pd.DataFrame | List[dict]

# Get ReadStore Project Details
# Provide project_id OR project_name

rs_client.get_project(project_id: int| None = None,     # Get project with id `project_id`
                      project_name: str | None = None,  # Get project with name `project_name`
                      return_type: str | None = None    # Return pd.Series or json(dict)
                      ) -> pd.Series | dict
```

### Download Attachments

```python 
# Download project attachment file from ReadStore Database 

rs_client.download_project_attachment(attachment_name: str,            # name of attachment file
                                      project_id: int | None = None,   # project id with attachment
                                      project_name: str | None = None, # project name with attachment
                                      outpath: str | None = None)      # Path to download file to

# Download dataset attachment file from ReadStore Database 

rs_client.download_attachment(attachment_name: str,             # name of attachment file
                              dataset_id: int | None = None,    # dataset id with attachment
                              dataset_name: str | None = None,  # dataset name with attachment
                              outpath: str | None = None)       # Path to download file to
```
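
To collect attachments in a dedicated folder, the `outpath` argument can be built with `pathlib`. The sketch below assumes that `outpath` is the full path of the target file and that `client` is a `pyreadstore.Client` instance; `download_dataset_attachment` is a hypothetical helper:

```python
from pathlib import Path
from typing import Any


def download_dataset_attachment(client: Any, dataset_id: int,
                                attachment_name: str, out_dir: str) -> Path:
    """Download one dataset attachment into `out_dir` and return the target path."""
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    target = target_dir / attachment_name
    client.download_attachment(attachment_name=attachment_name,
                               dataset_id=dataset_id,
                               outpath=str(target))
    return target
```

For example, `download_dataset_attachment(rs_client, 25, 'gene_table.tsv', 'attachments')` would request the file from the Quickstart and place it under `attachments/`.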

### Upload FASTQ files

Upload FASTQ files to the ReadStore server. The method checks that the FASTQ files exist and end with a valid FASTQ extension.

```python 
# Upload FASTQ files to ReadStore 

rs_client.upload_fastq(fastq : List[str] | str)  # Path of FASTQ files to upload
```

## Contributing

Contributions make this project better! Whether you want to report a bug, improve documentation, or add new features, any help is welcomed!

### How You Can Help
- Report Bugs
- Suggest Features
- Improve Documentation
- Code Contributions

### Contribution Workflow
1. Fork the repository and create a new branch for each contribution.
2. Write clear, concise commit messages.
3. Submit a pull request and wait for review.

Thank you for helping make this project better!

## License

PyReadStore is licensed under the Apache 2.0 open source license.
See the LICENSE file for more information.


## Credits and Acknowledgments<a id="acknowledgments"></a>

PyReadStore is built upon the following open-source Python packages. We would like to thank all contributing authors, developers and partners.

- Python (https://www.python.org/)
- requests (https://requests.readthedocs.io/en/latest/)
- pydantic (https://docs.pydantic.dev/latest/)
- pandas (https://pandas.pydata.org/)
