Metadata-Version: 2.1
Name: smartbulk_connector
Version: 1.0.3
Summary: BioTuring Smartbulk Connector
Home-page: https://app.bioturing.com/smartbulk
Author: BioTuring
Author-email: support@bioturing.com
Requires-Python: >=3.8, <3.12
Description-Content-Type: text/markdown
Requires-Dist: pydantic==2.7.2
Requires-Dist: requests==2.32.3
Requires-Dist: tqdm==4.66.4
Requires-Dist: pandas
Requires-Dist: numpy

# BioTuring SmartBulk-Connector SDK

The BioTuring SmartBulk-Connector SDK is a Python package that provides an interface to BioTuring's services.

## Installation

You can install the SmartBulk-Connector SDK package using `pip`:

```bash
pip install --upgrade smartbulk-connector
```

## Get an API token from SmartBulk

An API token is a unique identifier that allows a user or application to access an API. It is a secure way to authenticate a user or application and to control what permissions they have.

You do not need to regenerate your API token each time you use it. However, you may need to regenerate it if it is compromised.

To get a token, first open the SmartBulk SDK page on the host website, where your token is generated.
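
Because the token grants access on your behalf, it is usually better kept out of notebooks and scripts. The following is a minimal sketch, assuming the domain and token are stored in two environment variables. The variable names `SMARTBULK_DOMAIN` and `SMARTBULK_TOKEN` and the helper `load_smartbulk_credentials` are hypothetical and not part of the SDK:

```python
import os


def load_smartbulk_credentials(env=os.environ):
    """Return (domain, token) read from environment variables.

    SMARTBULK_DOMAIN and SMARTBULK_TOKEN are hypothetical names chosen
    for this sketch; the SDK itself does not define them.
    """
    names = ("SMARTBULK_DOMAIN", "SMARTBULK_TOKEN")
    # Strip stray whitespace, which is easy to pick up when pasting a token.
    values = {name: env.get(name, "").strip() for name in names}
    missing = [name for name in names if not values[name]]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return values["SMARTBULK_DOMAIN"], values["SMARTBULK_TOKEN"]
```

The returned values can then be passed as `domain` and `token` when creating the connector.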

## How To Use


```python
import warnings
from smartbulk_connector import SmartbulkConnector

warnings.filterwarnings("ignore")
```

**Connect to a SmartBulk private server**


```python
# authentication
DOMAIN = "<your-smartbulk-server-domain>"
TOKEN = "<your-API-token>"
connector = SmartbulkConnector(domain=DOMAIN, token=TOKEN)
```

    Connecting to host at https://dev.bioturing.com/smartbulk
    Connect to SmartBulk successfully


```python
# get current version
connector.get_versions()
```

    smartbulk_connector: version 0.1.0


**Get user groups available for your token**


```python
connector.get_user_groups()
```




    [{'group_id': '<hidden-id>',
      'group_name': 'Personal workspace'},
     {'group_id': '<hidden-id>', 
      'group_name': 'All members'},
     {'group_id': '<hidden-id>',
      'group_name': 'BioTuring Public Studies'}]



**Get all projects from a group**


```python
connector.get_all_projects_info_in_group(group_id='personal')
```


    [{'project_id': 'prj_8dfbc6a8-4e22-444f-b4a2-4df000c48141',
      'project_name': 'sample dataset'},
     {'project_id': 'prj_84c1a392-8080-11ef-8f07-0242ac130004',
      'project_name': 'human sample'},
     {'project_id': 'prj_94f6f0ef-d6c8-49f5-96f6-7bb5fa6a3de8',
      'project_name': 'mouse_sample'}]


**List files and directories in the workspace**


```python
connector.listdir_workspace()
```


    ['example_data', 'sample dataset', 'mouse_sample']



```python
connector.listdir_workspace('example_data', fullpath=True)
```


    ['/path/to/server/workspace/upload/example_data/count_mat_2.csv',
     '/path/to/server/workspace/upload/example_data/count_mat.csv',
     '/path/to/server/workspace/upload/example_data/metadata_2.csv',
     '/path/to/server/workspace/upload/example_data/metadata.csv',
     '/path/to/server/workspace/upload/example_data/recipes.csv']


**List files and directories in cloud storage**


```python
connector.listdir_cloud_storage()
```


    ['bioturing-lens', 'bioturingdebug', 'bioturingdebug.log.txt']


**Upload a single file**


```python
connector.upload_file('path/to/local/count_mat.csv', server_folder_name='test', debug_mode=True)
```


    {'status': 0,
     'path': '/path/to/server/workspace/upload/test/v1.count_mat.csv',
     'url_path': '/path/to/server/workspace/upload/test/v1.count_mat.csv'}


**Upload a folder**


```python
connector.upload_folder('tsv_sample/', debug_mode=True)
```

      0%|          | 0.00/16.1M [00:00<?, ?B/s]

    Upload tsv_sample/matrix_200.csv.gz, chunk index : 1 ...


    100%|██████████| 16.1M/16.1M [00:29<00:00, 538kB/s]



    {'folder_name': 'tsv_sample',
     'file_path': ['tsv_sample/recipes.csv',
      'tsv_sample/SRP092402.tsv',
      'tsv_sample/matrix_200.csv.gz'],
     'server_path': ['/path/to/server/workspace/upload/tsv_sample/v1.recipes.csv',
      '/path/to/server/workspace/upload/tsv_sample/v1.SRP092402.tsv',
      '/path/to/server/workspace/upload/tsv_sample/v1.matrix_200.csv.gz']}
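
The server paths in this result are the values later passed as `matrix_paths` and `metadata_paths` when creating a project. The following is a minimal sketch of looking them up by local file path, assuming `file_path` and `server_path` are parallel lists in the same order, as the output above suggests. `map_local_to_server` is a hypothetical helper, not an SDK function:

```python
def map_local_to_server(upload_result):
    """Map each uploaded local path to its path on the server.

    Assumes 'file_path' and 'server_path' are parallel lists (same order).
    """
    local_paths = upload_result["file_path"]
    server_paths = upload_result["server_path"]
    if len(local_paths) != len(server_paths):
        raise ValueError("file_path and server_path have different lengths")
    return dict(zip(local_paths, server_paths))


# Result copied from the upload_folder output shown above.
upload_result = {
    "folder_name": "tsv_sample",
    "file_path": [
        "tsv_sample/recipes.csv",
        "tsv_sample/SRP092402.tsv",
        "tsv_sample/matrix_200.csv.gz",
    ],
    "server_path": [
        "/path/to/server/workspace/upload/tsv_sample/v1.recipes.csv",
        "/path/to/server/workspace/upload/tsv_sample/v1.SRP092402.tsv",
        "/path/to/server/workspace/upload/tsv_sample/v1.matrix_200.csv.gz",
    ],
}

server_path_for = map_local_to_server(upload_result)
matrix_paths = [server_path_for["tsv_sample/matrix_200.csv.gz"]]
metadata_paths = [server_path_for["tsv_sample/SRP092402.tsv"]]
```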


**Create a new project from the uploaded file paths on the SmartBulk server**


```python
submit_result = connector.create_project(
    group_id='personal',
    species='human',
    project_name='human sample',
    matrix_paths=['/path/to/server/workspace/upload/tsv_sample/v1.matrix_200.csv.gz'],
    metadata_paths=['/path/to/server/workspace/upload/tsv_sample/v1.SRP092402.tsv'],
    dataset_name='Sample Dataset',
)
```

**Check project creation status**


```python
connector.check_project_status(submit_result=submit_result)
```

**Add a new dataset to a project**


```python
submit_result = connector.add_project(
    project_id='prj_8dfbc6a8-4e22-444f-b4a2-4df000c48141',
    group_id='personal',
    species='human',
    matrix_paths=['/path/to/server/workspace/upload/tsv_sample/matrix_200.csv.gz'],
    metadata_paths=['/path/to/server/workspace/upload/tsv_sample/SRP092402.tsv'],
    dataset_name='Another Dataset'
)

connector.check_project_status(submit_result=submit_result)
```

**Create a project with multiple datasets from a recipes file**

A recipes file is a CSV file that lists the files belonging to each dataset. It has the following columns:

- `dataset_name`: the name of the dataset
- `path_on_server`: the server path to one file in the dataset
- `file_type`: either `matrix` or `metadata`; identifies what kind of file `path_on_server` points to
- `species`: either `human` or `mouse`

Sample recipes.csv file:

| dataset_name | path_on_server                                                | file_type | species |
|--------------|---------------------------------------------------------------|-----------|---------|
| Dataset_1    | /path/to/server/workspace/upload/example_data/count_mat.csv   | matrix    | human   |
| Dataset_1    | /path/to/server/workspace/upload/example_data/metadata.csv    | metadata  | human   |
| Dataset_2    | /path/to/server/workspace/upload/example_data/count_mat_2.csv | matrix    | human   |
| Dataset_2    | /path/to/server/workspace/upload/example_data/count_mat_3.csv | matrix    | human   |
| Dataset_2    | /path/to/server/workspace/upload/example_data/metadata_2.csv  | metadata  | human   |
| Dataset_2    | /path/to/server/workspace/upload/example_data/metadata_3.csv  | metadata  | human   |
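
A recipes file with these four columns can also be generated rather than typed by hand. The following is a minimal sketch using only the standard library. The helpers `build_recipe_rows` and `recipes_to_csv_text` and the input dictionary layout are hypothetical, chosen for illustration, and not part of the SDK:

```python
import csv
import io

RECIPE_COLUMNS = ["dataset_name", "path_on_server", "file_type", "species"]


def build_recipe_rows(datasets, species):
    """Expand {dataset_name: {"matrix": [...], "metadata": [...]}} into rows."""
    rows = []
    for dataset_name, files in datasets.items():
        for file_type in ("matrix", "metadata"):
            for path in files.get(file_type, []):
                rows.append({
                    "dataset_name": dataset_name,
                    "path_on_server": path,
                    "file_type": file_type,
                    "species": species,
                })
    return rows


def recipes_to_csv_text(rows):
    """Serialize recipe rows to CSV text, header first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=RECIPE_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


base = "/path/to/server/workspace/upload/example_data"
datasets = {
    "Dataset_1": {"matrix": [f"{base}/count_mat.csv"],
                  "metadata": [f"{base}/metadata.csv"]},
    "Dataset_2": {"matrix": [f"{base}/count_mat_2.csv"],
                  "metadata": [f"{base}/metadata_2.csv"]},
}
csv_text = recipes_to_csv_text(build_recipe_rows(datasets, species="human"))
```

Save `csv_text` as `recipes.csv` before using it as a recipes file.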



```python
connector.create_project_from_recipes(
    group_id='personal', 
    recipes_path='example_data/recipes.csv', 
    project_name='sample dataset',
)
```
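
Before submitting, it can help to check a recipes file against the column rules described above. The following is a minimal sketch that only verifies the documented columns and allowed values. `find_recipe_problems` is a hypothetical helper, not an SDK function, and it does not check whether the paths exist on the server:

```python
import csv
import io

REQUIRED_COLUMNS = ("dataset_name", "path_on_server", "file_type", "species")
ALLOWED_FILE_TYPES = {"matrix", "metadata"}
ALLOWED_SPECIES = {"human", "mouse"}


def find_recipe_problems(handle):
    """Return a list of problems found in a recipes CSV (empty if none)."""
    reader = csv.DictReader(handle)
    missing = set(REQUIRED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        return ["missing column(s): " + ", ".join(sorted(missing))]

    problems = []
    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        values = {name: (row[name] or "").strip() for name in REQUIRED_COLUMNS}
        for name in ("dataset_name", "path_on_server"):
            if not values[name]:
                problems.append(f"line {line_no}: empty {name}")
        if values["file_type"] not in ALLOWED_FILE_TYPES:
            problems.append(f"line {line_no}: unknown file_type {values['file_type']!r}")
        if values["species"] not in ALLOWED_SPECIES:
            problems.append(f"line {line_no}: unknown species {values['species']!r}")
    return problems


# In-memory sample with two deliberately invalid rows.
sample = io.StringIO(
    "dataset_name,path_on_server,file_type,species\n"
    "Dataset_1,/path/to/server/workspace/upload/example_data/count_mat.csv,matrix,human\n"
    "Dataset_1,/path/to/server/workspace/upload/example_data/metadata.csv,meta,human\n"
    "Dataset_2,,matrix,rat\n"
)
problems = find_recipe_problems(sample)
```

For a file on disk, pass an open handle instead, for example `open('recipes.csv', newline='')`.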

**Add new datasets to a project from a recipes file**


```python
connector.add_project_from_recipes(
    project_id='prj_8dfbc6a8-4e22-444f-b4a2-4df000c48141',
    group_id='personal', 
    recipes_path='data/recipes.csv',
)
```

**Resume creating a project or adding datasets from a recipes file using a trace log**


```python
connector.create_project_from_recipes(
    group_id='personal',
    recipes_path='data/recipes.csv', 
    project_name='21_bulk_datasets',
    trace_log='prj_8dfbc6a8-4e22-444f-b4a2-4df000c48141/project_trace_log.json'
)
```

or, to resume adding datasets to an existing project:

```python
connector.add_project_from_recipes(
    project_id='prj_8dfbc6a8-4e22-444f-b4a2-4df000c48141',
    group_id='personal', 
    recipes_path='data/recipes.csv',
    trace_log='prj_8dfbc6a8-4e22-444f-b4a2-4df000c48141/project_trace_log.json'
)
```
