Metadata-Version: 2.1
Name: zegami-cli
Version: 1.1.7
Summary: Command Line Interface for Zegami
Home-page: https://github.com/zegami/zegami-cli
Author: Zegami
Author-email: help@zegami.com
License: UNKNOWN
Project-URL: Zegami, https://zegami.com
Project-URL: Bug Reports, https://github.com/zegami/zegami-cli/issues
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: User Interfaces
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Description-Content-Type: text/markdown
Requires-Dist: appdirs (==1.4.3)
Requires-Dist: azure-storage-blob (==12.3.0)
Requires-Dist: colorama (==0.4.3)
Requires-Dist: jsonschema (==3.2.0)
Requires-Dist: PyYaml (==5.3.1)
Requires-Dist: requests (<3.0,>=2.15.0)
Requires-Dist: tqdm (==4.43.0)
Provides-Extra: sql
Requires-Dist: pyodbc (==4.0.30) ; extra == 'sql'
Requires-Dist: SQLAlchemy (==1.3.15) ; extra == 'sql'

# zegami-cli
A Command Line Interface for [Zegami](https://www.zegami.com).

Zegami is a visual data exploration tool that makes the analysis of large collections of image-rich information quick and simple.

The Zegami CLI relies on a combination of YAML files and command-line arguments.

The first step is to create a collection.

# Installation
```
pip3 install zegami-cli[sql]
```

# Commands

## Login
The login command prompts for a username and password, which are then used to retrieve a long-lived API token for subsequent requests. The token is stored in a file in the current user's data directory.
Once retrieved, all subsequent commands will use the stored token, unless it is specifically overridden with the `--token` option.
```
zeg login
```
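The precedence described above (an explicit `--token` wins over the stored token) can be sketched as follows. This is only an illustration: the helper name `resolve_token` and the idea of passing the token file path in as an argument are assumptions, not taken from the zegami-cli source.

```python
from pathlib import Path
from typing import Optional


def resolve_token(cli_token: Optional[str], token_file: Path) -> Optional[str]:
    """Return the token to use for a request.

    Hypothetical helper: an explicit --token value takes precedence,
    otherwise fall back to the token saved by `zeg login`, if any.
    """
    if cli_token:
        return cli_token
    if token_file.is_file():
        stored = token_file.read_text(encoding="utf-8").strip()
        return stored or None
    # No explicit token and nothing stored: the caller has to log in first.
    return None
```

A `None` result would mean that neither source provided a token, so the user needs to run `zeg login` or pass `--token`.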

## Get a collection
Get the details of a collection.
If the `collection id` is excluded then all collections will be listed.
```
zeg get collections [collection id] --project [Project Id] --token [API token]
```
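When the CLI is driven from a script, the argument list can be assembled programmatically. The sketch below only builds the list and uses nothing beyond the sub-commands and options shown in this README; the function name `build_zeg_command` is hypothetical.

```python
from typing import List, Optional


def build_zeg_command(
    action: str,
    resource: str,
    resource_id: Optional[str] = None,
    project: Optional[str] = None,
    config: Optional[str] = None,
    token: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list such as ['zeg', 'get', 'collections', ...]."""
    command = ["zeg", action, resource]
    # The id is optional: without it, `zeg get collections` lists everything.
    if resource_id:
        command.append(resource_id)
    for flag, value in (("--project", project), ("--config", config), ("--token", token)):
        if value:
            command.extend([flag, value])
    return command


print(build_zeg_command("get", "collections", project="my-project"))
```

The resulting list can be handed to `subprocess.run(command, check=True)` once `zeg` is installed and on the `PATH`.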

## Create a collection
Create a collection using a combined dataset and imageset config.
```
zeg create collections --project [Project Id] --token [API token] --url [Server url] --config [path to configuration yaml]
```
The following config properties are supported for file-based imagesets and datasets.

```
# The name of the collection
name: file based
description: an example collection with a file based imageset and dataset
# The type of data set. For now this needs to be set to 'file'
dataset_type: file
# The type of image set. For now this needs to be set to 'file'
imageset_type: file
# Config for the file data set and image set types
file_config:
# Path to the dataset file
    path:
# A collection of paths to image files. Paths can be to both images and directories
    paths:
        - an_image.jpg
        - a/directory/path
# Name of the column in the dataset that contains the image name
dataset_column: image_name
```
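Because `paths` may mix individual images and directories, the entries have to be expanded into a flat list of files before upload. A minimal sketch of that expansion follows. It assumes that directories are searched recursively and that images are recognised by file extension; neither detail is confirmed by this README, and `expand_image_paths` and `IMAGE_EXTENSIONS` are hypothetical names.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed set of image extensions; the CLI's real list may differ.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tif", ".tiff"}


def expand_image_paths(paths: Iterable[str]) -> List[Path]:
    """Expand a mix of image files and directories into a sorted file list."""
    found = set()
    for entry in paths:
        path = Path(entry)
        if path.is_dir():
            # Walk the directory tree and keep anything that looks like an image.
            for child in path.rglob("*"):
                if child.is_file() and child.suffix.lower() in IMAGE_EXTENSIONS:
                    found.add(child)
        elif path.is_file():
            # Explicitly listed files are kept as given.
            found.add(path)
        # Entries that do not exist are silently skipped in this sketch.
    return sorted(found)
```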

If you are creating a URL-based imageset with a data file, use these properties.

The `dataset_column` property is used to set the column where the URL is stored. You will need to include the full image URL, e.g. https://zegami.com/wp-content/uploads/2018/01/weatherall.svg

```
# The name of the collection
name: url based
# The description of the collection
description: an example collection with a file based dataset where images are to be downloaded from urls
# The type of image set. for now this needs to be set to 'url'
imageset_type: url
# Name of the column in the dataset that contains the image url
dataset_column: image_name
# Url pattern - python format string where {} is replaced by the image name (from the data file)
url_template: https://example.com/images/{}?accesscode=abc3e20423423497
dataset_type: file
# Config for the file data set type
file_config:
# Path to the dataset file
    path:
```
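Since `url_template` is described as a Python format string, the substitution can be tried out directly in Python. The image names below are made-up stand-ins for values read from the `dataset_column` column.

```python
url_template = "https://example.com/images/{}?accesscode=abc3e20423423497"

# Hypothetical values from the column named by `dataset_column`.
image_names = ["cat.jpg", "dog.png"]

# Each {} placeholder is replaced by one image name.
urls = [url_template.format(name) for name in image_names]
for url in urls:
    print(url)
# https://example.com/images/cat.jpg?accesscode=abc3e20423423497
# https://example.com/images/dog.png?accesscode=abc3e20423423497
```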

If you are creating an imageset from a private Azure storage container, together with a local data file, use these properties:

```
# The name of the collection
name: azure bucket based
# The description of the collection
description: an example collection with a file based dataset where images are to be downloaded from an azure bucket
dataset_type: file
# Config for the file data set type
file_config:
# Path to the dataset file
    path:
# The type of image set. For now this needs to be set to 'azure_storage_container'
imageset_type: azure_storage_container
# Name of the container
container_name: my_azure_blobs
# Name of the column in the dataset that contains the image url
dataset_column: image_name

# Note that the storage account connection string should also be made available via environment variable AZURE_STORAGE_CONNECTION_STRING
```
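A wrapper script may want to check that `AZURE_STORAGE_CONNECTION_STRING` is set before invoking the CLI. The following is a small sketch of such a check; the function name is hypothetical and it does not reflect how zegami-cli itself reads the variable.

```python
import os
from typing import Mapping


def get_azure_connection_string(environ: Mapping[str, str] = os.environ) -> str:
    """Return the connection string or raise a clear error if it is unset."""
    value = environ.get("AZURE_STORAGE_CONNECTION_STRING", "").strip()
    if not value:
        raise RuntimeError(
            "AZURE_STORAGE_CONNECTION_STRING is not set; export it before running zeg."
        )
    return value
```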

If you are using SQL data, see the SQL section below for the config.

## Update a collection
Update a collection - *coming soon*.

## Delete a collection
Delete a collection.
```
zeg delete collections [collection id] --project [Project Id] --token [API token]
```

## Publish a collection
```
zeg publish collection [collection id] --project [Project Id] --config [path to configuration yaml] --token [API token]
```

The config `yaml` file is used to specify additional configuration for the collection publish.
```
# The type of update. For now this needs to be set to 'publish'
update_type: publish
# Config for the publish update type
publish_config:
# Flag to indicate if the collection should be published or unpublished
    publish: true
# The id of the project to publish to
    destination_project: public
```

## Get a data set
Get the details of a data set.
```
zeg get dataset [dataset id] --project [Project Id] --token [API token]
```


## Update a data set
Update an existing data set with new data.

Note that when updating the data set of a collection, the dataset id to use is the collection's `upload_dataset_id`. This differs from the imageset update below, which requires the collection's `dataset_id`.
```
zeg update dataset [dataset id] --project [Project Id] --config [path to configuration yaml] --token [API token]
```
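The distinction between the two identifiers is easy to get wrong, so a script can encode it once. The sketch below assumes the collection details are available as a dict with `upload_dataset_id` and `dataset_id` keys; the dict shape and the helper name `dataset_id_for` are assumptions.

```python
def dataset_id_for(collection: dict, target: str) -> str:
    """Pick the dataset identifier needed for an update.

    "dataset":  the id passed to `zeg update dataset` (upload_dataset_id).
    "imageset": the `dataset_id` value placed in the imageset update config.
    """
    if target == "dataset":
        return collection["upload_dataset_id"]
    if target == "imageset":
        return collection["dataset_id"]
    raise ValueError(f"unsupported target: {target!r}")


# Hypothetical collection details, for illustration only.
collection = {"upload_dataset_id": "upload-123", "dataset_id": "dataset-456"}
print(dataset_id_for(collection, "dataset"))   # upload-123
print(dataset_id_for(collection, "imageset"))  # dataset-456
```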

The config `yaml` file is used to specify additional configuration for the data set update. *Two* `dataset_type` values are supported.

### File
The `file` type is used to update a data set with a file. Either specify the fully qualified path to a `.csv`, `.tsv` or `.xlsx` file to upload using the `path` property, *or* use the `directory` property to upload the latest file in a directory location.
```
# The type of data set. For now this needs to be set to 'file'
dataset_type: file
# Config for the file data set type
file_config:
# Path to the dataset file
    path:
# Or path to a directory that contains data files.
# Only the latest file that matches the accepted extensions (.csv, .tsv, .xlsx)
# will be uploaded. This is useful for creating collections based on
# automated exports from a system, like log files.
    directory:
```
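The `directory` behaviour can be previewed with a few lines of Python. This sketch assumes that "latest" means the most recent modification time and that only the top level of the directory is searched; both are assumptions, and `latest_data_file` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional

ACCEPTED_EXTENSIONS = {".csv", ".tsv", ".xlsx"}


def latest_data_file(directory: str) -> Optional[Path]:
    """Return the most recently modified data file in `directory`, or None."""
    candidates = [
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in ACCEPTED_EXTENSIONS
    ]
    if not candidates:
        return None
    # Assumption: "latest" is decided by file modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Running this against an export directory before calling `zeg update dataset` shows which file would be picked under these assumptions.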

### SQL
The `sql` type is used to update a data set based on an `SQL` query.
Uses SQLAlchemy to connect to the database. See http://docs.sqlalchemy.org/en/latest/core/engines.html and https://www.connectionstrings.com/ for the correct connection string format.

```
# The type of data set. For the SQL type this needs to be set to 'sql'
dataset_type: sql
# Config for the sql data set type
sql_config:
# The connection string.
    connection:
# SQL query
    query:
```
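Conceptually, the `sql` type turns a query result into tabular data for upload. The sketch below shows that idea using only the standard library: `sqlite3` stands in for a SQLAlchemy connection so the example runs without extra packages, and the table `xyz` is made up. It is not the CLI's actual implementation.

```python
import csv
import io
import sqlite3


def query_to_csv(connection: sqlite3.Connection, query: str) -> str:
    """Run `query` and return the result set as CSV text with a header row."""
    cursor = connection.execute(query)
    header = [column[0] for column in cursor.description]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(cursor.fetchall())
    return buffer.getvalue()


# In-memory database with a made-up table, for illustration only.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE xyz (id INTEGER, name TEXT)")
connection.executemany("INSERT INTO xyz VALUES (?, ?)", [(1, "a"), (2, "b")])
csv_text = query_to_csv(connection, "SELECT id, name FROM xyz ORDER BY id")
print(csv_text)
```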

### PostgreSQL - tested on Linux and Windows, up to Python v3.7
Prerequisites:

1. Standard requirements - code editor, pip package manager, Python 3.7.

2. Make sure the latest Zegami CLI is installed
```
pip install zegami-cli[sql] --upgrade --no-cache-dir
```
_Note: --no-cache-dir avoids some errors upon install_

Test the install with the login command, which prompts for a username and password. These are then used to retrieve a long-lived API token for subsequent requests. The token is stored in a file in the current user's data directory.
Once retrieved, all subsequent commands will use the stored token, unless it is specifically overridden with the `--token` option.
```
zeg login
```

3. Install the prerequisites for the PostgreSQL connection

Psycopg2 - https://pypi.org/project/psycopg2/ , http://initd.org/psycopg/
```
pip install psycopg2
```

_libpq-dev was required on Linux, not on Windows._

libpq-dev - https://pypi.org/project/libpq-dev/ , https://github.com/ncbi/python-libpq-dev
```
sudo apt-get install libpq-dev
```

Once these are installed you will need to create a YAML file with the correct connection strings.

*Connection String Example:*
```
# The type of data set. For the SQL type this needs to be set to 'sql'
dataset_type: sql
# Config for the sql data set type
sql_config: 
# The connection string.
    connection: "postgresql://postgres:myPassword@localhost:5432/postgres?sslmode=disable" 
# SQL query
    query: select * from XYZ
```
_Note: the `connection` and `query` keys must be indented under `sql_config`._
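To sanity-check a connection string before putting it in the YAML file, its parts can be inspected with the standard library. This is a rough sketch using `urllib.parse`, not SQLAlchemy's own URL parser; `describe_connection` is a hypothetical helper, and the password is deliberately left out of the result.

```python
from urllib.parse import parse_qs, urlsplit


def describe_connection(connection: str) -> dict:
    """Split a database URL into its main components (password omitted)."""
    parts = urlsplit(connection)
    return {
        "dialect": parts.scheme,
        "username": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
        "options": {key: values[0] for key, values in parse_qs(parts.query).items()},
    }


info = describe_connection(
    "postgresql://postgres:myPassword@localhost:5432/postgres?sslmode=disable"
)
print(info["host"], info["port"], info["database"])  # localhost 5432 postgres
```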

If you have already created a collection, you can run the update command as above,
e.g. `zeg update dataset upload_dataset_id --project projectID --config root/psqlconstring.yaml`

If successful the following message will appear: 
```
=========================================
update dataset with result:
-----------------------------------------
id: datasetID
name: Schema dataset for postgresql test
source:
  blob_id: blobID
  dataset_id: datasetID
  upload:
    name: zeg-datasetiop9cbtn.csv

=========================================
```

Useful links:
https://www.npgsql.org/doc/connection-string-parameters.html
https://www.connectionstrings.com/postgresql/ (Standard)
https://docs.sqlalchemy.org/en/13/core/engines.html#postgresql (Specifies pre-reqs for connection)




## Delete a data set
Delete a data set - *coming soon*.
```
zeg delete dataset [dataset id] --project [Project Id] --token [API token]
```

## Get an image set
Get an image set - *coming soon*.
```
zeg get imageset [imageset id] --project [Project Id] --token [API token]
```

## Update an image set
Update an image set with new images.
```
zeg update imageset [imageset id] --project [Project Id] --config [path to configuration yaml] --token [API token]
```

The config `yaml` file is used to specify additional configuration for the image set update.

### File imageset

The `paths` property is used to specify the location of images to upload and can include both images and directories.


```
# The type of image set. for now this needs to be set to 'file'
imageset_type: file
# Config for the file image set type
file_config:
# A collection of paths. Paths can be to both images and directories
    paths:
        - an_image.jpg
        - a/directory/path
# Unique identifier of the collection
collection_id: 5ad3a99b75f3b30001732f36
# Unique identifier of the collection data set (get this from dataset_id)
dataset_id: 5ad3a99b75f3b30001732f36
# Name of the column in the dataset that contains the image name
dataset_column: image_name
```

### URL imageset

The dataset_column property is used to set the column where the url is stored. You will need to include the full image url e.g. https://zegami.com/wp-content/uploads/2018/01/weatherall.svg

```
# The type of image set. for now this needs to be set to 'url'
imageset_type: url
# Unique identifier of the collection
collection_id: 5ad3a99b75f3b30001732f36
# Unique identifier of the collection data set
dataset_id: 5ad3a99b75f3b30001732f36
# Name of the column in the dataset that contains the image url
dataset_column: image_name
# Url pattern - python format string where {} is replaced by the image name (from the data file)
url_template: https://example.com/images/{}?accesscode=abc3e20423423497
```

### Azure storage imageset

```
# The type of image set.
imageset_type: azure_storage_container
# Name of the container
container_name: my_azure_blobs
# Unique identifier of the collection
collection_id: 5ad3a99b75f3b30001732f36
# Unique identifier of the collection data set
dataset_id: 5ad3a99b75f3b30001732f36
# Name of the column in the dataset that contains the image url
dataset_column: image_name

# Note that the storage account connection string should also be made available via environment variable AZURE_STORAGE_CONNECTION_STRING
```

## Delete an image set
Delete an image set - *coming soon*.
```
zeg delete imageset [imageset id] --project [Project Id] --token [API token]
```


# Developer

## Tests
Set up tests:
```
pip install -r requirements/test.txt
```

Run tests:
```
python3 -m unittest discover .
```
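`unittest` discovery picks up test cases from files whose names match `test*.py`. A minimal, hypothetical test module (not one from this repository) that the command above would discover could look like this:

```python
import unittest


class UrlTemplateTest(unittest.TestCase):
    """Illustrative test case; the class and test names are made up."""

    def test_format_inserts_image_name(self):
        template = "https://example.com/images/{}"
        self.assertEqual(
            template.format("a.jpg"),
            "https://example.com/images/a.jpg",
        )
```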


