Metadata-Version: 2.1
Name: nebulae
Version: 0.2.5
Summary: A novel and simple framework based on prevalent DL frameworks and other image processing libs. v0.2.5: unify convolution functions in different dimensions.
Home-page: https://github.com/
Author: Seria
Author-email: zzqsummerai@yeah.net
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
Requires-Dist: graphviz
Requires-Dist: h5py
Requires-Dist: pillow
Requires-Dist: piexif
Requires-Dist: pandas

# Nebulae Brochure

**A novel and simple framework based on current mainstream frameworks and other image processing libraries. Almost every module can be deployed independently.**

------

## Modules Overview

Fuel: easily manage and read the datasets you need at any time

Toolkit: a collection of utilities that support the other nebulae modules

------

## Fuel

**FuelGenerator()**

Build a FuelGenerator to store data space-efficiently.

- config: [<u>dict</u>] A dictionary containing all parameters.

- file_dir: [<u>str</u>] Where your raw data is.

- file_list: [<u>str</u>] A CSV file listing the file names of all raw data together with their labels.

- dtype: [<u>list</u> of <u>str</u>] The data types of all columns except the first one in *file_list*. Valid data types are 'uint8', 'uint16', 'uint32', 'int8', 'int16', 'int32', 'int64', 'float16', 'float32', 'float64', 'str'. Prefixing a type with 'v', e.g. 'vuint8', allows the data in each row of that column to be stored with variable length.

- is_seq: [<u>bool</u>] Whether the data are sequences, e.g. video frames. Defaults to False.
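To make the *dtype* rules concrete, here is a minimal sketch of a hypothetical helper, `parse_dtype` (not part of nebulae), that checks one entry against the valid types listed above and treats a leading 'v' as the variable-length marker. It only mirrors the documented rules; nebulae's own validation may behave differently.

```python
# Hypothetical helper (not part of nebulae): checks dtype entries against
# the valid types documented for FuelGenerator.
VALID_TYPES = {'uint8', 'uint16', 'uint32', 'int8', 'int16', 'int32',
               'int64', 'float16', 'float32', 'float64', 'str'}


def parse_dtype(dtype):
    """Split a dtype string into (base_type, is_variable_length).

    A leading 'v' marks a variable-length column, e.g. 'vuint8'.
    """
    if dtype in VALID_TYPES:
        return dtype, False
    if dtype.startswith('v') and dtype[1:] in VALID_TYPES:
        return dtype[1:], True
    raise ValueError(f'unsupported dtype: {dtype!r}')


# One entry per column of file_list except the first (the file name column).
columns = ['uint8', 'vint16', 'str']
print([parse_dtype(d) for d in columns])
# [('uint8', False), ('int16', True), ('str', False)]
```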

An example of file_list.csv is as follows. 'image' and 'label' are the key names of data and labels respectively. Note that the image name is a path relative to *file_dir*.

| image       | label |
| ----------- | ----- |
| img_1.jpg   | 2     |
| img_2.jpg   | 0     |
| ...         | ...   |
| img_100.jpg | 5     |
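A file list in this layout can be produced with Python's standard `csv` module. The sketch below reuses the example rows from the table as hypothetical data; the header row supplies the key names, and the image paths are assumed to be relative to *file_dir*.

```python
import csv

# Hypothetical file names and labels taken from the example table.
rows = [('img_1.jpg', 2), ('img_2.jpg', 0), ('img_100.jpg', 5)]

with open('file_list.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['image', 'label'])  # header: key names of data and labels
    writer.writerows(rows)               # image paths relative to file_dir
```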



**FuelGenerator.generate(dst_path, height, width, channel=3, encode='JPEG', shards=1, keep_exif=True)**

- dst_path: [<u>str</u>] A hdf5/npz file where you want to save the data.

- height: [<u>int</u>] range between (0, +∞). The height of image data.

- width: [<u>int</u>] range between (0, +∞). The width of image data.

- channel: [<u>int</u>] The number of channels of image data. Defaults to 3.

- encode: [<u>str</u>] The means by which image data are encoded. Valid encoders are 'JPEG' and 'PNG'; 'PNG' is lossless, whereas 'JPEG' is lossy. Defaults to 'JPEG'.
- shards: [<u>int</u>] How many files you need to split the data into. Defaults to 1.
- keep_exif: [<u>bool</u>] Whether to keep EXIF information of photos. Defaults to true.
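The documentation does not say how samples are distributed across *shards*. As an illustration only, the hypothetical `shard_indices` function below splits sample indices into contiguous groups whose sizes differ by at most one; treat this as an assumption, not nebulae's actual sharding logic.

```python
def shard_indices(num_samples, shards):
    """Split sample indices 0..num_samples-1 into `shards` contiguous groups.

    Assumption: groups differ in size by at most one (an even split).
    This is an illustration, not nebulae's actual sharding logic.
    """
    if shards < 1:
        raise ValueError('shards must be a positive integer')
    base, extra = divmod(num_samples, shards)
    groups, start = [], 0
    for i in range(shards):
        # The first `extra` groups each take one leftover sample.
        size = base + (1 if i < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


print([len(g) for g in shard_indices(100, 3)])  # [34, 33, 33]
```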

```python
import nebulae
# create a data generator
fg = nebulae.fuel.FuelGenerator(file_dir='/home/file_dir',
                                file_list='file_list.csv',
                                dtype=['uint8', 'int8'])
# generate compressed data file
fg.generate(dst_path='fuel.hdf5', 
            channel=3,
            height=368,
            width=368)
```



**FuelGenerator.modify(config=None)**

You can edit properties again to generate another file.

```python
fg.modify(height=200, width=200)
```

Passing a dictionary of changed parameters is equivalent.

```python
config = {'height': 200, 'width': 200}
fg.modify(config=config)
```
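The equivalence between keyword arguments and a *config* dictionary follows a common Python pattern. The `Configurable` class below is a hypothetical stand-in, not nebulae's implementation, that merges both forms into a single update so the two calls end in the same state.

```python
class Configurable:
    """Hypothetical stand-in (not nebulae's code) showing why
    modify(height=200) and modify(config={'height': 200}) are equivalent."""

    def __init__(self, **params):
        self.params = dict(params)

    def modify(self, config=None, **kwargs):
        # Keyword arguments and the config dict are merged into one update.
        updates = dict(config or {})
        updates.update(kwargs)
        self.params.update(updates)


a = Configurable(height=368, width=368, channel=3)
b = Configurable(height=368, width=368, channel=3)
a.modify(height=200, width=200)
b.modify(config={'height': 200, 'width': 200})
print(a.params == b.params)  # True
```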



**FuelDepot()**

Build a FuelDepot on which you can mount datasets.

```python
import nebulae
# create a data depot
fd = nebulae.fuel.FuelDepot()
```



**FuelDepot.loadFuel(config, name, batch_size, data_path, data_key, height=0, width=0, channel=3, frame=0, is_encoded=True, if_shuffle=True, rescale=True, resol_ratio=1, complete_last_batch=True, spatial_aug='', p_sa=(0), theta_sa=(0), temporal_aug='', p_ta=(0), theta_ta=(0))**

Mount a dataset on your FuelDepot.

- name: [<u>str</u>] Name of your dataset.
- batch_size: [<u>int</u>] The size of a mini-batch.
- data_path: [<u>str</u>] The full path of your data file. It must be a hdf5/npz file.
- data_key: [<u>str</u>] The key name of data.
- if_shuffle: [<u>bool</u>] Whether to shuffle data samples every epoch. Defaults to True.
- is_encoded: [<u>bool</u>] If the stored data has been compressed. Defaults to True.
- channel: [<u>int</u>] The number of channels of image data. Defaults to 3.
- height: [<u>int</u>] range between (0, +∞). Height of image data. Defaults to 0.
- width: [<u>int</u>] range between (0, +∞). Width of image data. Defaults to 0.
- frame: [<u>int</u>] range between [-1, +∞). The unified number of frames for sequential data. Defaults to 0.
- rescale: [<u>bool</u>] Whether to rescale values of fetched data to [-1, 1]. Defaults to True.
- resol_ratio: [<u>float</u>] range between (0, 1]. The subsampling coefficient used to lower the resolution of image data; set it to 0.5 to halve the resolution. Defaults to 1.
- complete_last_batch: [<u>bool</u>] Whether to complete the last batch so that it has as many samples as the other batches. Defaults to True.
- spatial_aug: [comma-separated <u>str</u>] Put the spatial data augmentations you want in a string with commas as separators. Valid augmentations include 'flip', 'crop', 'brightness', 'gamma_contrast' and 'log_contrast', e.g. 'flip,brightness'. Defaults to '', which means no augmentation.
- p_sa: [<u>tuple</u> of <u>float</u>] range between [0, 1]. The probabilities of taking spatial data augmentations according to the order in *spatial_aug*. Defaults to (0).
- theta_sa: [<u>tuple</u>] The parameters of spatial data augmentations according to the order in *spatial_aug*. Defaults to (0).
- temporal_aug: [comma-separated <u>str</u>] Put the temporal data augmentations you want in a string with commas as separators. Valid augmentations include 'sample'. Temporal augmentation only applies to sequential data, so make sure the data were generated with *is_seq* set to True. Defaults to '', which means no augmentation.
- p_ta: [<u>tuple</u> of <u>float</u>] range between [0, 1]. The probabilities of taking temporal data augmentations according to the order in *temporal_aug*. Defaults to (0).
- theta_ta: [<u>tuple</u>] The parameters of temporal data augmentations according to the order in *temporal_aug*. Defaults to (0).
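Two of the options above reduce to simple arithmetic. The sketch below assumes 8-bit pixel values for *rescale* (mapped linearly from [0, 255] to [-1, 1]) and truncation to an integer for *resol_ratio*; both helper names are hypothetical and nebulae's exact formulas may differ.

```python
def rescale_uint8(pixels):
    """Map 8-bit values in [0, 255] linearly onto [-1, 1].

    Assumption: stored images are uint8; nebulae's exact formula may differ.
    """
    return [p / 127.5 - 1.0 for p in pixels]


def subsampled_size(height, width, resol_ratio):
    """Output size after subsampling by resol_ratio in (0, 1].

    Assumption: sizes are truncated to integers.
    """
    if not 0 < resol_ratio <= 1:
        raise ValueError('resol_ratio must lie in (0, 1]')
    return int(height * resol_ratio), int(width * resol_ratio)


print(rescale_uint8([0, 255]))          # [-1.0, 1.0]
print(subsampled_size(200, 200, 0.5))   # (100, 100)
```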

All data augmentation approaches are listed as follows:

<table>
  <tr>
    <th>Data Source</th><th>Augmentation</th><th>Parameters</th>
  </tr>
  <tr>
    <td rowspan='5'>Image</td><td>flip</td><td>empty tuple: ()</td>
  </tr>
  <tr>
    <td>crop</td><td>nested tuple of float: ((minimum area ratio, maximum area ratio), (minimum aspect ratio, maximum aspect ratio)) of cropped area, where aspect ratio is width/height</td>
  </tr>
  <tr>
    <td>brightness</td><td>float, range between (0, 1]: increment/decrement factor on brightness</td>
  </tr>
  <tr>
    <td>gamma_contrast</td><td>float, range between (0, 1]: expansion/shrinkage factor on pixel value domain</td>
  </tr>
  <tr>
    <td>log_contrast</td><td>float, range between (0, 1]: expansion/shrinkage factor on pixel value domain</td>
  </tr>
  <tr>
    <td>Sequence</td><td>sampling</td><td>positive int, denoted as theta: sample an image every theta frames</td>
  </tr>
</table>
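The augmentation string, probabilities and parameters are matched up by position. The hypothetical `plan_augmentations` helper below sketches that pairing and draws one random decision per augmentation, and `temporal_sample` shows the sequence sampling entry as keeping every theta-th frame starting from the first one. Both are illustrations under these assumptions, not nebulae's internal code.

```python
import random


def plan_augmentations(aug, probs, thetas, rng=random):
    """Pair each comma-separated augmentation with its probability and
    parameter (same order), then decide which ones fire for one sample.

    Illustrative only; nebulae's internal sampling may differ.
    """
    names = [a.strip() for a in aug.split(',') if a.strip()]
    if not (len(names) == len(probs) == len(thetas)):
        raise ValueError('aug, probs and thetas must have the same length')
    # rng.random() is in [0, 1), so p=1.0 always fires and p=0.0 never does.
    return [(n, t) for n, p, t in zip(names, probs, thetas) if rng.random() < p]


def temporal_sample(frames, theta):
    """Keep one frame every theta frames, starting from the first (assumption)."""
    return frames[::theta]


print(plan_augmentations('flip,brightness', (1.0, 0.0), ((), 0.2), random.Random(0)))
# [('flip', ())]
print(temporal_sample(list(range(10)), 3))  # [0, 3, 6, 9]
```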

```python
fd.loadFuel(name='test-img',
            batch_size=4,
            data_key='image',
            data_path='/home/image.hdf5',
            width=200, height=200,
            resol_ratio=0.5,
            spatial_aug='brightness,gamma_contrast',
            p_sa=(0.5, 0.5), theta_sa=(0.2, 1.2))
```



**FuelDepot.modify(tank, config=None)**

- tank: [<u>str</u>] Specify the dataset to modify. 

You can edit properties to change how batches are fetched and how data are processed.

```python
fd.modify(tank='test-img', name='test', batch_size=2)
```

Passing a dictionary of changed parameters is equivalent.

```python
config = {'name':'test', 'batch_size':2}
fd.modify(tank='test-img', config=config)
```



**FuelDepot.unloadFuel(tank='')**

- tank: [<u>str</u>] Specify the dataset to unmount. Defaults to '', in which case all datasets are unmounted.

Unmount datasets that are no longer needed.



**FuelDepot.nextBatch(tank)** 

- tank: [<u>str</u>] Specify the dataset from which data is fetched. 

Return a dictionary containing a batch of data, labels and other information.
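Each call to *nextBatch* yields one mini-batch. As a sketch of how an epoch might be cut into mini-batches, the hypothetical `epoch_batches` function below assumes that *complete_last_batch=True* tops up a short final batch by cycling back to the start of the epoch's indices; nebulae may fill it differently.

```python
from itertools import cycle, islice


def epoch_batches(indices, batch_size, complete_last_batch=True):
    """Split one epoch's sample indices into mini-batches.

    Assumption: when complete_last_batch is True, a short final batch is
    topped up by reusing samples from the start of the epoch.
    """
    batches = [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]
    if complete_last_batch and batches and len(batches[-1]) < batch_size:
        shortfall = batch_size - len(batches[-1])
        # cycle() keeps working even if batch_size exceeds the dataset size.
        batches[-1] = batches[-1] + list(islice(cycle(indices), shortfall))
    return batches


print(epoch_batches(list(range(10)), 4))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 1]]
print(epoch_batches(list(range(10)), 4, complete_last_batch=False))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```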



**FuelDepot.epoch**

Attribute: a dictionary containing the current epoch of each dataset. Epochs are counted from 1.



**FuelDepot.MPE**

Attribute: a dictionary containing the number of iterations per epoch for each dataset.



**FuelDepot.volume**

Attribute: a dictionary containing the number of samples in each dataset.
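Under the assumption that a final partial batch still counts as one iteration, *volume*, the batch size, *MPE* and *epoch* are related as sketched below. The helper names are hypothetical; compare their results with the attributes of your own FuelDepot.

```python
import math


def iterations_per_epoch(volume, batch_size):
    """Mini-batches per epoch, counting a final partial batch (assumption)."""
    return math.ceil(volume / batch_size)


def epoch_of(iteration, volume, batch_size):
    """1-based epoch for a 0-based global iteration counter."""
    return iteration // iterations_per_epoch(volume, batch_size) + 1


print(iterations_per_epoch(100, 4))   # 25
print(iterations_per_epoch(10, 4))    # 3
print(epoch_of(25, 100, 4))           # 2
```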

