Metadata-Version: 2.1
Name: lose
Version: 0.6
Summary: A helper package for hdf5 data handling
Home-page: https://github.com/okawo80085/lose
Author: okawo
Author-email: okawo.198@gmail.com
License: MIT
Description: # lose
        
        `lose`, and in particular `lose.LOSE`, is a helper class for handling data stored in the `hdf5` file format with `PyTables`.
        
        ```python
        >>> from lose import LOSE
        >>> LOSE()
        <lose hdf5 data handler, fname=None, atom=Float32Atom(shape=(), dflt=0.0)>
        generator parameters: iterItems=None, iterOutput=None, batch_size=1, limit=None, loopforever=False, shuffle=False
        ```
        
        ## installation
        ```bash
        pip3 install -U lose
        ```
        or
        ```bash
        pip install -U lose
        ```
        
        ---
        
        ## TOC
        1. [Structure](#structure)
        	* [vars](#vars)
        	* [methods](#methods)
        2. [Example usage](#example-usage)
        3. [Generator details](#generator-details)
        4. [Bugs or problems](#bugs-or-problems)
        5. [Change log](#change-log)
        
        ## structure
        #### vars
        `LOSE.fname` is the path to the `.h5` file including the name and extension, default is `None`.
        
        `LOSE.atom` is the `dtype` the data is stored in, default is `tables.Float32Atom()`, which results in arrays with `dtype==np.float32`; it is recommended to leave it at the default.
        
        ***
        
        **`LOSE.generator()` related vars:**
        
        `LOSE.batch_size`: batch size of data getting pulled from the `.h5` file, default is `1`.
        
        `LOSE.limit`: int that limits the amount of data loaded by the generator, default is `None`; if `None`, all available data will be loaded.
        
        `LOSE.loopforever`: bool that allows infinite looping over the data, default is `False`.
        
        `LOSE.iterItems`: list of X group names and list of Y group names, default is `None`, required to be user defined for `LOSE.generator()` to work.
        
        `LOSE.iterOutput`: list of X output names and list of Y output names for `LOSE.iterItems` to be mapped to, default is `None`, required to be user defined for `LOSE.generator()` to work.
        
        `LOSE.shuffle`: bool that enables shuffling of the data, default is `False`, shuffling is affected by `LOSE.limit` and `LOSE.batch_size`.
        
        `LOSE.mask_callback`: `None` by default, if `None` or is not a function the mask functionality is disabled, see [`LOSE.generator()` details](#generator-details) for more details.
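        
        As a rough illustration of how `batch_size`, `limit` and `shuffle` can interact, here is a minimal sketch in plain Python. The `batch_indices` helper is hypothetical and is not part of `lose`; it only assumes that `limit` caps the number of rows before batching and that shuffling happens within that capped range, which may differ from what `LOSE.generator()` does internally.
        
        ```python
        import random
        
        def batch_indices(n_rows, batch_size=1, limit=None, shuffle=False, seed=None):
            """Yield lists of row indices, one list per batch (illustrative only)."""
            # assumption: limit caps how many rows are visible to the generator
            n = n_rows if limit is None else min(limit, n_rows)
            order = list(range(n))
            if shuffle:
                # assumption: shuffling only reorders rows inside the capped range
                random.Random(seed).shuffle(order)
            for start in range(0, n, batch_size):
                yield order[start:start + batch_size]
        
        batches = list(batch_indices(10, batch_size=4, limit=6))
        print(batches)  # [[0, 1, 2, 3], [4, 5]]
        ```
        
        With `limit=6` and `batch_size=4` the last batch is shorter than the first, since only 6 rows are available in this sketch.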
        
        ---
        
        #### methods
        ```
        Help on LOSE in module lose.dataHandler object:
        
        class LOSE(builtins.object)
         |  Methods defined here:
         |
         |  __init__(self, fname=None)
         |      Initialize self.  See help(type(self)) for accurate signature.
         |
         |  __repr__(self)
         |      Return repr(self).
         |
         |  __str__(self)
         |      Return str(self).
         |
         |  generator(self, mask_callback=None)
         |
         |  getShape(self, arrName)
         |
         |  getShapes(self, *arrNames)
         |
         |  load(self, *args, batch_obj=':')
         |
         |  makeGenerator(self, layerNames, limit=None, batch_size=1, shuffle=False, mask_callback=None, **kwards)
         |
         |  newGroup(self, fmode='a', **kwards)
         |
         |  removeGroup(self, *args)
         |
         |  renameGroup(self, **kwards)
         |
         |  save(self, **kwards)
         |
         |  ----------------------------------------------------------------------
         ```
        
        `LOSE.newGroup(fmode='a', **groupNames)` is used to append/write(depends on the `fmode` keyword argument, default is `'a'`) group(s) to a `.h5` file.
        
        
        `LOSE.removeGroup(*groupNames)` is used to remove group(s) from a file, given the name(s) of the group(s).
        
        
        `LOSE.renameGroup(**groupNames)` is used to rename group(s) within a `.h5` file, see examples below.
        
        
        `LOSE.save(**groupNamesAndData)` is used to save data (in append mode only) to group(s) in a `.h5` file. Each data sample needs to have the same shape as `group.shape[1:]` of the group it is passed to; `LOSE.getShape(groupName)` can be used to get the `group.shape`.
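        
        The shape rule above can be sketched as a small check. `sample_fits_group` is a hypothetical helper, not a `lose` function; it assumes the first axis of `group.shape` counts the saved rows, as in the `(3, 20)` shape shown in the examples further down.
        
        ```python
        def sample_fits_group(sample_shape, group_shape):
            """True if one sample can be appended to a group whose first axis is the row axis."""
            # drop the row axis and compare what is left with the sample's shape
            return tuple(sample_shape) == tuple(group_shape[1:])
        
        group_shape = (3, 20)  # e.g. the value returned for a group holding 3 rows of 20 values
        print(sample_fits_group((20,), group_shape))  # True
        print(sample_fits_group((3,), group_shape))   # False
        ```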
        
        
        `LOSE.load(*groupNames, batch_obj=':')` is used to load data from group(s), either a whole group or a slice; to load a slice, pass the desired slice as the `batch_obj` keyword argument (default is `':'`). The group(s) have to be present in the `.h5` file.
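        
        To make the relation between a slice string such as `':2'` and a slice object concrete, here is a minimal sketch. `parse_slice` is a hypothetical helper written for this illustration; how `lose` interprets the string internally is an assumption and may differ.
        
        ```python
        def parse_slice(text):
            """Turn a string such as ':2' or '1:10:2' into a slice object (hypothetical helper)."""
            # empty fields become None, exactly like omitted bounds in a[start:stop:step]
            parts = [int(p) if p.strip() else None for p in text.split(':')]
            return slice(*parts)
        
        rows = list(range(5))
        print(rows[parse_slice(':2')])     # [0, 1]
        print(rows[parse_slice(':')])      # [0, 1, 2, 3, 4]
        print(rows[parse_slice('1:5:2')])  # [1, 3]
        ```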
        
        
        `LOSE.getShape(groupName)` is used to get the shape of a single group, group has to be present in the `.h5` file.
        
        
        `LOSE.getShapes(*groupNames)` is used to get the shapes of group(s), group(s) have to be present in the `.h5` file.
        
        
        `LOSE.generator()`: see the [Generator details](#generator-details) section; `LOSE.iterItems` and `LOSE.iterOutput` have to be defined.
        
        
        `LOSE.makeGenerator(layerNames, limit=None, batch_size=1, shuffle=False, mask_callback=None, **data)`: see [Generator details](#generator-details) for more details.
        
        ---
        
        ## example usage
        here are some usage examples of how to save and use data
        
        ##### creating/adding new group(s) to a file
        ```python
        import numpy as np
        from lose import LOSE
        
        l = LOSE()
        l.fname = 'path/to/your/save/file.h5' # path to the save file
        
        exampleDataX = np.arange(20)
        exampleDataY = np.arange(3)
        
        l.newGroup(fmode='w', x=exampleDataX.shape, y=exampleDataY.shape) # creates new groups (ready for data to be saved to) in the file; if fmode is 'w', all existing groups in the file are overwritten
        ```
        ##### saving data to a group(s)
        ```python
        import numpy as np
        from lose import LOSE
        
        l = LOSE()
        l.fname = 'path/to/your/save/file.h5'
        
        exampleDataX = np.arange(20)
        exampleDataY = np.arange(3)
        
        l.save(x=[exampleDataX, exampleDataX], y=[exampleDataY, exampleDataY]) # saving data into groups defined in the previous example
        l.save(y=[exampleDataY], x=[exampleDataX]) # works the same way, keyword order does not matter; appends one more row to each group
        ```
        ##### loading data from a group(s) within a file
        for this example, file has data from the previous example
        ```python
        import numpy as np
        from lose import LOSE
        
        l = LOSE()
        l.fname = 'path/to/your/save/file.h5'
        
        x, y = l.load('x', 'y') # loading data from the .h5 file(has to be a real file) populated by previous examples
        y2compare, x2compare = l.load('y', 'x') # the same thing
        
        print (np.all(x == x2compare), np.all(y == y2compare)) # True True
        
        x, y = l.load('x', 'y', batch_obj=np.s_[:2]) # ':2' will also work, only loads first 2 rows from the data arrays
        ```
        ##### getting the shape of a group(s)
        for this example, file has data from previous examples
        ```python
        import numpy as np
        from lose import LOSE
        
        l = LOSE()
        l.fname = 'path/to/your/save/file.h5'
        
        print (l.getShape('x')) # (3, 20)
        print (l.getShape('y')) # (3, 3)
        
        print (l.getShapes('y', 'x')) # [(3, 3), (3, 20)]
        ```
        ##### renaming group(s) in a file
        for this example, file has data from previous examples
        ```python
        import numpy as np
        from lose import LOSE
        
        l = LOSE('path/to/your/save/file.h5')
        x2compare, y2compare = l.load('x', 'y')
        print (l) # file structure before renaming any group(s)
        l.renameGroup(y='z', x='lol')
        lol, z = l.load('lol', 'z')
        print (l) # file structure after renaming group(s)
        print (np.all(x2compare == lol), np.all(y2compare == z)) # True True
        ```
        ##### removing group(s) from a file
        for this example, file has data from previous examples
        ```python
        from lose import LOSE
        
        l = LOSE(fname='path/to/your/save/file.h5')
        
        l.removeGroup('lol', 'z') # removing the group(s)
        
        x = l.load('lol') # now this will result in an error because group 'lol' was removed from the file
        ```
        ## generator details
        `LOSE.generator(mask_callback=None)` is a python generator used to access data from a `hdf5` file in `LOSE.batch_size` pieces without loading the whole file/group into memory. It also works with `tf.keras.Model.fit_generator()` and __has__ to be used with a `with` context statement (see examples below).
        
        
        `LOSE.iterItems` and `LOSE.iterOutput` __have__ to be defined by user first.
        
        
        `mask_callback` accepts a function that will be used as a mask on each batch of data before it's yielded by the generator on every step. Example of a mask callback:
        ```python
        def mask(data): # data = tuple({'input_1':batch_inputarray_1, ..., 'input_n':batch_inputarray_n}, {'output_1':batch_outputarray_1, ... 'output_n':batch_outputarray_n})
        	x, y = data
        	# process the data, without changing any of the keys
        	return (x, y)
        ```
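        
        The step where the mask is applied can be sketched as follows. `apply_mask` and `double_inputs` are hypothetical names used only for this illustration, and plain lists stand in for the batch arrays; the sketch assumes, as described above, that the mask is skipped when `mask_callback` is `None` or not a function.
        
        ```python
        def apply_mask(batches, mask_callback=None):
            """Yield each (x, y) batch, passing it through mask_callback when it is callable."""
            for batch in batches:
                yield mask_callback(batch) if callable(mask_callback) else batch
        
        def double_inputs(data):
            x, y = data
            # process the inputs without changing any of the keys
            x = {name: [v * 2 for v in values] for name, values in x.items()}
            return (x, y)
        
        batches = [({'input_1': [1, 2]}, {'dense_5': [0, 1]})]
        print(list(apply_mask(batches, double_inputs)))
        # [({'input_1': [2, 4]}, {'dense_5': [0, 1]})]
        ```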
        
        ---
        
        `LOSE.makeGenerator(layerNames, limit=None, batch_size=1, shuffle=False, mask_callback=None, **data)` follows the same rules as `LOSE.generator()`. However, the data needs to be passed to it each time it's initialized and is only stored temporarily; the parameters are also passed on initialization. `layerNames` acts like `LOSE.iterOutput` and `LOSE.iterItems`, but every name in it has to match the names of the data passed (see examples below). If the file `temp.h5` exists, it will be overwritten and then deleted.
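        
        The name-matching rule above can be checked up front with a few lines of plain Python. `check_layer_names` is a hypothetical helper, not something `lose` provides, and the error message format is made up for this sketch.
        
        ```python
        def check_layer_names(layer_names, data):
            """Raise ValueError unless the names in layer_names and the data keys are identical."""
            # layer_names is [[X names...], [Y names...]], flatten it into one set
            expected = {name for group in layer_names for name in group}
            missing = expected - set(data)
            extra = set(data) - expected
            if missing or extra:
                raise ValueError(
                    f"missing data for {sorted(missing)}, unexpected data for {sorted(extra)}"
                )
        
        layer_names = [['input_1', 'input_2'], ['dense_5']]
        check_layer_names(layer_names, {'input_2': [], 'input_1': [], 'dense_5': []})  # passes silently
        ```
        
        Calling it with a misspelled keyword, for example `input1` instead of `input_1`, would raise a `ValueError` before any data is written.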
        
        ---
        
        ### example `LOSE.generator(mask_callback=None)` usage
        for this example let's say that the file has the requested data in it and the model input/output layer names are present.
        ```python
        import numpy as np
        from lose import LOSE
        
        l = LOSE('path/to/your/file/with/data.h5')
        
        l.iterItems = [['x1', 'x2'], ['y']] # names of X and Y groups, all group names need to have batch dim the same and be present in the .h5 file
        l.iterOutput = [['input_1', 'input_2'], ['dense_5']] # names of model's layers the data will be cast on, group.shape[1:] needs to match the layer's input shape
        l.loopforever = True
        l.batch_size = 20 # some batch size, can be bigger than the dataset, but won't output more data, it will just loop over or stop the iteration if LOSE.loopforever is False
        
        l.limit = 10000 # let's say that the file has more data, but you only want to train on the first 10000 samples
        
        l.shuffle = True # enable data shuffling for the generator, costs memory and time
        
        with l.generator() as gen:
        	some_model.fit_generator(gen(), steps_per_epoch=50, epochs=1000, shuffle=False) # model.fit_generator() still can't shuffle the data, but LOSE.generator() can
        ```
        
        ### example `LOSE.makeGenerator(layerNames, limit=None, batch_size=1, shuffle=False, mask_callback=None, **data)` usage
        for this example let's say the model's input/output layer names are present and the shapes match the data.
        ```python
        import numpy as np
        import random
        from lose import LOSE
        
        def mask(data):
        	x, y = data
        	for key in x.keys():
        		x[key] += random.random()
        
        	return (x, y)
        
        l = LOSE()
        
        num_samples = 1000
        
        x1 = np.zeros((num_samples, 200)) # example data for the model, x1.shape[1:] == model.get_layer('input_1').output_shape[1:]
        x2 = np.zeros((num_samples, 150)) # example data for the model, x2.shape[1:] == model.get_layer('input_2').output_shape[1:]
        y = np.zeros((num_samples, 800)) # example data for the model, y.shape[1:] == model.get_layer('dense_5').output_shape[1:]
        
        with l.makeGenerator([['input_1', 'input_2'], ['dense_5']], batch_size=10, mask_callback=mask, shuffle=True, input_2=x2, input_1=x1, dense_5=y) as gen:
        	del x1 #remove from memory
        	del x2 #remove from memory
        	del y #remove from memory
        
        	some_model.fit_generator(gen(), steps_per_epoch=100, epochs=10000, shuffle=False) # again data can't be shuffled by model.fit_generator(), shuffling should be done by the generator
        ```
        
        ## bugs or problems
        if you find any, raise an issue.
        
        ## change log
        [change log(github only)](changeLog.md)
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3
Description-Content-Type: text/markdown
