Metadata-Version: 1.1
Name: wendelin.core
Version: 0.7
Summary: Out-of-core NumPy arrays
Home-page: https://lab.nexedi.com/nexedi/wendelin.core
Author: Kirill Smelkov
Author-email: kirr@nexedi.com
License: GPLv3+ with wide exception for Open-Source
Description: ==========================================
         Wendelin.core - Out-of-core NumPy arrays
        ==========================================
        
        Wendelin.core allows you to work with arrays bigger than RAM and local disk.
        Bigarrays are persisted to storage, and can be changed in a transactional manner.
        
        In other words, bigarrays are something like `numpy.memmap`_ for numpy.ndarray
        and OS files, but they support transactions and files bigger than disk. A whole
        bigarray cannot generally be used as a drop-in replacement for a numpy array,
        but bigarray *slices* are real ndarrays and can be used everywhere an ndarray can
        be used, including in C/Cython/Fortran code. Slice size is limited by the
        virtual address-space size, which is at most ~127TB on Linux/amd64.
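        
        For comparison, here is a minimal sketch of the plain `numpy.memmap`_ workflow
        that the analogy refers to. It uses only NumPy, not wendelin.core, and the
        scratch file path is a hypothetical one chosen for illustration::
        
            import os
            import tempfile
        
            import numpy as np
        
            # Hypothetical scratch file; any writable path would do.
            path = os.path.join(tempfile.mkdtemp(), "data.bin")
        
            # Map the file as a 1-D float64 array; pages are loaded on access.
            m = np.memmap(path, dtype=np.float64, mode="w+", shape=(1000,))
        
            view = m[10:100]            # a slice is an ndarray backed by the file
            view[:] = np.arange(90)     # modify data through the view
            m.flush()                   # write changes back; there is no transaction
        
            # Reopen read-only and check that the data reached the file.
            r = np.memmap(path, dtype=np.float64, mode="r", shape=(1000,))
            mean = float(np.mean(r[10:100]))
        
        Unlike this sketch, a bigarray adds transactions and is not limited by the
        size of the local disk.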
        
        The main class to work with is `ZBigArray`, which is used like `ndarray` from
        `NumPy`_:
        
        1. create array::
        
            from wendelin.bigarray.array_zodb import ZBigArray
            import transaction
        
            # root is connected to opened database
            root['A'] = A = ZBigArray(shape=..., dtype=...)
            transaction.commit()
        
        2. view array as a real ndarray::
        
            a = A[:]        # view which covers the whole array, if it fits into address-space
            b = A[10:100]
        
           data for views will be loaded lazily on memory access.
        
        3. work with views, including using C/Cython/Fortran functions from NumPy
           and other libraries to read/modify data::
        
            a[2] = 1
            a[10:20] = numpy.arange(10)
            numpy.mean(a)
        
           | the amount of modifications in one transaction should be less than available RAM.
           | the amount of data read is limited only by virtual address-space size.
        
        4. data can be appended to array in O(δ) time::
        
            values                  # ndarray to append of shape  (δ,)
            A.append(values)
        
           and array itself can be resized in O(1) time::
        
            A.resize(newshape)
        
        5. changes to array data can be either discarded or saved back to DB::
        
            transaction.abort()     # discard all changes made
            transaction.commit()    # atomically save all changes
        
        
        
        When using NEO_ or ZEO_ as a database, bigarrays can be simultaneously used by
        several nodes in a cluster.
        
        
        Please see `demo/demo_zbigarray.py`__ for a complete example.
        
        __ demo/demo_zbigarray.py
        
        
        Current state and Roadmap
        =========================
        
        Wendelin.core works in real life for workloads Nexedi_ is using in production,
        including 24/7 projects. We are, however, aware of the following
        limitations and things that need to be improved:
        
        - wendelin.core is currently not very fast
        - third-party libraries (NumPy_, `scikit-learn`_, ...) make big temporary array
          allocations, proportional in size to their input, which, depending on the
          functionality used, might in practice prevent processing out-of-core arrays.
        
        Thus
        
        - we are currently working on an improved wendelin.core design and implementation,
          which will use the kernel virtual memory manager (instead of the one implemented__ in__
          userspace__), with the array backend presented to the kernel via FUSE as a virtual
          filesystem implemented in Go.
        
        __  https://lab.nexedi.com/nexedi/wendelin.core/blob/master/include/wendelin/bigfile/virtmem.h
        __  https://lab.nexedi.com/nexedi/wendelin.core/blob/master/bigfile/virtmem.c
        __  https://lab.nexedi.com/nexedi/wendelin.core/blob/master/bigfile/pagefault.c
        
        In parallel we will also:
        
        - try wendelin.core 1.0 on large data sets
        - identify and incrementally fix big-temporaries allocation issues in NumPy and
          scikit-learn
        
        We are open to community help with the above.
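        
        Until such issues are fixed, one possible workaround for the big-temporaries
        limitation listed above is to apply a computation slice by slice, so that only
        one bounded chunk is handled at a time. The following is a minimal sketch
        under that assumption. `chunked_mean` is a hypothetical helper, not part of
        wendelin.core or NumPy, and a plain ndarray stands in for a bigarray view::
        
            import numpy as np
        
            def chunked_mean(a, chunk=1 << 20):
                """Mean of 1-D array `a`, reading at most `chunk` items at a time.
        
                Hypothetical helper for illustration only.
                """
                n = len(a)
                if n == 0:
                    raise ValueError("mean of an empty array is undefined")
                total = 0.0
                for start in range(0, n, chunk):
                    block = a[start:start + chunk]      # bounded-size slice
                    total += float(np.sum(block, dtype=np.float64))
                return total / n
        
            a = np.arange(10, dtype=np.float64)         # stand-in for a view
            result = chunked_mean(a, chunk=3)
        
        Whether this helps depends on the function involved: it only applies to
        computations that can be decomposed over slices.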
        
        
        Additional materials
        ====================
        
        - Wendelin.core tutorial__
        - Slides__ (pdf__) from the presentation about wendelin.core at PyData Paris 2015
        
        __  https://www.nexedi.com/wendelin-Core.Tutorial.2016
        __  http://www.wendelin.io/NXD-Wendelin.Core.Non.Secret/asEntireHTML
        __  http://www.wendelin.io/NXD-Wendelin.Core.Non.Secret?format=pdf
        
        
        .. _NumPy:          http://www.numpy.org/
        .. _scikit-learn:   http://scikit-learn.org/
        .. _numpy.memmap:   http://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html
        .. _NEO:            http://www.neoppod.org/
        .. _ZEO:            https://pypi.python.org/pypi/ZEO
        .. _Nexedi:         https://www.nexedi.com/
        
        ----
        
        Wendelin.core change history
        ============================
        
        0.7 (2016-07-14)
        ----------------
        
        - Add support for Python 3.5 (`commit 1`__, 2__)
        
          __ https://lab.nexedi.com/nexedi/wendelin.core/commit/20115391
          __ https://lab.nexedi.com/nexedi/wendelin.core/commit/e6beab19
        
        - Fix bug in pagemap code which could lead to crashes and other issues (`commit`__)
        
          __ https://lab.nexedi.com/nexedi/wendelin.core/commit/ee9bcd00
        
        - Various bugfixes
        
        0.6 (2016-06-13)
        ----------------
        
        - Add support for FORTRAN ordering (`commit 1`__, 2__)
        
          __ https://lab.nexedi.com/nexedi/wendelin.core/commit/ab9ca2df
          __ https://lab.nexedi.com/nexedi/wendelin.core/commit/2ca0f076
        
        
        - Avoid deadlocks by doing `loadblk()` calls with the virtmem lock released
          (`commit 1`__, 2__)
        
          __ https://lab.nexedi.com/nexedi/wendelin.core/commit/f49c11a3
          __ https://lab.nexedi.com/nexedi/wendelin.core/commit/0231a65d
        
        - Various bugfixes
        
        0.5 (2015-10-02)
        ----------------
        
        - Introduce another storage format, which is optimized for small changes, and
          make it the default.
          (`commit 1`__, 2__)
        
          __ https://lab.nexedi.com/nexedi/wendelin.core/commit/13c0c17c
          __ https://lab.nexedi.com/nexedi/wendelin.core/commit/9ae42085
        
        - Various bugfixes and documentation improvements
        
        
        0.4 (2015-08-19)
        ----------------
        
        - Add support for O(δ) in-place BigArray.append() (commit__)
        
          __ https://lab.nexedi.com/nexedi/wendelin.core/commit/1245acc9
        
        - Implement proper multithreading support (commit__)
        
          __ https://lab.nexedi.com/nexedi/wendelin.core/commit/d53271b9
        
        - Implement proper RAM pages invalidation when backing ZODB objects are changed
          from outside (`commit 1`__, 2__)
        
          __ https://lab.nexedi.com/nexedi/wendelin.core/commit/cb779c7b
          __ https://lab.nexedi.com/nexedi/wendelin.core/commit/92bfd03e
        
        - Fix all kinds of failures that could happen when a ZODB connection changes
          worker thread in-between handling requests (`commit 1`__, 2__)
        
          __ https://lab.nexedi.com/nexedi/wendelin.core/commit/c7c01ce4
          __ https://lab.nexedi.com/nexedi/wendelin.core/commit/64d1f40b
        
        - Tox tests now cover usage with FileStorage, ZEO and NEO ZODB storages
          (`commit 1`__, 2__)
        
          __ https://lab.nexedi.com/nexedi/wendelin.core/commit/010eeb35
          __ https://lab.nexedi.com/nexedi/wendelin.core/commit/7fc4ec66
        
        - Various bugfixes
        
        
        
        0.3 (2015-06-12)
        ----------------
        
        - Add support for automatic BigArray -> ndarray conversion, so that e.g. the
          following::
        
            A = BigArray(...)
            numpy.mean(A)       # passing BigArray to plain NumPy function
        
          either succeeds, or raises MemoryError if not enough address space is
          available to cover the whole A (the current limit is ~127TB on Linux/amd64).
        
          (commit__)
        
          __ https://lab.nexedi.com/nexedi/wendelin.core/commit/00db08d6
        
        - Various bugfixes (build-fixes, crashes, overflows, etc)
        
        
        0.2 (2015-05-25)
        ----------------
        
        - Add support for O(1) in-place BigArray.resize() (commit__)
        
          __ https://lab.nexedi.com/nexedi/wendelin.core/commit/ca064f75
        
        - Various build bugfixes (older systems, non-std python, etc)
        
        
        0.1 (2015-04-03)
        ----------------
        
        - Initial release
        
Keywords: bigdata out-of-core numpy virtual-memory
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Framework :: ZODB
