Metadata-Version: 2.1
Name: probstructs
Version: 0.2.6
Summary: Probabilistic data structures
Home-page: https://github.com/martin-majlis/py-probstructs
Author: Martin Majlis
Author-email: martin@majlis.cz
License: MIT
Keywords: probabilistic structures exponential count–min sketch histogram
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Information Technology
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing

Probabilistic Structures
========================

``Probstructs`` is easy to use Python wrapper for C++ library `probstructs`_ . It supports Exponential Histograms, Count Min Sketch (CM-Sketch), and Exponential Count Min Sketch (ECM-Sketch).

.. _probstructs: https://github.com/martin-majlis/probstructs

|build-status| |docs| |version| |pyversions| |github-stars-flat|

Installation
------------

With pip:

.. code-block:: bash

    pip install probstructs

From source:

.. code-block:: bash

    pip install .

Classes
-------

CountMinSketch
^^^^^^^^^^^^^^

Countâ€“min sketch (CM sketch) is a probabilistic data structure that serves as a frequency table of events in a stream of data. It uses hash functions to map events to frequencies, but unlike a hash table uses only sub-linear space, at the expense of overcounting some events due to collisions.

C++ documentation: https://probstructs.readthedocs.io/en/latest/classes.html#countminsketch

.. code-block:: python

    from probstructs import CountMinSketch

    cm_sketch = CountMinSketch(100, 4)
    cm_sketch.inc("aaa", 1)
    cm_sketch.inc("bbb", 5)
    cm_sketch.inc("aaa", 2)

    print(cm_sketch.get("aaa"))
    # 3
    print(cm_sketch.get("bbb"))
    # 5
    print(cm_sketch.get("ccc"))
    # 0


    cm_sketch = CountMinSketch(width=100, depth=4)
    cm_sketch.inc(key="bbb", delta=5)
    print(cm_sketch.get(key="bbb"))
    # 5

ExponentialHistorgram
^^^^^^^^^^^^^^^^^^^^^

Exponential histogram (EH) is a probabilistic data structure that serves as a frequency counter for specific elements in the last N elements from stream..

C++ documentation: https://probstructs.readthedocs.io/en/latest/classes.html#exponentialhistorgram

.. code-block:: python

    from probstructs import ExponentialHistorgram


    eh = ExponentialHistorgram(1)
    eh.inc(1, 1)
    print(eh.get(1, 1))
    # 1
    eh.inc(1, 1)
    print(eh.get(1, 1))
    # 2
    eh.inc(2, 1)
    print(eh.get(1, 2))
    # 1

    eh = ExponentialHistorgram(window=1)
    eh.inc(tick=1, delta=1)
    print(eh.get(window=1, tick=1))
    # 1
    eh.inc(tick=1, delta=1)
    print(eh.get(window=1, tick=1))
    # 2
    eh.inc(tick=2, delta=1)
    print(eh.get(window=1, tick=2))
    # 1

ExponentialCountMinSketch
^^^^^^^^^^^^^^^^^^^^^^^^^

Exponential count-min sketch (ECM-Sketch) combines CM-Sketch with EH to count number of different elements in the last N elements in the stream.

C++ documentation: https://probstructs.readthedocs.io/en/latest/classes.html#exponentialcountminsketch

.. code-block:: python

    from probstructs import ExponentialCountMinSketch


    ecm_sketch = ExponentialCountMinSketch(100, 4, 8)

    ts = 0
    ecm_sketch.inc("aaa", ts, 1)
    ecm_sketch.inc("bbb", ts, 4)
    ecm_sketch.inc("ccc", ts, 8)

    print(ecm_sketch.get("aaa", 4, ts))
    # 1
    print(ecm_sketch.get("bbb", 4, ts))
    # 4
    print(ecm_sketch.get("ccc", 4, ts))
    # 8
    print(ecm_sketch.get("ddd", 4, ts))
    # 0

    ecm_sketch = ExponentialCountMinSketch(width=100, depth=4, window=8)

    ts = 0
    ecm_sketch.inc(key="aaa", tick=ts, delta=1)
    ecm_sketch.inc(key="bbb", tick=ts, delta=4)
    ecm_sketch.inc(key="ccc", tick=ts, delta=8)

    print(ecm_sketch.get(key="aaa", window=4, tick=ts))
    # 1
    print(ecm_sketch.get(key="bbb", window=4, tick=ts))
    # 4
    print(ecm_sketch.get(key="ccc", window=4, tick=ts))
    # 8
    print(ecm_sketch.get(key="ddd", window=4, tick=ts))
    # 0






.. |build-status| image:: https://travis-ci.org/martin-majlis/py-probstructs.svg?branch=master
    :alt: build status
    :target: https://travis-ci.org/martin-majlis/py-probstructs

.. |docs| image:: https://readthedocs.org/projects/py-probstructs/badge/?version=latest
    :target: http://py-probstructs.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation Status

.. |version| image:: https://img.shields.io/pypi/v/probstructs.svg?style=flat
	:target: https://pypi.python.org/pypi/probstructs
	:alt: Version

.. |pyversions| image:: https://img.shields.io/pypi/pyversions/probstructs.svg?style=flat
	:target: https://pypi.python.org/pypi/probstructs
	:alt: Py Versions

.. |github-stars-flat| image:: https://img.shields.io/github/stars/martin-majlis/py-probstructs.svg?style=flat&label=Stars
	:target: https://github.com/martin-majlis/py-probstructs/
	:alt: GitHub stars

Changelog
=========

0.2.0
-----

* Introduce named parameters
* Update documentation to contain examples

0.1.0
-----
* Initial version


