Metadata-Version: 2.1
Name: omoment
Version: 0.1.0
Summary: Estimates statistical moments in online or distributed settings.
Keywords: statistics,mean,variance,distributed,estimation,efficient,additive
Author-email: Tomas Protivinsky <tomas.protivinsky@gmail.com>
Requires-Python: >=3.7
Description-Content-Type: text/x-rst
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Requires-Dist: numpy >= 1.19.0
Requires-Dist: pandas >= 1.1.0
Requires-Dist: pytest >= 7.0.0 ; extra == "dev"
Requires-Dist: pytest-cov >= 3.0.0 ; extra == "dev"
Project-URL: Homepage, https://protivinsky.github.io/omoment
Provides-Extra: dev

|pytest-badge| |doc-badge|

..  |pytest-badge| image:: https://github.com/protivinsky/omoment/actions/workflows/pytest.yaml/badge.svg
    :alt: pytest

..  |doc-badge| image:: https://github.com/protivinsky/omoment/actions/workflows/builddoc.yaml/badge.svg
    :alt: docs
    :target: https://protivinsky.github.io/omoment/index.html

OMoment: Efficient online calculation of statistical moments
============================================================

The OMoment package calculates moments of statistical distributions (mean and variance) in online or distributed settings.

- Suitable for large data: works well with numpy and pandas, and in distributed settings.
- Moments calculated from different parts of the data can easily be combined, or updated with new data (results
  support addition).
- Objects are lightweight; calculations are done in numpy where possible.
- Weights can be provided for the data.
- Invalid values (NaNs, infinities) are omitted by default.
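
To make the points above concrete, here is a minimal sketch in plain numpy (not the OMoment API) of the three numbers that summarize one chunk of data. The helper ``chunk_moments`` is hypothetical; it assumes that "variance" means the population variance normalized by the total weight, and it drops non-finite values in either the data or the weights.

.. code:: python

    import numpy as np


    def chunk_moments(x, w=None):
        """Weighted mean, population variance and total weight of one chunk.

        Hypothetical helper for illustration; non-finite values in x or w are dropped.
        """
        x = np.asarray(x, dtype=float)
        w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)
        valid = np.isfinite(x) & np.isfinite(w)  # omit NaNs and infinities
        x, w = x[valid], w[valid]
        total = w.sum()
        if total == 0:
            return float("nan"), float("nan"), 0.0
        mean = np.sum(w * x) / total
        var = np.sum(w * (x - mean) ** 2) / total  # normalized by the total weight
        return float(mean), float(var), float(total)


    print(chunk_moments([1.0, 2.0, np.nan, 3.0, np.inf]))  # (2.0, 0.6666666666666666, 3.0)
    print(chunk_moments([0.0, 10.0], w=[3.0, 1.0]))  # (2.5, 18.75, 4.0)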

A typical application is calculating means and variances for many chunks of data (corresponding to different groups
or to different parts of a distributed dataset). The results can be analyzed at the group level, or combined to obtain
exact moments for the full dataset.
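
Combining is exact because the mean, variance and total weight of two chunks fully determine those of their union. The sketch below uses plain numpy with hypothetical helpers ``moments`` and ``combine``; it is not OMoment's own implementation. It applies the standard pairwise formula for weighted means and population variances, then checks that merging four chunks reproduces the moments of the full data.

.. code:: python

    from functools import reduce

    import numpy as np


    def moments(x, w):
        """Weighted mean, population variance and total weight (hypothetical helper)."""
        total = w.sum()
        mean = np.sum(w * x) / total
        return mean, np.sum(w * (x - mean) ** 2) / total, total


    def combine(a, b):
        """Merge two (mean, var, weight) triples into the triple of their union."""
        m1, v1, w1 = a
        m2, v2, w2 = b
        if w1 == 0:
            return b
        if w2 == 0:
            return a
        w = w1 + w2
        delta = m2 - m1
        mean = m1 + delta * w2 / w
        # within-chunk spread plus the spread between the two chunk means
        var = (w1 * v1 + w2 * v2) / w + delta ** 2 * w1 * w2 / w ** 2
        return mean, var, w


    rng = np.random.default_rng(12354)
    x = rng.normal(loc=5, scale=10, size=1000)
    w = rng.exponential(scale=1, size=1000)

    parts = [moments(x[i:i + 250], w[i:i + 250]) for i in range(0, 1000, 250)]
    merged = reduce(combine, parts)
    full = moments(x, w)
    print(np.allclose(merged, full))  # True

Because ``combine`` is associative up to rounding, the chunks can be merged in any order, for example as partial results arrive from different workers.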

Basic example
-------------

.. code:: python

    from omoment import OMeanVar
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(12354)
    g = rng.integers(low=0, high=10, size=1000)
    x = g + rng.normal(loc=0, scale=10, size=1000)
    w = rng.exponential(scale=1, size=1000)

    # calculate overall moments
    OMeanVar(x, weight=w)
    # should give: OMeanVar(mean=4.6, var=108, weight=1.08e+03)

    # or calculate moments for every group
    df = pd.DataFrame({'g': g, 'x': x, 'w': w})
    omvs = df.groupby('g').apply(OMeanVar.of_frame, x='x', w='w')

    # and combine group moments to obtain the same overall results
    OMeanVar.combine(omvs)

    # addition is also supported
    omvs.loc[0] + omvs.loc[1]

At the moment, only univariate distributions are supported. Bivariate and multivariate distributions can be
processed efficiently in a similar fashion, so support for them may be added in the future. Moments of
multivariate distributions would also allow linear regression and other statistical methods (such as PCA or
regularized regression) to be estimated in a single pass through large distributed datasets.
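
As an illustration of the last point, the sketch below (plain numpy, not part of OMoment) estimates a weighted linear regression in a single pass. Each chunk contributes the weighted cross-products ``X'WX`` and ``X'Wy``, which are simply added across chunks, and the coefficients come from solving the normal equations at the end. The data and the helper ``chunk_stats`` are hypothetical. Accumulating raw cross-products is the simplest additive form, but it is less numerically robust than storing centered moments (means and covariances), which is closer to how ``OMeanVar`` stores a mean and a variance.

.. code:: python

    import numpy as np


    def chunk_stats(X, y, w):
        """Weighted cross-products of one chunk; additive across chunks (hypothetical helper)."""
        Xc = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
        Xw = Xc * w[:, None]
        return Xw.T @ Xc, Xw.T @ y


    rng = np.random.default_rng(1)
    n = 1200
    X = rng.normal(size=(n, 2))
    y = 1.0 + X @ np.array([2.0, -3.0]) + rng.normal(scale=0.1, size=n)
    w = rng.exponential(scale=1, size=n)

    # single pass over the chunks: their statistics are simply summed
    xtx = np.zeros((3, 3))
    xty = np.zeros(3)
    for start in range(0, n, 300):
        stop = start + 300
        a, b = chunk_stats(X[start:stop], y[start:stop], w[start:stop])
        xtx += a
        xty += b

    beta = np.linalg.solve(xtx, xty)  # [intercept, slope_1, slope_2]
    print(beta)  # close to [1.0, 2.0, -3.0]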

Documentation
-------------

- https://protivinsky.github.io/omoment/index.html

