Metadata-Version: 2.1
Name: PyMoments
Version: 1.0.1
Summary: Unbiased estimators for multivariate statistical moments
Home-page: https://github.com/KevinDalySmith/PyMoments
Author: Kevin D. Smith
Author-email: kevinsmith@ucsb.edu
License: CC BY-NC 4.0
Description: # PyMoments
        
        PyMoments is a toolkit for unbiased estimation of multivariate statistical moments. 
        In the current version (1.0.0), only multivariate <i>k</i>-statistics are implemented,
        allowing for the unbiased estimation of cumulants. 
        An implementation of <i>h</i>-statistics (for unbiased estimation of central moments) is 
        planned for a future release.
        
        ## Installation
        
        PyMoments can be installed either from the GitHub 
        [source](https://github.com/KevinDalySmith/PyMoments), or via PyPI. 
        To install the package from GitHub, first clone the repository:
        ```
        $ git clone https://github.com/KevinDalySmith/PyMoments.git
        ```
        Navigate to the root directory, which contains the ```setup.py``` file.
        You can then use setuptools to install the package and run the unit tests:
        ```
        $ python setup.py install
        $ python setup.py test
        ```
        Alternatively, PyMoments can be installed using PyPI:
        ```
        $ pip install PyMoments
        ```
        Note that PyMoments requires [NumPy](https://numpy.org/). 
        I have only tested the package with NumPy version 1.16.5, but it should be compatible
        with older versions.
        
        ## Basic Usage
        
        The simplest use case is to compute a multivariate <i>k</i>-statistic from a 2D array of data. 
        For example, generate a random sample from a multivariate normal distribution:
        ```python
        import numpy as np
        mu = np.zeros(3,)
        sigma = np.array([
            [2, 0, 1],
            [0, 2, -1],
            [1, -1, 2]
        ])
        data = np.random.multivariate_normal(mu, sigma, size=(1000,))
        ```
        The ```data``` variable is a 1000 x 3 array, where each column corresponds to one of the three
        random variables in the joint distribution, and each row is an observation.
        
        The first <i>k</i>-statistic is identical to the sample mean. Thus, we can compare the first-order
        <i>k</i>-statistics for each column of the array, to the simple average:
        ```python
        from PyMoments import kstat
        
        first_order_kstats = [
            kstat(data, (0,)),
            kstat(data, (1,)),
            kstat(data, (2,))
        ]
        
        sample_means = np.mean(data, axis=0)
        
        print('First-order k-statistics:', first_order_kstats)
        print('Sample means:', sample_means) 
        ```
        The method ```kstat(data, (0,))``` computes the first-order <i>k</i>-statistic, from the first column
        of the data. As the console output of this example reveals, the first-order <i>k</i>-statistics are
        indeed equivalent to the sample means, up to floating point error.
        
        Second-order <i>k</i>-statistics are identical to covariances. Therefore, we can compare the values
        of the ```kstat()``` function to the covariance matrix of the data:
        ```python
        second_order_kstats = np.zeros((3, 3))
        for i in range(3):
            for j in range(3):
                second_order_kstats[i, j] = kstat(data, (i, j))
        
        print('Second-order k-statistics:')
        print(second_order_kstats)
        print('Covariance matrix:')
        print(np.cov(data.T))
        ```
        
        Higher-order <i>k</i>-statistics become difficult to express in terms of familiar statistics, but
        they still provide insight into the underlying distribution.
        Third-order <i>k</i>-statistics, for example, are related to the skewness of a distribution. 
        Normal distributions have zero skew, and therefore, all third-order <i>k</i>-statistics 
        should be fairly close to zero:
        ```python 
        print('Sample third-order k-statistics:') 
        print(kstat(data, (0, 1, 2)))
        print(kstat(data, (0, 0, 1)))
        print(kstat(data, (2, 2, 2)))
        ```
        
        The second argument to the ```kstat()``` method specifies a multiset of column indices
        (as a sequence of integers), which encodes the particular <i>k</i>-statistic to be computed. 
        The order of the <i>k</i>-statistic is equal to the length of this multiset.
        Note that ```kstat()``` is symmetric with respect to this argument, i.e., permuting the 
        indices will have no affect on the output. Repeated indices are allowed; in fact, repeated 
        indices are required to compute classical univariate <i>k</i>-statistics:
        ```python
        new_data = np.random.randn(1000, 1)
        
        print('First univariate k-stat:')
        print(kstat(new_data, (0,)))
        
        print('Second univariate k-stat:')
        print(kstat(new_data, (0, 0)))
        
        print('Third univariate k-stat:')
        print(kstat(new_data, (0, 0, 0)))
        
        print('Fourth univariate k-stat:')
        print(kstat(new_data, (0, 0, 0, 0)))
        ```
        
        It should also be pointed out that <i>k</i>-statistics become noisy and difficult to compute with
        increasing order. For example, try evaluating a 9th-order univariate cumulant on a new sample
        several times:
        ```python 
        kstat(np.random.randn(1000, 1), (0,) * 9)
        ```
        Not only does the function take several seconds to return, but repeats of this experiment can
        lead to widely different values of the <i>k</i>-statistic, despite the fact that normal 
        distributions have ninth-order cumulants of zero. 
        The runtime of ```kstat()``` on <i>n</i>th-order indices scales with Bell's number B(n), so... 
        I do not recommend trying <i>k</i>-statistics of order 10 or higher.
        
        ## License, Citation, and Acknowledgements
        PyMoments by Kevin D. Smith is licensed under a non-commercial Creative Commons license 
        ([CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)). When possible, please cite
        this package:
        ```
        @misc{PyMoments:2020,
          author = {Kevin D. Smith},
          title = {PyMoments: A Python toolkit for unbiased estimation of multivariate statistical moments},
          howpublished = {\texttt{https://github.com/KevinDalySmith/PyMoments}},
          year = 2020 
        }
        ```
        This work was supported in part by the U.S. Defense Threat Reduction Agency, under grant HDTRA1-19-1-0017.
        
Keywords: statistics
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Mathematics
Description-Content-Type: text/markdown
