Metadata-Version: 2.1
Name: unavoids
Version: 1.2
Summary: UNKNOWN
Home-page: https://github.com/isotlaboratory/UNAVOIDS-Code
Author: Yousef, Waleed A. and Traoré, Issa and Briguglio, William
Author-email: wyousef@uvic.ca
License: GNU GENERAL PUBLIC LICENSE
Platform: UNKNOWN
Description-Content-Type: text/markdown

Module unavoids
===============

Functions
---------

    
`getAllNCDFs(X, p=0.0625, ncpus=4)`
:   Calculate the NCDF for all samples in parallel using a
    specified norm.
    
    Parameters
    ----------
    X : numpy array of shape (n_samples, n_features)
        Data matrix where `n_samples` is the number of samples
        and `n_features` is the number of features.
    p : float or np.inf constant
        The norm to use when calculating the distance between
        samples in `X`. If np.inf is supplied, then Chebyshev
        distance is used.
    ncpus : int
        The number of parallel processes.
    
    Returns
    ----------
    NCDFs : numpy array of shape (n_samples, n_samples)
        The i-th row equals the NCDF for the i-th sample in `X`,
        while the j-th column of the i-th row equals NCDF_xi(j)
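
The row-per-sample layout described above can be illustrated with a minimal NumPy sketch. It is not the package's implementation. It assumes that the NCDF of a sample is the fraction of samples lying within each of `n_samples` evenly spaced normalized radii, and that `X` is scaled to [0, 1]. The names `_ncdf_row`, `all_ncdfs_sketch`, and `workers` are hypothetical, and a thread pool stands in for the parallel processes controlled by `ncpus`.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def _ncdf_row(X, p, index):
    """Hypothetical helper: NCDF of one sample (assumed definition)."""
    n, m = X.shape
    diff = np.abs(X - X[index])
    if np.isinf(p):
        dist = diff.max(axis=1)                  # Chebyshev distance
    else:
        dist = (diff ** p).sum(axis=1) ** (1.0 / p)
        dist = dist / m ** (1.0 / p)             # largest value in [0, 1]^m
    radii = np.arange(1, n + 1) / n              # n radii in (0, 1]
    # Fraction of samples whose normalized distance is within each radius.
    return (dist[None, :] <= radii[:, None]).mean(axis=1)


def all_ncdfs_sketch(X, p=0.0625, workers=4):
    """Stack one NCDF per sample into an (n_samples, n_samples) array."""
    n = X.shape[0]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = list(pool.map(lambda i: _ncdf_row(X, p, i), range(n)))
    return np.vstack(rows)
```

Under these assumptions every row is non-decreasing and ends at 1, because all normalized distances are at most 1.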

    
`getBetaFractions(NCDFs_L, BetaSorted, BetaRanks, fraction_WSS, index)`
:   Calculate the UNAVOIDS outlier score for a given sample using
    the fractions of all gaps method.
    
    Parameters
    ----------
    NCDFs_L : numpy array of shape (n_samples, L_levels)
        An array containing the intercepts for n NCDFs at L beta
        levels, where `n_samples` is the number of samples and
        `L_levels` is the number of beta levels.
    BetaSorted : numpy array of shape (n_samples, L_levels)
        The same as `NCDFs_L` but the intercepts are sorted along
        the L beta levels (column-wise sort of NCDFs_L).
    BetaRanks : numpy array of shape (n_samples, L_levels)
        The same as `NCDFs_L` but the value at `NCDFs_L[i,j]` is
        replaced with the rank of `NCDFs_L[i,j]` on a given beta
        horizontal.
    fraction_WSS : int 
        The number of nearest intercepts to be encompassed by the
        gap whose size will be the score for a given beta level
        and NCDF intercept. Assumed to be less than 
        `n_samples/2`.
    index : int 
        The row index of the NCDF in `NCDFs_L` which we are
        finding the outlier score of.
    
    Returns
    ----------
    score : numpy array of shape (1, 1)
        The highest outlier score for `NCDFs_L[index,:]` across
        all beta levels.
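
The three array arguments above can be derived from an NCDF array, and the following sketch shows one way to do so, together with one plausible reading of the gap score. It is not the package's code. It assumes that each NCDF is sampled on `n_samples` evenly spaced radii, that the beta levels are `L` evenly spaced values in (0, 1], and that the gap at a beta level is the distance to the k-th nearest intercept. All function names are hypothetical.

```python
import numpy as np


def intercepts_at_betas(NCDFs, L=100):
    """Radius at which each NCDF first reaches each beta level (assumed)."""
    n = NCDFs.shape[1]
    betas = np.linspace(1.0 / L, 1.0, L)
    radii = np.arange(1, n + 1) / n
    # First grid index where the non-decreasing row is >= beta.
    idx = np.array([np.searchsorted(row, betas, side="left") for row in NCDFs])
    idx = np.minimum(idx, n - 1)        # guard against rounding at beta = 1
    return radii[idx]                   # shape (n_samples, L)


def sort_and_rank(NCDFs_L):
    """Column-wise sort and per-column ranks (ties ranked arbitrarily)."""
    beta_sorted = np.sort(NCDFs_L, axis=0)
    beta_ranks = NCDFs_L.argsort(axis=0).argsort(axis=0)
    return beta_sorted, beta_ranks


def gap_score_sketch(NCDFs_L, k, index):
    """Brute-force gap score: k-th nearest intercept distance, max over levels."""
    dists = np.abs(NCDFs_L - NCDFs_L[index])   # distance to every intercept
    kth = np.sort(dists, axis=0)[k]            # row 0 is the sample itself
    return kth.max()
```

This brute-force version recomputes distances for every call. Passing pre-sorted intercepts and ranks, as the documented signature does, presumably avoids that repeated work.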

    
`getBetaHist(NCDFs_L, BetaSorted, index)`
:   Calculate the UNAVOIDS outlier score for a given sample using
    the histogram method.
    
    Parameters
    ----------
    NCDFs_L : numpy array of shape (n_samples, L_levels)
        An array containing the intercepts for n NCDFs at L beta
        levels, where `n_samples` is the number of samples and
        `L_levels` is the number of beta levels.
    BetaSorted : numpy array of shape (n_samples, L_levels)
        The same as `NCDFs_L` but the intercepts are sorted along
        the L beta levels (column-wise sort of NCDFs_L).
    index : int 
        The row index of the NCDF in `NCDFs_L` which we are
        finding the outlier score of.
    
    Returns
    ----------
    score : numpy array of shape (1, 1)
        The highest outlier score for `NCDFs_L[index,:]` across
        all beta levels.

    
`getNCDF(X, p, index)`
:   Calculate the NCDF for a single sample using a specified
    norm.
    
    Parameters
    ----------
    X : numpy array of shape (n_samples, n_features)
        Data matrix, assumed to be min max scaled to [0,1], where
        `n_samples` is the number of samples and `n_features` is
        the number of features.
    p : float or np.inf constant
        The norm to use when calculating the distance between
        samples in `X`. If np.inf is supplied, then Chebyshev
        distance is used.
    index : int
        The index of the sample in `X` which we are finding the
        NCDF of. Assumed to be less than `n_samples`.
    
    Returns
    ----------
    NCDFxi : numpy array of shape (1, n_samples)
        The NCDF of `X[i,:]` where i = `index` and the j-th value equals
        NCDF_xi(j)
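
To make the role of `p` and of the [0, 1] scaling concrete, here is a minimal sketch of a single-sample NCDF. It is an illustration under stated assumptions and not the package's implementation: the distance is the Minkowski form `(sum |d|^p)^(1/p)` divided by its largest possible value `n_features^(1/p)`, and the NCDF is evaluated on `n_samples` evenly spaced radii. The name `ncdf_sketch` is hypothetical, and the sketch returns a flat array of length `n_samples`.

```python
import numpy as np


def ncdf_sketch(X, p, index):
    """Hypothetical stand-in for a single-sample NCDF (assumed definition)."""
    n, m = X.shape
    diff = np.abs(X - X[index])                  # per-feature differences
    if np.isinf(p):
        dist = diff.max(axis=1)                  # Chebyshev, already in [0, 1]
    else:
        dist = (diff ** p).sum(axis=1) ** (1.0 / p)
        dist = dist / m ** (1.0 / p)             # normalize to [0, 1]
    radii = np.arange(1, n + 1) / n              # n radii in (0, 1]
    # Fraction of samples whose normalized distance is within each radius.
    return (dist[None, :] <= radii[:, None]).mean(axis=1)
```

The normalization relies on min-max scaling: with every feature in [0, 1], each per-feature difference is at most 1, so the unnormalized distance cannot exceed `n_features^(1/p)`.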

    
`unavoidsScore(X, precomputed=False, p=0.0625, returnNCDFs=True, method='fractions', r=0.01, L=100, ncpus=4)`
:   Calculate the UNAVOIDS outlier score for all samples in `X`.
    
    Parameters
    ----------
    X : numpy array of shape (n_samples, n_features)
        Data matrix where `n_samples` is the number of samples
        and `n_features` is the number of features.
    precomputed : bool, default=False
        If True, `X` is assumed to be an NCDF array in the same
        format as that returned by `getAllNCDFs`.
    p : float or np.inf constant
        The norm to use when calculating the distance between
        samples in `X`. If np.inf is supplied, then Chebyshev
        distance is used.
    returnNCDFs : bool, default=True
        If True, NCDF array is returned along with outlier
        scores.
    method : {"fractions", "histogram"}, default="fractions"
        Specifies which method to use for calculating outlier
        scores; either "fractions" or "histogram".
    r : float
        Percentage of nearest intercepts to be encompassed by the
        gap whose size will be the score for a given beta and
        NCDF intercept in the "fractions" method. Ignored if
        `method` == "histogram".
    L : int
        The number of beta levels to use.
    ncpus : int
        The number of parallel processes to use.
    
    Returns
    ----------
    scores : numpy array of shape (n_samples, 1)
        The i-th element in scores is the UNAVOIDS outlier score
        for the i-th sample(row) in `X`.
    NCDFs : numpy array of shape (n_samples, n_samples)
        The i-th row equals the NCDF for the i-th sample in `X`,
        while the j-th column of the i-th row equals NCDF_xi(j).
        Only returned if `returnNCDFs` == True.
    
    References
    ----------
    .. [1] W. A. Yousef, I. Traore and W. Briguglio, (2021)
       "UN-AVOIDS: Unsupervised and Nonparametric Approach for
       Visualizing Outliers and Invariant Detection Scoring",
       IEEE Transactions on Information Forensics and Security,
       vol. 16, pp. 5195-5210, [doi: 10.1109/TIFS.2021.3125608]
    
    Examples
    --------
    >>> import numpy as np
    >>> from joblib import load
    >>> from unavoids import unavoids
    >>> from sklearn import metrics
    >>>
    >>> X_all = load("simData.joblib")
    >>> Y = np.zeros((X_all.shape[0],))
    >>> Y[-3:] = 1         #last three samples are outliers
    >>> X = X_all[:,:4]    #grab first 4 features
    >>>
    >>> scores, NCDFs = unavoids.unavoidsScore(X, p=0.0625, returnNCDFs=True, method="fractions")
    >>> fpr, tpr, thresholds = metrics.roc_curve(Y, scores)
    >>> metrics.auc(fpr, tpr)
    1.0
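
When no ground-truth labels are available, the scores can instead be used to rank samples. The sketch below assumes that a higher score means a more outlying sample, which is consistent with the ROC example above, where outliers are labeled 1. The helper `top_k_outliers` is hypothetical, and the scores in the usage note are made up for illustration.

```python
import numpy as np


def top_k_outliers(scores, k):
    """Indices of the k highest-scoring samples, most outlying first."""
    flat = np.asarray(scores).ravel()   # accepts shape (n_samples, 1)
    order = np.argsort(flat)[::-1]      # descending score order
    return order[:k]
```

For example, `top_k_outliers(np.array([[0.1], [0.9], [0.3], [0.8]]), 2)` selects samples 1 and 3.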

