Metadata-Version: 2.1
Name: nlcor
Version: 3.0.3
Summary: Nlcor uses a dynamic partitioning approach with adaptive segmentation for a more precise nonlinear correlation estimation.
Home-page: https://github.com/ProcessMiner/nlcorpython
Author: Chitta Ranjan, Devleena Banerjee,Vahab Najari
Author-email: cranjan@processminer.com, dbanerjee@processminer.com
Classifier: Programming Language :: Python :: 3.6
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: scipy

## Citation

* **Package 'nlcor': Compute Nonlinear Correlations**

@article{ranjan2022packagenlcor,
  title={Package 'nlcor': Compute Nonlinear Correlations},
  author={Ranjan, Chitta and Banerjee, Devleena and Najari, Vahab},
  journal={Research Gate},
  year={2022},
  doi={10.13140/RG.2.2.33716.68480}
}

Chitta Ranjan, Devleena Banerjee, and Vahab Najari. “Package ‘nlcor’: Compute Nonlinear Correlations”. In: Research Gate (2020). doi: 10.13140/RG.2.2.33716.68480.


Purpose
-------

Estimate nonlinear correlations using `nlcor`. It yields a correlation
estimate between 0 and 1 and an adjusted p value. The p value
indicates whether the estimated correlation is statistically significant.

Description
-----------

Correlations are commonly used in various data mining applications.
Typically linear correlations are estimated. However, the data may have
a nonlinear correlation but little to no linear correlation. If, for
example, we are performing data exploration using automated techniques
on many variables, such nonlinearly correlated variables can easily be
overlooked.
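To make this concrete, here is a minimal sketch of a case where `y` is fully determined by `x` yet the linear (Pearson) correlation is essentially zero. The quadratic data and the helper name `pearson` are assumptions chosen for illustration; they are not part of the package.

```python
import math

def pearson(xs, ys):
    """Pearson linear correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y = x^2 on a grid that is symmetric about x = 0.
xs = [i / 10 for i in range(-50, 51)]
ys = [x * x for x in xs]

r = pearson(xs, ys)
print(f"Pearson r = {r:.6f}")  # essentially zero despite the exact dependence
```

An automated screen that ranks variable pairs by linear correlation alone would discard this pair, which is the situation described above.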

Nonlinear correlations are quite common in real data. For this reason,
nonlinear models, such as SVM, are employed for regression,
classification, etc. However, there are not many approaches for estimating
nonlinear correlations. Such an approach is useful in data
exploration, variable selection, and other areas.

In this package, we provide an implementation of a nonlinear correlation
estimation method using an adaptive local linear correlation computation
in `nlcor`. The function `nlcor` returns the nonlinear correlation
estimate, the corresponding adjusted p value, and an optional plot
visualizing the nonlinear relationships.

The correlation estimate will be between 0 and 1. The higher the value,
the stronger the nonlinear correlation. Unlike linear correlations, a
negative value is not valid here. Because multiple local correlations
are computed, the net p value of the correlation estimate is adjusted
(to avoid false positives). The plot visualizes the local linear
correlations.
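The idea of combining local linear correlations can be illustrated with a minimal sketch. This is not the `nlcor` algorithm: it uses fixed, equal-size segments instead of adaptive segmentation, it computes no p value, and the helper names `pearson` and `local_abs_correlation` as well as the cosine data are assumptions made for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson linear correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def local_abs_correlation(xs, ys, n_segments):
    """Mean absolute Pearson correlation over equal-size runs of x-sorted points."""
    pairs = sorted(zip(xs, ys))
    size = len(pairs) // n_segments
    cors = []
    for k in range(n_segments):
        start = k * size
        # The last segment absorbs any leftover points.
        end = len(pairs) if k == n_segments - 1 else start + size
        seg_x = [p[0] for p in pairs[start:end]]
        seg_y = [p[1] for p in pairs[start:end]]
        cors.append(abs(pearson(seg_x, seg_y)))
    return sum(cors) / len(cors)

# Hypothetical data: one full cosine cycle, which has almost no linear trend.
xs = [2 * math.pi * i / 400 for i in range(400)]
ys = [math.cos(x) for x in xs]

overall = abs(pearson(xs, ys))
local = local_abs_correlation(xs, ys, n_segments=4)
print(f"overall |r| = {overall:.3f}, mean local |r| = {local:.3f}")
```

Taking absolute values per segment is why a combined estimate of this kind cannot be negative: segments with opposite slopes reinforce rather than cancel each other.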

In the following, we show its usage with a few examples. In most of
these examples, the linear correlation between `x` and `y` is small,
but there is a visible nonlinear correlation between them. This
package contains the data for these examples, which can be used for
testing the package.

### Example 1. Nonlinearly correlated data with close to zero linear correlation.

Data with a cyclic nonlinear correlation.

    plot(x1, y1)
![](https://github.com/ProcessMiner/nlcor/blob/master/README_files/figure-markdown_strict/Figure-1.1-1.png?raw=true)

The linear correlation of the data, approximated here with SciPy's
`spearmanr` rank correlation, is

    from scipy.stats import spearmanr

    spearmanr(x1, y1)
    # 0.00896045678795923

As expected, the correlation is close to zero. We estimate the nonlinear
correlation using `nlcor`.

    nlcor(x1, y1, plt = True)
    # {'cor_estimate': 0.8825652697448009,
    #  'adjusted_p_value': 0.0,
    #  'cor_plot': <AxesSubplot:xlabel='x', ylabel='y'>}

![](https://github.com/ProcessMiner/nlcor/blob/master/README_files/figure-markdown_strict/Figure-1.2-1.png?raw=true)

The plot shows the piecewise linear correlations present in the data.

### Example 2. Non-uniform correlation structure.

Data with non-uniform piecewise linear correlations.

    plot(x2, y2)

![](https://github.com/ProcessMiner/nlcor/blob/master/README_files/figure-markdown_strict/Figure-2.1-1.png?raw=true)

The linear correlation of the data is

    spearmanr(x2, y2)
    # 0.8362465969598638

The linear correlation is quite high in this data. However, the
nonlinear correlation present in the data is significant and even
higher. This data emulates a scenario where the correlation changes
direction after a certain point. When that change point is in the
middle, the linear correlation can be close to zero. Here the change
point is off center, showing that the implementation also works in
non-uniform cases.
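The effect of an off-center change point can be reproduced with synthetic data. The sketch below is an assumption-laden illustration: the change point at `0.8`, the slopes, and the helper name `pearson` are made up and are not the `x2`, `y2` data shipped with the package.

```python
import math

def pearson(xs, ys):
    """Pearson linear correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y rises until x = 0.8, then falls steeply.
change = 0.8
xs = [i / 200 for i in range(201)]
ys = [x if x <= change else change - 3 * (x - change) for x in xs]

left = [(x, y) for x, y in zip(xs, ys) if x <= change]
right = [(x, y) for x, y in zip(xs, ys) if x > change]

r_all = pearson(xs, ys)
r_left = pearson([p[0] for p in left], [p[1] for p in left])
r_right = pearson([p[0] for p in right], [p[1] for p in right])
print(f"overall r = {r_all:.3f}, left r = {r_left:.3f}, right r = {r_right:.3f}")
```

The overall correlation stays positive because most points lie before the change point, yet each piece on its own is perfectly linear, so a piecewise estimate is higher than the global one.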

We estimate the nonlinear correlation using `nlcor`.

    nlcor(x2, y2, plt = True)
    # {'cor_estimate': 0.8960923220316748,
    #  'adjusted_p_value': 0.0,
    #  'cor_plot': <AxesSubplot:xlabel='x', ylabel='y'>}

![](https://github.com/ProcessMiner/nlcor/blob/master/README_files/figure-markdown_strict/Figure-2.2-1.png?raw=true)

It is visible from the plot that `nlcor` could estimate the piecewise
correlations in a non-uniform scenario. Also, the nonlinear correlation
comes out to be higher than the linear correlation.

### Example 3. Highly noncorrelated data. Typical in multi-seasonal processes.

Data with higher-frequency variations at multiple frequencies.

    plot(x3, y3)

![](https://github.com/ProcessMiner/nlcor/blob/master/README_files/figure-markdown_strict/Figure-3.1-1.png?raw=true)

The linear correlation of the data is

    spearmanr(x3, y3)
    # SpearmanrResult(correlation=-0.13826069794395476, pvalue=1.5642663041613067e-18)

As expected, the linear correlation is small, although not close to
zero because the data retains some linearity.

Here we show that the granularity of the correlation computation can be
refined.

Under default settings, the output of `nlcor` will be,

    nlcor(x3, y3, plt = True)
    # {'cor_estimate': 0.8545600175677881,
    #  'adjusted_p_value': 0.004412619725243649,
    #  'cor_plot': <AxesSubplot:xlabel='x', ylabel='y'>}

![](https://github.com/ProcessMiner/nlcor/blob/master/README_files/figure-markdown_strict/Figure-3.2-1.png?raw=true)

We can refine the correlation estimation by changing the `refine`
parameter, which can be set to any value between `0` and `1`. A lower
value enforces higher refinement. However, higher refinement adversely
affects the p value: the resulting correlation estimate may be
statistically insignificant (similar to overfitting). Therefore, it is
recommended to avoid over-refinement.
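One way to see why finer segmentation can hurt significance is through multiple-testing adjustment. The sketch below uses a Bonferroni correction as a stand-in; this README does not state which adjustment `nlcor` applies or how `refine` maps to the number of segments, so the function name `bonferroni`, the segment counts, and the raw p value are all assumptions for illustration.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical: every local correlation has the same raw p value of 0.01.
raw_p = 0.01
adjusted_by_segments = {}
for n_segments in (2, 10, 50, 200):
    adjusted_by_segments[n_segments] = bonferroni([raw_p] * n_segments)[0]

for n_segments, adj in adjusted_by_segments.items():
    print(f"{n_segments:>3} segments -> adjusted p = {adj:.2f}")
```

Under this assumption, the same local evidence looks weaker as the number of segments grows, and each segment also contains fewer points, which further weakens its raw p value.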

Typically, `refine` should be less than `0.20`. For this data, we rerun the correlation estimation with `refine = 0.01`.

    nlcor(x3, y3, refine = 0.01, plt = True)
    # {'cor_estimate': 0.1337303964709714,
    #  'adjusted_p_value': 0.0,
    #  'cor_plot': <AxesSubplot:xlabel='x', ylabel='y'>}

![](https://github.com/ProcessMiner/nlcor/blob/master/README_files/figure-markdown_strict/Figure-3.3-1.png?raw=true)


As can be seen in the figure, `nlcor` could identify the granular
piecewise correlations. In this data, the p value still remains
extremely small, so the correlation is *statistically significant*.

### Example 4. Line visualization adjustment.

Sometimes we want to change the thickness of the lines and their opacity
(1 - transparency).

These can be adjusted with the `line_thickness` and `line_opacity` arguments.

    nlcor(x1, y1, plt = True, line_thickness = 2.5, line_opacity = 0.8)

![](https://github.com/ProcessMiner/nlcor/blob/master/README_files/figure-markdown_strict/Figure-3.4-1.png?raw=true)

Installation
-----------

To install the package, type the following command:
	
    pip install nlcor

In order to import the package, type the following command:
    
    from nlcor import nlcor

Summary
-------

This package provides an implementation of an efficient heuristic to
compute the nonlinear correlations between numeric vectors. The
heuristic works by adaptively identifying multiple local regions of
linear correlations to estimate the overall nonlinear correlation. Its
usage is demonstrated here with a few examples.

------------------------------------------------------------------------

Support
-------

Devleena Banerjee <dbanerjee@processminer.com>

Mahendra Reddy <mreddy@processminer.com>


Visit <www.processminer.com> for further information.
