Metadata-Version: 2.1
Name: correlation
Version: 1.0.0
Summary: Calculate the confidence intervals of correlation coefficients
Home-page: https://github.com/XiangwenWang/correlation
Author: Xiangwen Wang
Author-email: wangxiangwen1989@gmail.com
License: BSD 2-Clause License
Keywords: correlation,confidence interval
Platform: any
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Description-Content-Type: text/markdown
Requires-Dist: numpy
Requires-Dist: scipy

# correlation

Calculate confidence intervals for correlation coefficients, including Pearson's r, Kendall's tau, Spearman's rho, and customized correlation measures.

## Methodology  
Two approaches are offered for calculating the confidence intervals: a parametric approach based on a normal approximation, and a nonparametric approach based on bootstrapping.
### Parametric Approach
Let r\_hat be the sample correlation coefficient. Under the Fisher transformation
```
z = ln((1+r)/(1-r))/2,
```  
z is approximately normally distributed, with mean z(r\_hat) and a variance sigma^2 that depends on the coefficient (see Refs. [1, 2] for details):

* Pearson's r: 1/(n-3)
* Kendall's tau: 0.437/(n-4)
* Spearman's rho: (1+r\_hat^2/2)/(n-3)

Here n is the array length, that is, the number of paired observations.

The (1-alpha) confidence interval (CI) for r is then
```
(T(z_lower), T(z_upper))
```  
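where T is the inverse of the transformation above (the hyperbolic tangent),
```
T(x) = (exp(2x) - 1) / (exp(2x) + 1),
```
and the endpoints on the z scale are
```
z_lower = z(r_hat) - z_(1-alpha/2) * sigma,
z_upper = z(r_hat) + z_(1-alpha/2) * sigma.
```
Here z\_(1-alpha/2) is the (1-alpha/2) quantile of the standard normal distribution.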

This normal approximation is adequate when the absolute values of Pearson's r, Kendall's tau, and Spearman's rho are less than 1, 0.8, and 0.95, respectively.
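
The parametric interval described above can be sketched in a few lines of Python. This is only an illustration of the formulas, not the package's own code: the function name `fisher_ci` and its `method` labels are hypothetical, and the normal quantile is taken from the standard library's `statistics.NormalDist` (Python 3.8+).

```python
import math
from statistics import NormalDist


def fisher_ci(r_hat, n, alpha=0.05, method="pearson"):
    """Approximate (1 - alpha) CI for a correlation via the z-transformation.

    Hypothetical helper for illustration; `method` labels are assumptions.
    """
    # Variance of z for each coefficient, as listed in the text above.
    if method == "pearson":
        var = 1.0 / (n - 3)
    elif method == "kendall":
        var = 0.437 / (n - 4)
    elif method == "spearman":
        var = (1.0 + r_hat ** 2 / 2.0) / (n - 3)
    else:
        raise ValueError(f"unknown method: {method}")

    z = math.atanh(r_hat)  # identical to ln((1 + r) / (1 - r)) / 2
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_(1-alpha/2)
    half_width = z_crit * math.sqrt(var)

    # tanh is the inverse transformation T(x) = (exp(2x) - 1) / (exp(2x) + 1)
    return math.tanh(z - half_width), math.tanh(z + half_width)
```

For instance, `fisher_ci(-0.0999987624920335, 2000, method="spearman")` gives roughly (-0.1433, -0.0563), in line with the interval shown under Example Usage.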

### Nonparametric Approach
For the nonparametric approach, we use a naive (percentile) bootstrap:

* Draw n pairs (x\_i, y\_i) with replacement from the original paired samples, where n is the original sample size, and calculate the correlation coefficient of the resampled data.
* Repeat this process a large number of times (5000 by default).
* Take the alpha/2 and (1-alpha/2) quantiles of the resulting correlation coefficients as the endpoints of the (1-alpha) CI for r.
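
The bootstrap steps above can be sketched as follows. This is a minimal illustration using only the standard library, not the package's implementation: `bootstrap_ci`, `pearson_r`, and the `seed` parameter are hypothetical names, and the quantiles are taken by simple rank selection rather than interpolation.

```python
import math
import random


def pearson_r(x, y):
    """Plain Pearson correlation; any function of (x, y) could be used instead."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(x, y, corr_func=pearson_r, alpha=0.05, n_boot=5000, seed=None):
    """Percentile bootstrap CI for a correlation (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        # Resample n index positions with replacement so pairs stay together.
        # Note: a degenerate resample with zero variance would raise
        # ZeroDivisionError here; this sketch does not guard against it.
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        estimates.append(corr_func(xs, ys))

    # Pick the alpha/2 and (1 - alpha/2) quantiles by rank.
    estimates.sort()
    lo_idx = int(math.floor((alpha / 2) * (n_boot - 1)))
    hi_idx = int(math.ceil((1 - alpha / 2) * (n_boot - 1)))
    return estimates[lo_idx], estimates[hi_idx]
```

Passing a different `corr_func` is how a customized correlation measure would fit into this sketch, since the resampling and quantile steps do not depend on which coefficient is computed.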


## References
[1] Bonett, Douglas G., and Thomas A. Wright. "Sample size requirements for estimating Pearson, Kendall and Spearman correlations." Psychometrika 65, no. 1 (2000): 23-28.  
[2] Bishara, Anthony J., and James B. Hittner. "Confidence intervals for correlations when data are not normal." Behavior Research Methods 49, no. 1 (2017): 294-309.


## Installation:  
```
pip install correlation
```  
or

```
conda install -c wangxiangwen correlation
```

## Example Usage:  
```python
>>> import correlation
>>> a, b = list(range(2000)), list(range(200, 0, -1)) * 10
>>> correlation.corr(a, b, method='spearman_rho')
(-0.0999987624920335,          # correlation coefficient
 -0.14330929583811683,         # lower endpoint of CI
 -0.056305939127336606,        # upper endpoint of CI
 7.446171861744971e-06)        # p-value
```


