Metadata-Version: 2.1
Name: dolvins
Version: 0.0.5
Summary: Dolvin's Math and Stats Library
Author: Landon Dolvin
Author-email: landondolvin@gmail.com
Keywords: python,distributions,probability,linear algebra,statistics,mathematics
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: Unix
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Description-Content-Type: text/markdown
License-File: LICENSE.txt


# Dolvins



This project provides a set of functions and classes for optimization, probability, and statistical analysis, with a focus on handling multi-dimensional data, hyperplanes, and distribution analysis.

<br/>



## Table of Contents



- [Installation](#installation)

- [Usage](#usage)

  - [General Math Functions](#general-math-functions)

  - [Mathematical Objects](#mathematical-objects)

  - [Probability and Random Variables Functions](#probability-and-random-variables-functions)

  - [Calculus Functions](#calculus-functions)

  - [Distribution Analysis Functions](#distribution-analysis-functions)


- [License](#license)

  <br/>



## Installation



Dolvins is built on the following packages:



- `psutil`

- `numpy`

- `pandas`

- `tqdm`

- `scipy`



To install Dolvins automatically with all its dependencies, please run:



```

pip install dolvins

```



<br/>



## Usage



### General Math Functions



#### `next_power_of_two(x: int) -> int`



Returns the next power of two greater than or equal to `x`.



**Arguments:**



- `x (int)`: The input number.



**Returns:**



- `int`: The next power of two.



**Example:**



```

x = 5

next_power = next_power_of_two(x)

print(next_power)



>> 8

```



<br/>



#### `round_down_to_nearest_power_of_two(x: int) -> int`



Rounds down `x` to the nearest power of two.



**Arguments:**



- `x (int)`: The input number.



**Returns:**



- `int`: The nearest power of two.



**Example:**



```

x = 10

nearest_power = round_down_to_nearest_power_of_two(x)

print(nearest_power)



>> 8

```
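Both power-of-two helpers can be expressed with integer bit operations. The following is a minimal sketch, not the library's implementation; the `_sketch` function names are hypothetical, and positive integer input is assumed.

```python
def next_power_of_two_sketch(x: int) -> int:
    """Smallest power of two >= x (hypothetical stand-in, positive x only)."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    # (x - 1).bit_length() is the number of bits needed to store x - 1,
    # so shifting 1 by that amount gives the first power of two >= x.
    return 1 << (x - 1).bit_length()


def round_down_to_power_of_two_sketch(x: int) -> int:
    """Largest power of two <= x (hypothetical stand-in, positive x only)."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    # Keep only the highest set bit of x.
    return 1 << (x.bit_length() - 1)


print(next_power_of_two_sketch(5))            # 8
print(round_down_to_power_of_two_sketch(10))  # 8
```

An exact power of two is returned unchanged by both functions in this sketch.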



<br/>



#### `gcd_of_list(numbers: list) -> int`



Returns the GCD of a list of numbers.



**Arguments:**



- `numbers (list)`: A list of integers.



**Returns:**



- `int`: The GCD of the list.



**Example:**



```

numbers = [12, 15, 21]

gcd_result = gcd_of_list(numbers)

print(gcd_result)



>> 3

```
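The same result can be reproduced with the standard library alone, which may help when checking outputs. This is a sketch under the assumption that the list is non-empty; `gcd_of_list_sketch` is a hypothetical name, not part of Dolvins.

```python
from functools import reduce
from math import gcd


def gcd_of_list_sketch(numbers):
    """Fold math.gcd over the list: gcd(gcd(a, b), c), and so on."""
    # reduce raises TypeError on an empty list, so callers must pass at least one number.
    return reduce(gcd, numbers)


print(gcd_of_list_sketch([12, 15, 21]))  # 3
```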



<br/>



### Mathematical Objects



#### `Hyperplane`



A class representing a hyperplane.



**Methods:**



- `__init__(self, normal: np.array, coef: float)`



  Initializes a Hyperplane object with a normal vector and coefficient.



  **Arguments:**



  - `normal (np.array)`: The normal vector to the hyperplane.

  - `coef (float)`: The coefficient of the hyperplane.



- `project_point(self, *point: float) -> np.array`



  Projects a point onto the hyperplane.



  **Arguments:**



  - `point (float)`: The coordinates of the point to project, passed as separate positional arguments.



  **Returns:**



  - `np.array`: The projected point.



**Example:**



```

normal = np.array([1, 1, 1])

coef = 3

hyperplane = Hyperplane(normal, coef)

projected_point = hyperplane.project_point(2, 4, 0)

print(projected_point)



>> np.array([1, 2, 0])

```
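The output above satisfies `normal · x = coef` (1 + 2 + 0 = 3), and it is what you get by scaling the point along the ray from the origin until it lands on the hyperplane. It is not the orthogonal (closest-point) projection, which would be `[1, 3, -1]`. The sketch below compares the two; it is an observation about the example, not a description of the library's internals, and both function names are hypothetical.

```python
def scale_onto_hyperplane(normal, coef, point):
    """Scale `point` along the ray from the origin until normal . x == coef."""
    dot = sum(n * p for n, p in zip(normal, point))
    if dot == 0:
        raise ValueError("the ray through this point never meets the hyperplane")
    factor = coef / dot
    return [factor * p for p in point]


def orthogonal_projection(normal, coef, point):
    """Closest point on the hyperplane, moving along the normal direction."""
    dot = sum(n * p for n, p in zip(normal, point))
    norm_sq = sum(n * n for n in normal)
    shift = (dot - coef) / norm_sq
    return [p - shift * n for n, p in zip(normal, point)]


print(scale_onto_hyperplane([1, 1, 1], 3, [2, 4, 0]))  # [1.0, 2.0, 0.0]
print(orthogonal_projection([1, 1, 1], 3, [2, 4, 0]))  # [1.0, 3.0, -1.0]
```

Scaling keeps a point with non-negative coordinates inside the positive region, which the orthogonal projection does not guarantee.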



<br/>



### Probability and Random Variables Functions



#### `sterlings_approximation(n: int) -> float`



Returns an approximation of `n!` using Stirling's approximation.



**Arguments:**



- `n (int)`: The input number.



**Returns:**



- `float`: The approximate factorial of `n`.



**Example:**



```

n = 10

approx_factorial = sterlings_approximation(n)

print(approx_factorial)



>> 3598695.6187410373

```
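The value in the example matches the standard form of Stirling's approximation, `sqrt(2 * pi * n) * (n / e) ** n`. A short sketch comparing it with the exact factorial (the name `stirling_sketch` is hypothetical):

```python
import math


def stirling_sketch(n: int) -> float:
    """Stirling's approximation: n! ~ sqrt(2 * pi * n) * (n / e) ** n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n


approx = stirling_sketch(10)
exact = math.factorial(10)  # 3628800

print(approx)
print((exact - approx) / exact)  # relative error, a little under 1% for n = 10
```

The relative error shrinks roughly like `1 / (12 * n)`, so the approximation improves as `n` grows.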



<br/>



#### `permutate(n: int, r: int) -> int`



Calculates permutations of `n` objects taken `r` at a time (using Stirling's approximation if `n` is too large).



**Arguments:**



- `n (int)`: Number of objects.

- `r (int)`: Number of objects chosen, where order matters.



**Returns:**



- `int`: `n` permutate `r`.



**Example:**



```

n = 5

r = 3

perm_result = permutate(n, r)

print(perm_result)



>> 60

```



<br/>



#### `combinate(n: int, r: int) -> int`



Calculates combinations of `n` objects taken `r` at a time where order does not matter.



**Arguments:**



- `n (int)`: Number of objects.

- `r (int)`: Number you are choosing.



**Returns:**



- `int`: `n` combinate `r`.



**Example:**



```

n = 5

r = 3

comb_result = combinate(n, r)

print(comb_result)



>> 10

```
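For small inputs, both `permutate` and `combinate` can be checked against the exact factorial formulas `n! / (n - r)!` and `n! / (r! * (n - r)!)`. A minimal sketch with hypothetical function names, using exact integer arithmetic only (no Stirling fallback):

```python
import math


def permutations_count(n: int, r: int) -> int:
    """Ordered selections of r items from n: n! / (n - r)!."""
    return math.factorial(n) // math.factorial(n - r)


def combinations_count(n: int, r: int) -> int:
    """Unordered selections: each combination corresponds to r! permutations."""
    return permutations_count(n, r) // math.factorial(r)


print(permutations_count(5, 3))  # 60
print(combinations_count(5, 3))  # 10
```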



<br/>



#### `discrete_distribution_prob(exp: pd.Series, obs: pd.Series) -> float`



Calculates the exact probability of observing the observed distribution given the expected distribution. **Note:** scale does not matter: the sum of `obs` need not equal the sum of `exp`, because `exp` is converted to probabilities.



**Arguments:**



- `exp (pd.Series)`: The ground truth (expected) distribution.

- `obs (pd.Series)`: The observed distribution.



**Returns:**



- `float`: The probability of observing the distribution.



**Example:**



```

exp = pd.Series([50, 50, 50])

obs = pd.Series([2, 1, 2])

prob = discrete_distribution_prob(exp, obs)

print(prob)



>> 0.1234

```
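The example value is consistent with the multinomial probability mass function: with `exp` normalized to probabilities of 1/3 each, the probability of the counts `(2, 1, 2)` is `5! / (2! * 1! * 2!) * (1/3) ** 5 = 30 / 243`. The sketch below assumes that this is the quantity being computed; `multinomial_prob` is a hypothetical name and takes plain lists rather than `pd.Series`.

```python
import math


def multinomial_prob(exp, obs):
    """Multinomial probability of the counts `obs` given expected weights `exp`."""
    total = sum(exp)
    probs = [e / total for e in exp]  # scale of `exp` drops out here

    # Multinomial coefficient n! / (k1! * k2! * ...), kept as an exact integer.
    coef = math.factorial(sum(obs))
    for k in obs:
        coef //= math.factorial(k)

    value = float(coef)
    for p, k in zip(probs, obs):
        value *= p ** k
    return value


print(multinomial_prob([50, 50, 50], [2, 1, 2]))  # about 0.1235 (30 / 243)
```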



<br/>



#### `generate_combinations(num_classes: int, num_obs: int) -> set`



Returns a set of all possible combinations of `num_classes` integers that add up to `num_obs`.



**Arguments:**



- `num_classes (int)`: Number of classes to choose from.

- `num_obs (int)`: The total that the class counts must sum to.



**Returns:**



- `set`: The set of all possible combinations.



**Example:**



```

num_classes = 2

num_obs = 4

combinations = generate_combinations(num_classes, num_obs)

print(combinations)



>> {(0, 4), (1, 3), (2, 2), (3, 1), (4, 0)}

```
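One way to enumerate these tuples is recursively: fix the count for the first class, then enumerate the remaining classes over what is left. This is a sketch of the idea with a hypothetical function name, not necessarily how `generate_combinations` is implemented.

```python
def compositions(num_classes: int, num_obs: int) -> set:
    """All tuples of `num_classes` non-negative integers that sum to `num_obs`."""
    if num_classes == 1:
        return {(num_obs,)}
    result = set()
    for first in range(num_obs + 1):
        # Distribute the remaining observations over the remaining classes.
        for rest in compositions(num_classes - 1, num_obs - first):
            result.add((first,) + rest)
    return result


print(sorted(compositions(2, 4)))  # [(0, 4), (1, 3), (2, 2), (3, 1), (4, 0)]
print(len(compositions(3, 5)))     # 21
```

The number of tuples is `C(num_obs + num_classes - 1, num_classes - 1)`, which grows quickly with either argument.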



<br/>



#### `generate_normal_exponent(mean: float, std_dev: float) -> Callable`



Generates a function representing the exponent of a normal distribution with the specified mean and standard deviation.



**Arguments:**



- `mean (float)`: Mean (mu) of the normal distribution.

- `std_dev (float)`: Standard deviation (sigma) of the normal distribution.



**Returns:**



- `Callable`: A function representing the exponent.



**Example:**



```

mean = 0

std_dev = 1

normal_exp = generate_normal_exponent(mean, std_dev)

```



`normal_exp` is functionally equivalent to $-\frac{1}{2} \left(\frac{x - \mu}{\sigma}\right)^2$, where $\mu$ = `mean` and $\sigma$ = `std_dev`.
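A function like this is naturally written as a closure that captures `mean` and `std_dev`. The following is a minimal sketch with a hypothetical name, assuming the returned callable takes a single number:

```python
import math


def make_normal_exponent(mean: float, std_dev: float):
    """Return x -> -0.5 * ((x - mean) / std_dev) ** 2."""
    def exponent(x: float) -> float:
        z = (x - mean) / std_dev
        return -0.5 * z * z
    return exponent


normal_exp = make_normal_exponent(0, 1)
print(normal_exp(2.0))  # -2.0

# Exponentiating and normalizing gives the normal PDF at that point.
pdf_at_2 = math.exp(normal_exp(2.0)) / math.sqrt(2 * math.pi)
print(pdf_at_2)
```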

<br/>



#### `generate_joint_pdf(exp: pd.Series, num_obs: int) -> Callable`



Generates a joint probability density function (PDF) for all possible outcomes based on the expected distribution and the total number of observations.



**Arguments:**



- `exp (pd.Series)`: The ground truth (expected) distribution.

- `num_obs (int)`: The number of observations.



**Returns:**



- `Callable`: The joint PDF function.



**Explanation:**



1. Approximates each class's distribution with a Normal PDF.

2. Multiplies the per-class approximations together to get a Joint PDF.



**Example:**



```

exp = pd.Series([4, 6])

num_obs = 100

joint_pdf = generate_joint_pdf(exp, num_obs)

```



`joint_pdf` is functionally equivalent to $\frac{1}{\sqrt{2\pi \cdot 40 \cdot \frac{6}{10}} \sqrt{2\pi \cdot 60 \cdot \frac{4}{10}}} \cdot e^{-\frac{1}{2} \left(\frac{x - 40}{\sqrt{40 \cdot \frac{6}{10}}}\right)^2 - \frac{1}{2} \left(\frac{y - 60}{\sqrt{60 \cdot \frac{4}{10}}}\right)^2}$
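The formula above uses, for each class with probability `p`, a normal distribution with mean `num_obs * p` and variance `num_obs * p * (1 - p)`. The sketch below builds the same product of normal PDFs; `make_joint_pdf` is a hypothetical name, it takes a plain list instead of a `pd.Series`, and it assumes every class has a probability strictly between 0 and 1.

```python
import math


def make_joint_pdf(exp, num_obs: int):
    """Product of per-class normal PDFs with mean n*p and variance n*p*(1-p)."""
    total = sum(exp)
    probs = [e / total for e in exp]
    means = [num_obs * p for p in probs]
    variances = [num_obs * p * (1 - p) for p in probs]

    def joint_pdf(*counts: float) -> float:
        value = 1.0
        for x, mu, var in zip(counts, means, variances):
            value *= math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)
        return value

    return joint_pdf


joint_pdf = make_joint_pdf([4, 6], 100)
print(joint_pdf(40, 60))  # the peak, equal to 1 / (2 * pi * 24)
```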

<br/>



### Calculus Functions



#### `hyperplane_integration(f: Callable, hyperplane: list, max_val: float = None, chunk_size: int = "auto", num_samples: int = "auto", random_state: int = 42, pbar: Callable = None) -> float`



Integrates a function (typically a PDF) over an N-dimensional hyperplane using quasi-Monte Carlo integration (Sobol sampling). Currently, only integration over the positive orthant (all coordinates non-negative) is supported.



**Arguments:**



- `f (Callable)`: The function to integrate.

- `hyperplane (object)`: The hyperplane over which to integrate.

- `max_val (float)`: The maximum value at which to cap integration (defaults to `None`); any region where the function exceeds this value is not counted.

- `chunk_size (int)`: The number of samples to handle at one time (defaults to `"auto"`).

- `num_samples (int)`: The total number of samples to draw (defaults to `"auto"`).

- `random_state (int)`: Random state used to make the integration deterministic (defaults to 42).

- `pbar (tqdm)`: Progress bar to update after every completed chunk (defaults to `None`).



**Returns:**



- `float`: The result of integration.



**Example:**



```

f = lambda x, y, z: x + y + z

hyperplane = Hyperplane(normal=np.array([1, 1, 1]), coef=3)

result = hyperplane_integration(f, hyperplane)

print(result)



>> 13.5

```
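In the example, `f` equals 3 everywhere on the hyperplane, and the region `x, y >= 0, x + y <= 3` has area 4.5, so the result of 13.5 matches integrating over the first two coordinates with `z = 3 - x - y` (a true surface integral would carry an extra factor of `sqrt(3)`). The sketch below reproduces that number under those assumptions. To stay dependency-free it uses a Halton sequence rather than Sobol sampling, it only handles the normal `[1, 1, 1]`, and all names are hypothetical; it is an illustration of quasi-Monte Carlo integration, not the library's code.

```python
def halton(index: int, base: int) -> float:
    """Radical inverse of `index` in `base` (one coordinate of a Halton point)."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def integrate_over_simplex(f, coef: float, num_samples: int = 8192) -> float:
    """Estimate the integral of f(x, y, coef - x - y) over x, y >= 0, x + y <= coef."""
    total = 0.0
    for i in range(1, num_samples + 1):
        # Low-discrepancy point in the box [0, coef] x [0, coef].
        x = coef * halton(i, 2)
        y = coef * halton(i, 3)
        z = coef - x - y
        if z >= 0:  # keep only points in the positive orthant
            total += f(x, y, z)
    # Box area times the mean of the masked integrand.
    return coef * coef * total / num_samples


result = integrate_over_simplex(lambda x, y, z: x + y + z, 3.0)
print(result)  # close to 13.5
```

Because the points are generated deterministically, repeated runs of this sketch give the same estimate, which mirrors the role of `random_state` above.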



<br/>



### Distribution Analysis Functions



#### `E(exp: pd.Series, obs: pd.Series, approximate: bool, chunk_size: int = "auto", num_samples: int = "auto", random_state: int = None) -> float`



Performs an E-test on an expected distribution and observed distribution.



**Arguments:**



- `exp (pd.Series)`: The expected (ground-truth) distribution.

- `obs (pd.Series)`: The observed distribution.

- `approximate (bool)`: If False, the exact discrete probability is calculated; if True, an approximate is calculated based on continuous probability.

- `chunk_size (int)`: The number of samples to process at once (defaults to `"auto"`).

- `num_samples (int)`: The number of samples to calculate in total - lower is faster but less precise.

- `random_state (int)`: If specified, leads to deterministic results.



**Returns:**



- `float`: The E-value.



**Explanation:**



- The E-test seeks to produce a more interpretable and accurate probability value (p-value) for testing whether two distributions differ statistically.

- The E-test assumes the expected and observed distributions are identical and, under that assumption, calculates an E-value: the probability of receiving a distribution as **E**xtreme as, or more **E**xtreme than, the one observed.

- Thus, the lower the E-value (i.e., the lower the chances of receiving a distribution that extreme if the distributions were in fact identical), the stronger the indication that the distributions are different.

- The exact E-value can be calculated using discrete probability; however, a continuous probability estimate must be used when there are many observations.

- Note: time complexity in either case is exponential, so while the continuous approximation can handle larger numbers of observations, it may still take a significant amount of time for massive samples without some method of scaling them down (to be researched).



**Example:**



```

exp = pd.Series([50, 50, 50])

obs = pd.Series([300, 300, 300])

e_value = E(exp, obs, approximate=True)

print(e_value)



>> 1.0





exp = pd.Series([50, 0, 0])

obs = pd.Series([100, 0, 0])

e_value = E(exp, obs, approximate=True)

print(e_value)



>> 0





exp = pd.Series([15, 15, 15])

obs = pd.Series([155, 145, 150])

e_value = E(exp, obs, approximate=True)

print(e_value)



>> 0.77743

```
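For small samples, the exact (non-approximate) idea can be sketched by enumerating every possible outcome with the same total and summing the probabilities of those that are no more likely than the observed one. Treating "as extreme or more extreme" as "no more probable than the observation" is an assumption here, as are all function names; the library may order outcomes differently.

```python
import math


def multinomial_pmf(probs, counts):
    """Multinomial probability of `counts` under class probabilities `probs`."""
    coef = math.factorial(sum(counts))
    for k in counts:
        coef //= math.factorial(k)
    value = float(coef)
    for p, k in zip(probs, counts):
        value *= p ** k
    return value


def outcomes(num_classes, num_obs):
    """Yield every tuple of non-negative counts that sums to `num_obs`."""
    if num_classes == 1:
        yield (num_obs,)
        return
    for first in range(num_obs + 1):
        for rest in outcomes(num_classes - 1, num_obs - first):
            yield (first,) + rest


def exact_e_value_sketch(exp, obs):
    """Sum the probabilities of all outcomes no more likely than `obs`."""
    total = sum(exp)
    probs = [e / total for e in exp]
    p_obs = multinomial_pmf(probs, obs)
    threshold = p_obs * (1 + 1e-12)  # small tolerance for floating-point ties
    return sum(
        p
        for p in (multinomial_pmf(probs, o) for o in outcomes(len(obs), sum(obs)))
        if p <= threshold
    )


print(exact_e_value_sketch([1, 1], [2, 2]))  # 1.0: the observed split is the most likely one
print(exact_e_value_sketch([1, 1], [0, 4]))  # 0.125: only (0, 4) and (4, 0) are this unlikely
```

The enumeration grows combinatorially with the number of observations and classes, which is why an approximate mode is needed for larger samples.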



<br/><br/>



## License



This project is licensed under the MIT License.







