Metadata-Version: 2.1
Name: datalib_genta
Version: 0.1.2
Summary: A Python library for data manipulation, statistical analysis, and machine learning.
Author: Ibrahim Trabelsi
Author-email: Trabelsi_Ibrahim@hotmail.com
License: MIT
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE

Here’s how we can enhance the `README` for the `DataLib` project with some usage examples for each feature, so users can easily understand how to use the library's functions.

---

# DataLib

DataLib is a Python library for data manipulation, statistical analysis, and machine learning.

## Features
- **Data preprocessing** (normalization, handling missing values)
- **Statistical calculations** (mean, median, mode)
- **Data visualization** (histograms, scatter plots)
- **Regression and classification analysis** (linear regression, k-means clustering, PCA)

## Installation

To install `DataLib`, use the following command:

```bash
pip install datalib_genta
```

## Usage

### 1. Data Preprocessing

**Normalize Data:**

Normalize your data to scale features to a range.

```python
from datalib.preprocessing.transformations import normalize_data
import numpy as np

# Sample data
X = np.array([[1, 2], [3, 4], [5, 6]])

# Normalize the data
X_normalized = normalize_data(X)
print(X_normalized)
```

**Handle Missing Data:**

Fill missing values in the dataset.

```python
from datalib.preprocessing.transformations import handle_missing_values
import numpy as np

# Sample data with missing values
X = np.array([[1, 2], [np.nan, 4], [5, np.nan]])

# Fill missing values with column mean
X_filled = handle_missing_values(X, method='mean')
print(X_filled)
```

### 2. Statistical Calculations

**Calculate Mean, Median, and Mode:**

Calculate basic statistics of a dataset.

```python
from datalib.statistics.basic_stats import calculate_mean, calculate_median, calculate_mode
import numpy as np

data = np.array([1, 2, 3, 4, 5])

# Mean
mean = calculate_mean(data)
print(f"Mean: {mean}")

# Median
median = calculate_median(data)
print(f"Median: {median}")

# Mode
mode = calculate_mode(data)
print(f"Mode: {mode}")
```

**Statistical Tests:**

Perform basic statistical tests like t-tests and chi-square tests.

```python
from datalib.statistics.statistical_tests import t_test
import numpy as np

# Sample data
group1 = np.array([1, 2, 3, 4])
group2 = np.array([5, 6, 7, 8])

# Perform a t-test
t_stat, p_value = t_test(group1, group2)
print(f"T-statistic: {t_stat}, P-value: {p_value}")
```

### 3. Data Visualization

**Plot Histograms:**

Visualize the distribution of data using histograms.

```python
from datalib.visualization.charts import plot_histogram
import numpy as np

# Sample data
data = np.array([1, 2, 2, 3, 3, 3, 4, 5])

# Plot histogram
plot_histogram(data, bins=5)
```

**Plot Scatter Plots:**

Create scatter plots to visualize relationships between variables.

```python
from datalib.visualization.charts import plot_scatter
import numpy as np

# Sample data
X = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

# Plot scatter plot
plot_scatter(X, y)
```

### 4. Regression and Classification

**K-Means Clustering:**

Perform K-Means clustering on a dataset.

```python
from datalib.analysis.clustering import k_means_clustering
import numpy as np

# Sample data
X = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.]])

# Perform K-Means clustering
model = k_means_clustering(X, 2)

# Cluster centers
print("Cluster centers:", model.cluster_centers_)
```

**Principal Component Analysis (PCA):**

Reduce the dimensionality of data using PCA.

```python
from datalib.analysis.clustering import pca_analysis
import numpy as np

# Sample data
X = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.]])

# Perform PCA
X_pca, variance_ratio = pca_analysis(X, 1)

# Results
print("Reduced Data:", X_pca)
print("Variance Ratio:", variance_ratio)
```

**Linear Regression:**

Perform linear regression on a dataset.

```python
from datalib.analysis.clustering import linear_regression
import numpy as np

# Sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 2, 3, 4, 5])

# Perform linear regression
model = linear_regression(X, y)

# Predictions
predictions = model.predict(X)
print(f"Predictions: {predictions}")
```

---

This updated `README` provides clear examples on how to use various functions within the `DataLib` library, helping users get started quickly. You can adapt this further based on your users' needs.
