Metadata-Version: 2.4
Name: batwing-ml
Version: 0.1.0
Summary: Batwing ML: A Functional machine learning library for fast, visual, and parameter-driven modeling.
Home-page: https://github.com/Harshithan07/batwing-ml
Author: Harshithan Kavitha Sukumar
Author-email: harshithan.ks2002@gmail.com
Project-URL: Documentation, https://github.com/Harshithan07/batwing-ml#readme
Project-URL: Source, https://github.com/Harshithan07/batwing-ml
Project-URL: Tracker, https://github.com/Harshithan07/batwing-ml/issues
Keywords: machine-learning classification regression preprocessing evaluation AutoML
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: scikit-learn>=1.0
Requires-Dist: optuna>=3.0
Requires-Dist: matplotlib
Requires-Dist: seaborn
Requires-Dist: tabulate
Requires-Dist: rich
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license-file
Dynamic: project-url
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# Batwing ML Library

**A modular, functional, and interpretable machine learning pipeline** for binary classification, multiclass classification, and regression tasks. It is designed for data scientists who want rapid experimentation, clean diagnostics, and side-by-side model comparisons with minimal code.

---

## 🚀 Features

* Full EDA and column-level diagnosis
* Modular preprocessing (impute, encode, scale)
* Feature engineering with PCA and correlation filtering
* Hyperparameter tuning with Optuna (classification, regression, multiclass)
* Nested Cross-Validation for robust model benchmarking
* Rich model evaluation (metrics + visualizations)
* Cost-sensitive classification and diagnostics
* Dashboard/notebook-friendly outputs (HTML/tabulate/rich)

---

## 📦 Installation

> Coming soon to PyPI

For now, clone the repo and import functions directly:

```bash
git clone https://github.com/Harshithan07/batwing-ml.git
```

```python
from batwing_ml import (
    summary_dataframe,
    preprocess_dataframe,
    run_nested_cv_classification,
    evaluate_classification_model,
    # ... plus any other functions you need
)
```

---

## 🧠 Module Overview

| Module                                               | Key Functions                                                                 |
| ---------------------------------------------------- | ----------------------------------------------------------------------------- |
| `exploratory.py`                                     | `summary_dataframe()`, `summary_column()` – full EDA, missing patterns, plots |
| `data_validation_and_etl.py`                         | Data shape, type, duplication checks                                          |
| `data_preparation.py`                                | Label transformation, type casting, etc.                                      |
| `feature_engineering.py`                             | PCA, correlation pruning, importance plots                                    |
| `preprocessor.py`                                    | `preprocess_dataframe()`, `preprocess_column()` – encode, scale, impute       |
| `hyperparameter_tuning_classification.py`            | Optuna tuning for binary classification                                       |
| `run_nested_cv_classification.py`                    | Nested CV with model benchmarking                                             |
| `evaluate_classification_model.py`                   | Confusion matrix, ROC, cost-sensitive plots                                   |
| `hyperparameter_tuning_multiclass_classification.py` | Multiclass Optuna tuning                                                      |
| `nested_cv_multiclass_classification.py`             | Nested CV for multiclass tasks                                                |
| `evaluate_multiclass_classification.py`              | Precision, recall, per-class analysis                                         |
| `hyperparameter_tuning_regression.py`                | Optuna tuning for regression                                                  |
| `nested_cv_regression.py`                            | Nested CV for regression models                                               |
| `evaluate_regression_model.py`                       | Regression metrics + diagnostic plots                                         |

---

## 🔧 Usage Examples

### 📊 1. Data Summary

```python
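from batwing_ml import summary_dataframe, summary_column
# df is a pandas DataFrame you have already loaded
summary_dataframe(df, verbose=True, detailing=True, correlation_matrix=True)
summary_column(df, "age", plots=["histogram", "missing_trend"])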
```
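
For a sense of what a column-level diagnosis covers, here is a rough pandas-only sketch. It is not the output of `summary_dataframe()`; the helper `column_overview` and its column names are hypothetical.

```python
import numpy as np
import pandas as pd


def column_overview(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, missing count, missing share, distinct values."""
    # Hypothetical helper for illustration only.
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(1),
        "n_unique": df.nunique(dropna=True),
    })


df = pd.DataFrame({
    "age": [25, np.nan, 40, 31],
    "city": ["NY", "LA", None, "NY"],
})

overview = column_overview(df)
print(overview)
```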

### ⚙️ 2. Preprocessing

```python
from batwing_ml import preprocess_dataframe

X_proc, y_proc, steps = preprocess_dataframe(
    df, target_col="target",
    impute=True, encode="onehot", scale="standard",
    return_steps=True
)
```
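
If it helps to see what impute, one-hot encode, and standard-scale amount to, the sketch below expresses the same three steps directly in scikit-learn. It assumes nothing about how `preprocess_dataframe()` is implemented; the toy data and the median / most-frequent imputation strategies are choices made for this example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": ["NY", "LA", np.nan, "NY"],
    "target": [0, 1, 0, 1],
})
X = df.drop(columns="target")
y = df["target"]

# Numeric columns: fill gaps with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical columns: fill gaps with the mode, then one-hot encode.
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer(
    [("num", numeric, ["age"]), ("cat", categorical, ["city"])],
    sparse_threshold=0,  # always return a dense array
)
X_proc = preprocessor.fit_transform(X)
print(X_proc.shape)
```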

### 🔁 3. Model Tuning (Binary Classification)

```python
from sklearn.ensemble import RandomForestClassifier
from batwing_ml import hyperparameter_tuning_classification

model_class = RandomForestClassifier
param_grid = {
    'n_estimators': lambda trial: trial.suggest_int("n_estimators", 50, 200),
    'max_depth': lambda trial: trial.suggest_int("max_depth", 3, 10)
}

results = hyperparameter_tuning_classification(
    model_class=model_class,
    param_grid=param_grid,
    X=X, y=y,
    scoring='roc_auc'
)
```

### 🏁 4. Nested Cross-Validation (Regression)

```python
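from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from batwing_ml import run_nested_cv_regression

models = {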
    "ridge": Ridge(),
    "rf": RandomForestRegressor()
}
param_grids = {
    "ridge": {"alpha": [0.1, 1.0, 10]},
    "rf": {"n_estimators": [100, 200], "max_depth": [3, 5]}
}

results = run_nested_cv_regression(
    X=X, y=y,
    model_dict=models,
    param_grids=param_grids,
    scoring_list=["r2", "rmse", "mae"],
    search_method="grid",
    return_results=True
)
```
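
For readers new to nested cross-validation, the idea can be reproduced in a few lines of plain scikit-learn: an inner loop selects hyperparameters, and an outer loop scores the whole selection procedure on data it never saw. The sketch below illustrates the technique only and says nothing about the internals of `run_nested_cv_regression`; the synthetic data and fold counts are arbitrary.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Synthetic stand-in for a real regression dataset.
X, y = make_regression(n_samples=150, n_features=10, noise=10.0, random_state=0)

inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)  # tunes alpha
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)  # estimates generalization

# The inner search is itself an estimator, so it is refit inside every outer fold.
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.1, 1.0, 10]},
    cv=inner_cv,
    scoring="r2",
)
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="r2")
print(outer_scores.mean(), outer_scores.std())
```

Because tuning happens inside each outer training fold, the outer scores are not biased by the hyperparameter search.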

---

## 📈 Visualizations

* Feature Importance
* Correlation Heatmaps
* PCA Scree and Scatter Plots
* Confusion Matrix, ROC, Threshold Plots
* Learning Curve, Residuals, Prediction vs Actual
* Lift Charts, Cost-Sensitive Curves

---

## 📚 License

MIT License

---

## 👥 Contributors

Built by Harshithan Kavitha Sukumar and contributors.

---

## 💡 Future Additions

* AutoML wrappers
* MLflow integration
* HTML dashboard export
* Time series module

---
