Metadata-Version: 2.4
Name: mlbench-lite
Version: 0.1.0
Summary: A simple machine learning benchmarking library
Author-email: Your Name <your.email@example.com>
Maintainer-email: Your Name <your.email@example.com>
License: MIT
Project-URL: Homepage, https://github.com/yourusername/mlbench-lite
Project-URL: Documentation, https://github.com/yourusername/mlbench-lite#readme
Project-URL: Repository, https://github.com/yourusername/mlbench-lite.git
Project-URL: Bug Tracker, https://github.com/yourusername/mlbench-lite/issues
Keywords: machine learning,benchmarking,scikit-learn,ml
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: scikit-learn>=1.0.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: numpy>=1.20.0
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: pytest-cov>=2.0; extra == "dev"
Requires-Dist: black>=21.0; extra == "dev"
Requires-Dist: flake8>=3.8; extra == "dev"
Requires-Dist: mypy>=0.800; extra == "dev"
Provides-Extra: test
Requires-Dist: pytest>=6.0; extra == "test"
Requires-Dist: pytest-cov>=2.0; extra == "test"
Dynamic: license-file

# mlbench-lite

A lightweight library for benchmarking several machine learning classifiers on your dataset with a single function call. It builds on scikit-learn and pandas, so it accepts scikit-learn-style arrays and returns the results as a DataFrame.

## 🚀 Features

- **Simple API**: One function call to benchmark multiple models
- **Built-in Models**: Includes Logistic Regression, Random Forest, and SVM
- **Comprehensive Metrics**: Returns Accuracy, Precision, Recall, and F1 scores
- **Custom Dataset**: Includes a synthetic clover dataset, loaded with `load_clover`, for testing
- **Easy Integration**: Works seamlessly with scikit-learn datasets
- **Pandas Output**: Results returned as a clean pandas DataFrame
- **Reproducible**: A fixed `random_state` gives the same results on every run

## 📦 Installation

```bash
pip install mlbench-lite
```

## 🎯 Quick Start

```python
from mlbench_lite import benchmark, load_clover

# Load the clover dataset
X, y = load_clover(return_X_y=True)

# Benchmark multiple models
results = benchmark(X, y)
print(results)
```

**Output:**
```
                 Model  Accuracy  Precision  Recall      F1
0        Random Forest    0.9500     0.9565  0.9512  0.9505
1                  SVM    0.9250     0.9337  0.9255  0.9254
2  Logistic Regression    0.9125     0.9131  0.9117  0.9115
```

## 📚 API Reference

### `benchmark(X, y, test_size=0.2, random_state=42)`

Benchmark multiple machine learning models on a dataset.

**Parameters:**
- `X` (array-like): Feature matrix of shape (n_samples, n_features); a `test_size` share of it is held out for testing
- `y` (array-like): Target values of shape (n_samples,)
- `test_size` (float, optional): Proportion of dataset for testing (default: 0.2)
- `random_state` (int, optional): Random seed for reproducibility (default: 42)

**Returns:**
- `pandas.DataFrame`: Results with columns:
  - `Model`: Name of the model
  - `Accuracy`: Accuracy score
  - `Precision`: Precision score (macro-averaged)
  - `Recall`: Recall score (macro-averaged)
  - `F1`: F1 score (macro-averaged)
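
To make "macro-averaged" concrete, here is a minimal sketch that computes macro precision by hand and compares it with scikit-learn's `precision_score(average="macro")`. The labels are made up for illustration, and the sketch assumes (as the column descriptions suggest, but the source does not show) that the library relies on scikit-learn's metric functions.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Tiny illustrative labels (made up for this example): 3 classes, 6 samples.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# Per-class precision = correct predictions of a class / all predictions of it.
# class 0: predicted twice, 1 correct        -> 1/2
# class 1: predicted three times, 2 correct  -> 2/3
# class 2: predicted once, 1 correct         -> 1/1
per_class_precision = [1 / 2, 2 / 3, 1 / 1]

# Macro averaging is the unweighted mean over classes.
manual_macro_precision = sum(per_class_precision) / len(per_class_precision)

macro_precision = precision_score(y_true, y_pred, average="macro")
macro_recall = recall_score(y_true, y_pred, average="macro")
macro_f1 = f1_score(y_true, y_pred, average="macro")

print(f"manual precision:  {manual_macro_precision:.4f}")
print(f"sklearn precision: {macro_precision:.4f}")
print(f"sklearn recall:    {macro_recall:.4f}")
print(f"sklearn F1:        {macro_f1:.4f}")
```

Because every class counts equally, a poorly predicted minority class lowers a macro score as much as a poorly predicted majority class would.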

### `load_clover(return_X_y=False)`

Load the custom clover dataset.

**Parameters:**
- `return_X_y` (bool, default=False): If True, returns (data, target) instead of a Bunch object

**Returns:**
- `Bunch` or `tuple`: A `Bunch` with `data`, `target`, `feature_names`, `target_names`, and `DESCR`, or a `(data, target)` tuple when `return_X_y=True`
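
The two return shapes follow the convention of scikit-learn's own dataset loaders. The sketch below illustrates that convention with a hypothetical `load_toy` function and made-up values; it is not the implementation of `load_clover`.

```python
import numpy as np
from sklearn.utils import Bunch


def load_toy(return_X_y=False):
    """Hypothetical loader that mimics the documented return convention."""
    data = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    target = np.array([0, 1, 0])
    if return_X_y:
        return data, target
    return Bunch(
        data=data,
        target=target,
        feature_names=["f0", "f1"],
        target_names=["a", "b"],
        DESCR="Toy dataset for illustration only.",
    )


bunch = load_toy()                 # Bunch with named fields
X, y = load_toy(return_X_y=True)   # plain (data, target) tuple

print(bunch.data.shape, bunch.feature_names)  # attribute access
print(bunch["target"])                        # dict-style access also works
```

The tuple form is convenient for passing straight into `benchmark(X, y)`, while the `Bunch` form keeps the feature and class names alongside the arrays.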

## 💡 Code Examples

### 1. Basic Usage with Clover Dataset

```python
from mlbench_lite import benchmark, load_clover

# Load the clover dataset
X, y = load_clover(return_X_y=True)
print(f"Dataset shape: {X.shape}")
print(f"Number of classes: {len(set(y))}")

# Benchmark models
results = benchmark(X, y)
print("\nBenchmark Results:")
print(results)

# Get the best model by accuracy (does not depend on the row order)
best_model = results.loc[results['Accuracy'].idxmax()]
print(f"\n🏆 Best Model: {best_model['Model']} (Accuracy: {best_model['Accuracy']:.4f})")
```

### 2. Using with Scikit-learn Datasets

```python
from mlbench_lite import benchmark
from sklearn.datasets import load_wine, load_breast_cancer

# Test with Wine dataset
print("=== Wine Dataset ===")
X, y = load_wine(return_X_y=True)
results = benchmark(X, y)
print(results)

# Test with Breast Cancer dataset
print("\n=== Breast Cancer Dataset ===")
X, y = load_breast_cancer(return_X_y=True)
results = benchmark(X, y)
print(results)
```

### 3. Custom Test Size

```python
from mlbench_lite import benchmark, load_clover

X, y = load_clover(return_X_y=True)

# Use 30% of data for testing
results = benchmark(X, y, test_size=0.3)
print("Results with 30% test size:")
print(results)

# Use 10% of data for testing
results = benchmark(X, y, test_size=0.1)
print("\nResults with 10% test size:")
print(results)
```
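
To see what a given `test_size` means in sample counts, the sketch below splits placeholder arrays with scikit-learn's `train_test_split`. That `benchmark` performs its split this way is an assumption; the 400-row shape matches the documented size of the clover dataset, but the values are zeros, not the real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with 400 rows, matching the documented clover sample count.
X = np.zeros((400, 4))
y = np.arange(400) % 4

sizes = {}
for test_size in (0.1, 0.2, 0.3):
    X_train, X_test, _, _ = train_test_split(
        X, y, test_size=test_size, random_state=42
    )
    sizes[test_size] = (len(X_train), len(X_test))
    print(f"test_size={test_size}: {len(X_train)} train / {len(X_test)} test")
```

A smaller test set leaves more data for training but makes each metric noisier, since it is estimated from fewer samples.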

### 4. Reproducible Results

```python
from mlbench_lite import benchmark, load_clover

X, y = load_clover(return_X_y=True)

# Set random seed for reproducible results
results1 = benchmark(X, y, random_state=123)
results2 = benchmark(X, y, random_state=123)

print("Results with random_state=123:")
print(results1)
print(f"\nResults are identical: {results1.equals(results2)}")

# A different random state changes the train/test split, so scores usually differ
results3 = benchmark(X, y, random_state=456)
print(f"\nResults differ with another random state: {not results1.equals(results3)}")
```
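
Reproducibility of this kind generally needs the seed in two places: the train/test split and any estimator that uses randomness. The following sketch shows the idea with plain scikit-learn; that `benchmark` forwards `random_state` to both is an assumption, and `fit_and_predict` is a hypothetical helper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=8, random_state=0)


def fit_and_predict(seed):
    """Split and fit with one seed; return the test-set predictions."""
    X_train, X_test, y_train, _ = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=25, random_state=seed)
    return model.fit(X_train, y_train).predict(X_test)


run_a = fit_and_predict(123)
run_b = fit_and_predict(123)

# Same seed for both the split and the model -> identical predictions.
same = np.array_equal(run_a, run_b)
print(f"Identical predictions with the same seed: {same}")
```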

### 5. Working with Synthetic Data

```python
from mlbench_lite import benchmark
from sklearn.datasets import make_classification

# Create synthetic dataset
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=15,
    n_classes=4,
    random_state=42
)

print(f"Synthetic dataset shape: {X.shape}")
print(f"Number of classes: {len(set(y))}")

results = benchmark(X, y)
print("\nBenchmark Results:")
print(results)
```

### 6. Analyzing Results

```python
from mlbench_lite import benchmark, load_clover

X, y = load_clover(return_X_y=True)
results = benchmark(X, y)

# Display results with better formatting
print("Detailed Results:")
print("=" * 60)
for idx, row in results.iterrows():
    print(f"{row['Model']:20} | Acc: {row['Accuracy']:.4f} | "
          f"Prec: {row['Precision']:.4f} | Rec: {row['Recall']:.4f} | "
          f"F1: {row['F1']:.4f}")

# Find models with accuracy > 0.9
high_accuracy = results[results['Accuracy'] > 0.9]
print(f"\nModels with accuracy > 0.9: {len(high_accuracy)}")

# Calculate average metrics
avg_metrics = results[['Accuracy', 'Precision', 'Recall', 'F1']].mean()
print("\nAverage metrics across all models:")
for metric, value in avg_metrics.items():
    print(f"  {metric}: {value:.4f}")
```
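
Because the result is an ordinary DataFrame, any pandas operation applies. As a further sketch, the block below rebuilds the Quick Start sample output by hand (so it runs without the library) and ranks the models by F1; the `F1 gap to best` column name is made up for this example.

```python
import pandas as pd

# Values copied from the Quick Start sample output.
results = pd.DataFrame(
    {
        "Model": ["Random Forest", "SVM", "Logistic Regression"],
        "Accuracy": [0.9500, 0.9250, 0.9125],
        "Precision": [0.9565, 0.9337, 0.9131],
        "Recall": [0.9512, 0.9255, 0.9117],
        "F1": [0.9505, 0.9254, 0.9115],
    }
)

# Rank by F1 explicitly instead of relying on the row order.
ranked = results.sort_values("F1", ascending=False).reset_index(drop=True)

# Gap between the best F1 and each model's F1.
ranked["F1 gap to best"] = (ranked["F1"].max() - ranked["F1"]).round(4)

best_name = results.loc[results["F1"].idxmax(), "Model"]
print(ranked[["Model", "F1", "F1 gap to best"]])
print(f"Best by F1: {best_name}")
```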

### 7. Comparing Different Datasets

```python
from mlbench_lite import benchmark, load_clover
from sklearn.datasets import load_wine, load_breast_cancer

datasets = [
    ("Clover", load_clover(return_X_y=True)),
    ("Wine", load_wine(return_X_y=True)),
    ("Breast Cancer", load_breast_cancer(return_X_y=True))
]

print("Dataset Comparison:")
print("=" * 80)

for name, (X, y) in datasets:
    print(f"\n{name} Dataset:")
    print(f"  Shape: {X.shape}, Classes: {len(set(y))}")
    
    results = benchmark(X, y)
    # Select the top row by accuracy explicitly instead of relying on row order
    best = results.loc[results['Accuracy'].idxmax()]
    print(f"  Best Model: {best['Model']} (Accuracy: {best['Accuracy']:.4f})")

    # Show top 2 models
    print("  Top 2 Models:")
    for idx, row in results.nlargest(2, 'Accuracy').iterrows():
        print(f"    {row['Model']}: {row['Accuracy']:.4f}")
```

## 🔬 Models Included

The library benchmarks the following models by default:

1. **Logistic Regression**: Linear model for classification
   - Uses default scikit-learn parameters
   - Good for linear relationships

2. **Random Forest**: Ensemble of decision trees
   - Uses default scikit-learn parameters
   - Good for non-linear relationships and feature importance

3. **SVM**: Support Vector Machine with RBF kernel
   - Uses default scikit-learn parameters
   - Good for complex decision boundaries

No hyperparameter tuning is performed: every model runs with scikit-learn's default parameters, with random seeds fixed for reproducibility.
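
For readers who want to see how such a benchmark fits together, here is a minimal sketch built only from public scikit-learn and pandas calls. `benchmark_sketch` is a hypothetical re-creation of the documented behaviour, not the library's source: the split strategy, the seeding of each estimator, and the sort order are all assumptions, and the data comes from `make_classification`.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def benchmark_sketch(X, y, test_size=0.2, random_state=42):
    """Hypothetical re-creation of the documented behaviour (not library code)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    # Default hyperparameters; only the seed is set (an assumption).
    models = {
        "Logistic Regression": LogisticRegression(random_state=random_state),
        "Random Forest": RandomForestClassifier(random_state=random_state),
        "SVM": SVC(random_state=random_state),  # default kernel is RBF
    }
    rows = []
    for name, model in models.items():
        y_pred = model.fit(X_train, y_train).predict(X_test)
        rows.append(
            {
                "Model": name,
                "Accuracy": accuracy_score(y_test, y_pred),
                "Precision": precision_score(
                    y_test, y_pred, average="macro", zero_division=0
                ),
                "Recall": recall_score(
                    y_test, y_pred, average="macro", zero_division=0
                ),
                "F1": f1_score(y_test, y_pred, average="macro", zero_division=0),
            }
        )
    # Sorting by accuracy mirrors the order of the Quick Start sample output.
    frame = pd.DataFrame(rows).sort_values("Accuracy", ascending=False)
    return frame.reset_index(drop=True)


X, y = make_classification(
    n_samples=300, n_features=10, n_informative=6, n_classes=3, random_state=0
)
results = benchmark_sketch(X, y)
print(results)
```

Depending on the data, `LogisticRegression` with its default iteration limit may emit a convergence warning; scaling the features first is the usual remedy.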

## 📊 Clover Dataset Details

The `load_clover` function provides a custom synthetic dataset:

- **Samples**: 400
- **Features**: 4
- **Classes**: 4

**Features:**
- `leaf_length`: Length of the leaf in cm
- `leaf_width`: Width of the leaf in cm
- `petiole_length`: Length of the petiole in cm
- `leaflet_count`: Number of leaflets per leaf

**Classes:**
- `white_clover`: Trifolium repens
- `red_clover`: Trifolium pratense
- `crimson_clover`: Trifolium incarnatum
- `alsike_clover`: Trifolium hybridum
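
To experiment with data of the same shape without the library installed, a stand-in can be generated with scikit-learn. The sketch below reuses the feature and class names listed above, but the values come from `make_classification` and are not the real clover data; how `load_clover` generates its samples is not documented here.

```python
import pandas as pd
from sklearn.datasets import make_classification

# Names taken from the feature and class lists above.
feature_names = ["leaf_length", "leaf_width", "petiole_length", "leaflet_count"]
target_names = ["white_clover", "red_clover", "crimson_clover", "alsike_clover"]

# Stand-in data with the documented shape: 400 samples, 4 features, 4 classes.
# These values are NOT the real clover measurements.
X, y = make_classification(
    n_samples=400,
    n_features=4,
    n_informative=4,
    n_redundant=0,
    n_classes=4,
    random_state=0,
)

# Attach readable column names and map class indices to species names.
frame = pd.DataFrame(X, columns=feature_names)
frame["species"] = [target_names[i] for i in y]

print(frame.shape)
print(frame["species"].value_counts())
```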

## 🛠️ Requirements

- Python >= 3.8
- scikit-learn >= 1.0.0
- pandas >= 1.3.0
- numpy >= 1.20.0

## 🧪 Testing

Run the test suite to verify everything works:

```bash
# Run all tests
python -m pytest tests/ -v

# Run with coverage
python -m pytest tests/ --cov=mlbench_lite

# Quick functionality test
python -c "from mlbench_lite import benchmark, load_clover; X, y = load_clover(return_X_y=True); results = benchmark(X, y); print(results)"
```
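
Tests for a benchmarking function usually check the structure of the returned frame rather than exact scores, which vary with the data and the library version. The helper below is a hypothetical sketch of such a check (`check_results_frame` is not part of the library or its test suite), exercised on the Quick Start sample output.

```python
import pandas as pd

EXPECTED_COLUMNS = ["Model", "Accuracy", "Precision", "Recall", "F1"]


def check_results_frame(results):
    """Hypothetical structural checks for a benchmark results DataFrame."""
    assert list(results.columns) == EXPECTED_COLUMNS
    metrics = results[EXPECTED_COLUMNS[1:]]
    # Every metric is a proportion, so it must lie in [0, 1].
    assert ((metrics >= 0) & (metrics <= 1)).all().all()
    # Each model should appear exactly once.
    assert results["Model"].is_unique
    return True


# Values copied from the Quick Start sample output.
sample = pd.DataFrame(
    [
        ["Random Forest", 0.9500, 0.9565, 0.9512, 0.9505],
        ["SVM", 0.9250, 0.9337, 0.9255, 0.9254],
        ["Logistic Regression", 0.9125, 0.9131, 0.9117, 0.9115],
    ],
    columns=EXPECTED_COLUMNS,
)
ok = check_results_frame(sample)
print(f"Sample output passes the structural checks: {ok}")
```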

## 🚀 Development

### Setup Development Environment

```bash
git clone https://github.com/yourusername/mlbench-lite.git
cd mlbench-lite
pip install -e ".[dev]"
```

### Code Quality

```bash
# Format code
black mlbench_lite tests

# Lint code
flake8 mlbench_lite tests

# Type checking
mypy mlbench_lite
```

### Building for Distribution

```bash
# Build package
python -m build

# Upload to PyPI
twine upload dist/*
```

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 📈 Changelog

### 0.1.0 (2024-01-XX)
- Initial release
- Basic benchmarking functionality
- Support for Logistic Regression, Random Forest, and SVM
- Comprehensive metrics (Accuracy, Precision, Recall, F1)
- Custom clover dataset
- Full test coverage
- PyPI ready

## 🆘 Support

If you encounter any issues or have questions:

1. Check the [Issues](https://github.com/yourusername/mlbench-lite/issues) page
2. Create a new issue with detailed information
3. Include code examples and error messages

## 🙏 Acknowledgments

- Built with [scikit-learn](https://scikit-learn.org/)
- Uses [pandas](https://pandas.pydata.org/) for data handling
- Inspired by the need for simple ML benchmarking tools
