Metadata-Version: 2.4
Name: mlsplitter
Version: 1.0.0
Summary: Convenient helpers for splitting DataFrames into features/target and creating train/dev/test splits.
License: MIT
Project-URL: Homepage, https://github.com/Fares-Ayman-1/mlsplitter.git
Keywords: machine learning,data splitting,sklearn,train test split
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.21
Requires-Dist: pandas>=1.3
Requires-Dist: scikit-learn>=1.0
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Dynamic: license-file

# mlsplitter

A small, well-tested Python package that provides three conveniences on top of
scikit-learn:

1. **`x_y_splitter`** – split a DataFrame into feature matrix *X* and target
   vector *y* by column name or index.
2. **`train_test_splitter`** – a thin, validated wrapper around
   `sklearn.model_selection.train_test_split`.
3. **`train_dev_test_splitter`** – split data into three sets (train / dev /
   test) with sizes expressed as fractions of the **full** dataset.
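
To make "fractions of the **full** dataset" concrete, here is a minimal standard-library sketch of a three-way index split. The helper name `three_way_split_indices` and its parameters are hypothetical and used only for illustration; this is not the package's implementation, which builds on scikit-learn.

```python
import random


def three_way_split_indices(n_samples, dev_size=0.1, test_size=0.2, seed=0):
    """Hypothetical helper: shuffle range(n_samples) and cut it into
    train / dev / test index lists.

    dev_size and test_size are fractions of the full dataset, so the
    train set receives whatever remains.
    """
    if not (0 < dev_size < 1 and 0 < test_size < 1):
        raise ValueError("dev_size and test_size must be in (0, 1)")
    if dev_size + test_size >= 1:
        raise ValueError("dev_size + test_size must be less than 1")

    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility

    # Both counts are computed against n_samples, not against a remainder.
    n_test = round(n_samples * test_size)
    n_dev = round(n_samples * dev_size)

    test = indices[:n_test]
    dev = indices[n_test:n_test + n_dev]
    train = indices[n_test + n_dev:]
    return train, dev, test


train, dev, test = three_way_split_indices(1000, dev_size=0.1, test_size=0.2)
print(len(train), len(dev), len(test))  # 700 100 200
```

If the same split is instead built from two successive `train_test_split` calls, the second call has to use a rescaled fraction: after holding out 20% for test, a dev set worth 10% of the full data is `0.1 / (1 - 0.2) = 0.125` of the remainder.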

## Installation

```bash
pip install mlsplitter
```

Or install from source in editable mode, including the `dev` extras (pytest and pytest-cov):

```bash
git clone https://github.com/Fares-Ayman-1/mlsplitter.git
cd mlsplitter
pip install -e ".[dev]"
```

## Quick start

```python
import pandas as pd
from mlsplitter import x_y_splitter, train_test_splitter, train_dev_test_splitter

df = pd.read_csv("my_data.csv")

# Split features from target (by name or by position)
X, y = x_y_splitter(df, column_name="price")
X, y = x_y_splitter(df, column_index=-1)

# Train / test split
x_train, x_test, y_train, y_test = train_test_splitter(X, y, test_size=0.2)

# Train / dev / test split
x_train, x_dev, x_test, y_train, y_dev, y_test = train_dev_test_splitter(
    X, y, dev_size=0.1, test_size=0.2
)
```

## Running tests

```bash
pytest --cov=mlsplitter
```

## License

Released under the MIT License; see the `LICENSE` file.
