Metadata-Version: 2.3
Name: ml_scheduler
Version: 1.1.0
Summary: A lightweight machine learning experiment scheduler that automates resource management (e.g., GPUs and models) and batch runs experiments with just a few lines of Python code.
Author-email: Yiwen Hu <1020030101@qq.com>
License-Expression: MIT
License-File: LICENSE
Keywords: artificial intelligence,async,large language model,machine learning,scheduler
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Requires-Dist: coloredlogs
Requires-Dist: nvitop
Requires-Dist: pandas
Requires-Dist: typing-extensions
Provides-Extra: dev
Requires-Dist: isort>=5.3; extra == 'dev'
Requires-Dist: pytest>=5.0; extra == 'dev'
Description-Content-Type: text/markdown

# ml_scheduler

[![PyPI version](https://badge.fury.io/py/ml-scheduler.svg)](http://badge.fury.io/py/ml-scheduler)
[![License](https://img.shields.io/github/license/mashape/apistatus.svg)](https://pypi.python.org/pypi/ml_scheduler/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://timothycrosley.github.io/isort/)
<!--[![Test Status](https://github.com/huyiwen/ml_scheduler/workflows/Test/badge.svg?branch=develop)](https://github.com/huyiwen/ml_scheduler/actions?query=workflow%3ATest)
[![Lint Status](https://github.com/huyiwen/ml_scheduler/workflows/Lint/badge.svg?branch=develop)](https://github.com/huyiwen/ml_scheduler/actions?query=workflow%3ALint)
[![codecov](https://codecov.io/gh/huyiwen/ml_scheduler/branch/main/graph/badge.svg)](https://codecov.io/gh/huyiwen/ml_scheduler)
[![Join the chat at https://gitter.im/huyiwen/ml_scheduler](https://badges.gitter.im/huyiwen/ml_scheduler.svg)](https://gitter.im/huyiwen/ml_scheduler?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
[![Downloads](https://pepy.tech/badge/ml_scheduler)](https://pepy.tech/project/ml_scheduler)-->


[**ML Scheduler**](https://github.com/huyiwen/ml_scheduler/) is a lightweight machine learning experiment scheduler that automates resource management (e.g., GPUs and models) and runs experiments in batches with just a few lines of Python code.

## Quick Start

1. Install ml-scheduler

```bash
pip install ml-scheduler
```

Or install from the GitHub repository:

```bash
git clone https://github.com/huyiwen/ml_scheduler
cd ml_scheduler
pip install -e .
```

2. Create a Python script (named `run.py` here):

```python
import ml_scheduler

cuda = ml_scheduler.pools.CUDAPool([0, 2], 90)
disk = ml_scheduler.pools.DiskPool('/one-fs')


@ml_scheduler.exp_func
async def mmlu(exp: ml_scheduler.Exp, model, checkpoint):

    source_dir = f"/another-fs/model/{model}/checkpoint-{checkpoint}"
    target_dir = f"/one-fs/model/{model}-{checkpoint}"

    # resources will be cleaned up after exiting the function
    disk_resource = await exp.get(
        disk.copy_folder,
        source_dir,
        target_dir,
        cleanup_target=True,
    )
    cuda_resource = await exp.get(cuda.allocate, 1)

    # run inference
    args = [
        "python", "inference.py",
        "--model", target_dir,
        "--dataset", "mmlu",
        "--cuda", str(cuda_resource[0]),
    ]
    stdout = await exp.run(args=args)
    # report results as a mapping from column name to value
    await exp.report({'Accuracy': stdout})


mmlu.run_csv("experiments.csv", ['Accuracy'])
```

Decorate the function with `@ml_scheduler.exp_func` and declare it `async` to make it an experiment function. The function must take `exp` as its first argument.

Then use `await exp.get` to acquire resources and `await exp.run` to run the experiment. Both calls are non-blocking: while one experiment waits for a resource or a subprocess, the other experiments keep running concurrently.
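To illustrate what non-blocking means here, the following is a minimal sketch using only the standard `asyncio` module. It does not use ml_scheduler and is not how the library is implemented; the semaphore standing in for a pool of two GPU slots, the `run_experiment` helper, and the experiment names are all hypothetical.

```python
import asyncio


async def run_experiment(name, gpus, active, peak):
    # Waiting for a free slot suspends only this coroutine, not the others.
    async with gpus:
        active.append(name)
        peak[0] = max(peak[0], len(active))
        await asyncio.sleep(0.01)  # stand-in for a long-running subprocess
        active.remove(name)
    return name


async def main():
    gpus = asyncio.Semaphore(2)  # hypothetical pool of two GPU slots
    active, peak = [], [0]
    names = ["exp-a", "exp-b", "exp-c", "exp-d"]
    finished = await asyncio.gather(
        *(run_experiment(n, gpus, active, peak) for n in names)
    )
    return finished, peak[0]


finished, peak_concurrency = asyncio.run(main())
print(finished, peak_concurrency)
```

All four coroutines are started at once, but at most two hold a slot at any moment; the rest wait without blocking the event loop.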

3. Create a CSV file `experiments.csv` with your arguments (`model` and `checkpoint` in this case):

```csv
model,checkpoint
alpacaflan-packing,200
alpacaflan-packing,400
alpacaflan-qlora,200-merged
alpacaflan-qlora,400-merged
```
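As a rough picture of how the CSV columns line up with the function parameters, the sketch below reads the same rows with the standard `csv` module and builds the `source_dir` path from the script above for each row. This is an assumption about the mapping for illustration only; ml_scheduler's own CSV handling may differ (for example in how values are typed).

```python
import csv
import io

# Same content as experiments.csv above, inlined so the sketch is self-contained.
CSV_TEXT = """model,checkpoint
alpacaflan-packing,200
alpacaflan-packing,400
alpacaflan-qlora,200-merged
alpacaflan-qlora,400-merged
"""

# Each row becomes a dict keyed by column name, e.g. {"model": ..., "checkpoint": ...}.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Build the source path the same way the experiment function does.
source_dirs = [
    f"/another-fs/model/{row['model']}/checkpoint-{row['checkpoint']}"
    for row in rows
]

for path in source_dirs:
    print(path)
```

Each row corresponds to one experiment, so this file describes four runs.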

4. Run the script:

```bash
python run.py
```

The results (`Accuracy` in this case) and some other information will be saved in `results.csv`.

## More Examples

- [Copy and run](/examples/copy_and_run)
