Metadata-Version: 2.1
Name: openpo
Version: 0.3.0
Summary: Build high quality synthetic datasets with AI feedback from 200+ LLMs
Home-page: https://github.com/dannylee1020/openpo
License: Apache-2.0
Keywords: llm,finetuning,ai,rlaif,preference tuning,synthetic data generation,synthetic data
Author: Daniel Lee
Author-email: dannylee1020@gmail.com
Requires-Python: >=3.10.1
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: aiohttp (>=3.11.7,<4.0.0)
Requires-Dist: boto3 (>=1.35.57,<2.0.0)
Requires-Dist: httpx (>=0.27.2,<0.28.0)
Requires-Dist: huggingface-hub (>=0.26.2,<0.27.0)
Requires-Dist: numpy (>=2.1.3,<3.0.0)
Requires-Dist: pydantic (>=2.9.2,<3.0.0)
Project-URL: Documentation, https://docs.openpo.dev
Project-URL: Repository, https://github.com/dannylee1020/openpo
Description-Content-Type: text/markdown

# OpenPO 🐼
[![PyPI version](https://img.shields.io/pypi/v/openpo.svg)](https://pypi.org/project/openpo/)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Documentation](https://img.shields.io/badge/docs-docs.openpo.dev-blue)](https://docs.openpo.dev)
![Python](https://img.shields.io/badge/python->=3.10.1-blue.svg)


OpenPO simplifies building synthetic datasets for preference tuning from 200+ LLMs.

## What is OpenPO?
OpenPO is an open source library that simplifies the process of building synthetic datasets for LLM preference tuning. By collecting outputs from 200 + LLMs and ranking them using various techniques, OpenPO helps developers build better, more fine-tuned language models with minimal effort.

## Key Features

- 🔌 **Multiple LLM Support**: Call 200+ models from HuggingFace and OpenRouter

- 🧪 **Research-Backed Methodologies**: Implementation of various methodologies on data synthesis from latest research papers. (feature coming soon!)

- 🤝 **OpenAI API Compatibility**: Fully support OpenAI API format

- 💾 **Flexible Storage:** Out of the box storage providers for Hugging Face and S3.


## Installation
### Install from PyPI (recommended)
OpenPO uses pip for installation. Run the following command in the terminal to install OpenPO:

```bash
pip install openpo
```

### Install from source
Clone the repository first then run the follow command
```bash
cd openpo
poetry install
```

## Getting Started
OpenPO defaults to Hugging Face when provider argument is not set.

```python
import os
from openpo.client import OpenPO

client = OpenPO(api_key="your-huggingface-api-key") # no need to pass in the key if environment variable is already set.

response = client.completions(
    models = [
        "Qwen/Qwen2.5-Coder-32B-Instruct",
        "mistralai/Mistral-7B-Instruct-v0.3",
        "microsoft/Phi-3.5-mini-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "system", "content": MESSAGE},
    ],
)
```

To use with OpenRouter, set the provider to `openrouter`

```python
# make request to OpenRouter
client = OpenPO(api_key="<your-openrouter-api-key", provider='openrouter')

response = client.completions(
    models = [
        "qwen/qwen-2.5-coder-32b-instruct",
        "mistralai/mistral-7b-instruct-v0.3",
        "microsoft/phi-3.5-mini-128k-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "system", "content": MESSAGE},
    ],

)
```

OpenPO takes default model parameters as a dictionary. Take a look at the documentation for more detail.

```python
response = client.completions(
    models = [
        "Qwen/Qwen2.5-Coder-32B-Instruct",
        "mistralai/Mistral-7B-Instruct-v0.3",
        "microsoft/Phi-3.5-mini-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "system", "content": MESSAGE},
    ],
    params={
        "max_tokens": 500,
        "temperature": 1.0,
    }
)

```

### Saving Data
Use out of the box storage class to easily upload and download data.

```python
import os
from openpo.client import OpenPO
from openpo.storage.huggingface import HuggingFaceStorage

storage = HuggingFaceStorage(repo_id="my-dataset-repo", api_key="hf-token")
client = OpenPO(api_key="your-huggingface-token")

preference = {} # preference data needs to be in the format {"prompt": ..., "preferred": ..., "rejected": ...} for finetuning
storage.push_to_hub(data=preference, filename="my-data.json")
```

## Structured Outputs (JSON Mode)
OpenPO supports structured outputs using Pydantic model.

> [!NOTE]
> OpenRouter does not natively support structured outputs. This leads to inconsistent behavior from some models when structured output is used with OpenRouter.
>
> It is recommended to use HuggingFace models for structured output.


```python
from pydantic import BaseModel
from openpo.client import OpenPO

client = OpenPO(api_key="your-huggingface-api-key")

class ResponseModel(BaseModel):
    response: str


res = client.completions(
    models=["Qwen/Qwen2.5-Coder-32B-Instruct"],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "system", "content": MESSAGE},
    ],
    params = {
        "response_format": ResponseFormat,
    }
)
```

## Contributing
Contributions are what makes open source amazingly special! Here's how you can help:

### Development Setup
1. Fork and clone the repository
```bash
git clone https://github.com/yourusername/openpo.git
cd openpo
```

2. Install Poetry (dependency management tool)
```bash
curl -sSL https://install.python-poetry.org | python3 -
```

3. Install dependencies
```bash
poetry install
```

### Development Workflow
1. Create a new branch for your feature
```bash
git checkout -b feature-name
```

2. Submit a Pull Request
- Write a clear description of your changes
- Reference any related issues


