Metadata-Version: 2.2
Name: polyllm
Version: 1.0.5
Summary: A unified interface for querying multiple large language models
Author: Rory Butler
Project-URL: Homepage, https://github.com/RoryMB/polyllm
Project-URL: Bug Tracker, https://github.com/RoryMB/polyllm/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: backoff
Requires-Dist: pydantic
Provides-Extra: all
Requires-Dist: llama-cpp-python; extra == "all"
Requires-Dist: mlx-lm; extra == "all"
Requires-Dist: ollama; extra == "all"
Requires-Dist: openai; extra == "all"
Requires-Dist: google-generativeai; extra == "all"
Requires-Dist: anthropic; extra == "all"
Requires-Dist: litellm; extra == "all"

# PolyLLM

PolyLLM is a Python package that provides a unified interface for interacting with multiple Large Language Models (LLMs) through a single, consistent API. It simplifies the process of working with different LLM providers by abstracting away their specific implementation details.

Links:
- [Python Package Index (PyPI)](https://pypi.org/project/polyllm/)
- [GitHub Repository](https://github.com/RoryMB/PolyLLM)

## Features

- Unified interface for multiple LLM providers:
  - Local LLMs ([llama.cpp](https://github.com/ggerganov/llama.cpp), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python))
  - [Ollama](https://ollama.com)
  - OpenAI (GPT models)
  - Google (Gemini models)
  - Anthropic (Claude models)
- Support for different interaction modes:
  - Standard chat completion
  - Multimodal through image input
  - Function calling / tools
  - JSON output
  - Structured output (using Pydantic models)
  - Streaming real-time responses

### Feature Support Across Providers

| Provider | Standard Chat | Image Input | JSON | Structured Output | Tool Usage |
|----------|---------------|-------------|------|-------------------|------------|
| llama.cpp | ✅ | 🔶 | ✅ | ✅ | ✅ |
| MLX       | ✅ | 🟫 | 🚧 | 🚧 | 🚧 |
| Ollama    | ✅ | ✅ | ✅ | ✅ | ✅ |
| Openai    | ✅ | ✅ | ✅ | ✅ | ✅ |
| Google    | ✅ | ✅ | ✅ | ✅ | ✅ |
| Anthropic | ✅ | ✅ | ✅ | ✅ | ✅ |

#### Streaming Output Modes

| Provider | Plain Text | JSON | Structured Output | Tool Usage |
|----------|------------|------|-------------------|------------|
| llama.cpp | ✅ | ✅ | ✅ | 🟫 |
| MLX       | ✅ | 🚧 | 🚧 | 🟫 |
| Ollama    | ✅ | ✅ | ✅ | 🟫 |
| Openai    | ✅ | ✅ | ❌ | 🟫 |
| Google    | ✅ | ✅ | ✅ | 🟫 |
| Anthropic | ✅ | 🟫 | ❌ | 🟫 |

✅: Supported

🔶: Support planned

❌: Not yet supported by the LLM provider

🟫: Support not planned

> [!WARNING]
> 🚧: MLX support for structuring techniques is not part of the official mlx_lm module.
>
> A modified version of [this GBNF package](https://github.com/ambient-labs/GBNF) is included here to support some interim capabilities.
>
> These features are experimental and will be buggy and slow!

## Installation

```bash
pip install polyllm
```

```bash
pip install polyllm[all] # Gets all optional provider dependencies
```

### Requirements

- Python 3.9+
- `backoff`
- `pydantic`
- Optional dependencies for advanced image input:
  - `numpy`
  - `opencv-python`
  - `pillow`
- Optional dependencies based on which LLM providers you want to use:
  - `llama-cpp-python`
  - `mlx-lm`
  - `ollama`
  - `openai`
  - `google-generativeai`
  - `anthropic`
  - `litellm`

## Configuration

Set your API keys as environment variables:

```bash
export OPENAI_API_KEY="your-key-here"
export GOOGLE_API_KEY="your-key-here"
export ANTHROPIC_API_KEY="your-key-here"
```

## Demo

```bash
python -m polyllm.demo \
    --image-path /path/to/image.jpg \
    --llama-python-model /path/to/model.gguf \
    --llama-python-server-port 8000 \
    --ollama-model llama3.2-vision \
    --openai-model gpt-4o \
    --google-model gemini-1.5-flash-latest \
    --anthropic-model claude-3-5-sonnet-latest
```

## Overview

### Model argument

The `model` argument may be provided as one of the following:
- An instance of `llama_cpp.Llama`
    - Helper function `model = polyllm.load_helpers.load_llama("path/to/model.gguf")`
- A 2-tuple containing instances of `mlx.nn.Module` and `TokenizerWrapper`
    - Helper function `model = polyllm.load_helpers.load_mlx("mlx-community/model-name-here", auto_download=True)`
    - Also accepts a path to a local directory for self-managed downloads
- `'llamacpp/MODEL'`, where `MODEL` is either the port or ip:port of a running llama-cpp-python server (`python -m llama_cpp.server --n_gpu_layers -1 --model path/to/model.gguf`)
    - Treated as `f'http://localhost:{MODEL}/v1'` if `MODEL` DOES NOT contain a `:`.
    - Treated as `f'http://{MODEL}/v1'` if `MODEL` DOES contain a `:`.
- `'ollama/MODEL_NAME'`, where `MODEL_NAME` matches the `ollama run MODEL_NAME` command
- `'openai/MODEL_NAME'`
- `'google/MODEL_NAME'`
- `'anthropic/MODEL_NAME'`
- `'litellm/PROVIDER/MODEL_NAME'`
    - LiteLLM will replace the OpenAI, Google, and Anthropic backends in a future update.
    - At that point, you will no longer need to use `'litellm'` at the start of the string.

### Avaliable Functions

```python
def generate(
    model: str|Llama,
    messages: list,
    temperature: float = 0.0,
    json_output: bool = False,
    structured_output_model: BaseModel|None = None,
    stream: bool = False,
) -> str | Generator[str, None, None]:
```

Generate a chat message response as either a string or generator of strings depending on the `stream` argument.

```python
def generate_tools(
    model: str|Llama,
    messages: list,
    temperature: float = 0.0,
    tools: list[Callable] = None,
) -> tuple[str, str, dict]:
```

Ask the model to try to use one of the provided tools.

Responds with:
- Text reponse
- Tool name (Use `get_tool_func` to get the tool object)
- Tool arguments dictionary

```python
def get_tool_func(
    tools: list[Callable],
    tool: str,
) -> Callable:
```

Returns the tool corresponding to the name. Intended for use with the output of `generate_tools`.

```python
def structured_output_model_to_schema(
    structured_output_model: BaseModel,
    indent: int|str|None = None,
) -> str:
```

Creates a JSON schema string from a Pydantic model. Include the string in one of the messages in a `generate(..., structured_output_model)` call to help guide the model on how to respond.

```python
def structured_output_to_object(
    structured_output: str,
    structured_output_model: type[BaseModel],
) -> BaseModel:
```

Parse the output of a `generate(..., structured_output_model)` call into an instance of the Pydantic BaseModel.

## Quick Start Examples

```python
import polyllm
```

Run `python -m polyllm` to see the full list of detected Ollama, OpenAI, Google, and Anthropic models.

### Basic Usage
```python
response = polyllm.generate(
    model="openai/gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    temperature=0.2,
)
print(response)

# Prints:
# Hello! I'm just a computer program, so I don't have feelings, but I'm here to help you. How can I assist you today?
```

### Streaming Response
```python
for chunk in polyllm.generate(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Tell me a story"}],
    temperature=0.7,
    stream=True,
):
    print(chunk, end='', flush=True)
print()

# Prints (a word or so at a time):
# Once upon a time, ...
```

### Multimodal (Image Input)
```python
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image", "image": "/path/to/image"},

            # These also work if you have the image as
            # an np.array / PIL Image instead of on disk:
            # {"type": "image", "image": cv2.imread("/path/to/image")},
            # {"type": "image", "image": Image.open("/path/to/image")},
        ],
    },
]

response = polyllm.generate(
    model="ollama/llama3.2-vision",
    messages=messages,
)
print(response)

# Prints:
# This image depicts ...
```

### Tools / Function Calling
```python
def multiply_large_numbers(x: int, y: int) -> int:
    """Multiplies two large numbers."""
    return x * y


tools = [multiply_large_numbers]
response, tool, args = polyllm.generate_tools(
    model="google/gemini-1.5-pro-latest",
    messages=[{"role": "user", "content": "What is 123456 multiplied by 654321?"}],
    tools=tools,
)

tool_func = polyllm.get_tool_func(tools, tool)
if tool_func:
    # print('response:', response)  # Some models (Anthropic) may return both their tool call AND a text response
    tool_result = tool_func(**args)
    print(tool_result)  # 123456 * 654321 = 80779853376
else:
    print(response)

# Prints:
# 80779853376.0
```

### JSON Output
```python
response = polyllm.generate(
    model="anthropic/claude-3-5-sonnet-latest",
    messages=[{"role": "user", "content": "List three colors in JSON"}],
    json_output=True,
)
print(response)

# Prints:
# {
#   "colors": [
#     "red",
#     "blue",
#     "green"
#   ]
# }

import json
print(json.loads(response))

# Prints:
# {'colors': ['red', 'blue', 'green']}
```

### Structured Output
```python
from pydantic import BaseModel, Field

class Flight(BaseModel):
    departure_time: str = Field(description="The time the flight departs")
    destination: str = Field(description="The destination of the flight")

class FlightList(BaseModel):
    flights: list[Flight] = Field(description="A list of known flight details")

flight_list_schema = polyllm.structured_output_model_to_schema(FlightList, indent=2)


response = polyllm.generate(
    model="google/gemini-1.5-pro-latest",
    messages=[
        {
            "role": "user",
            "content": f"Write a list of 2 to 5 random flight details.\nProduce the result in JSON that matches this schema:\n{flight_list_schema}",
        },
    ],
    structured_output_model=FlightList,
)
print(response)

# Prints:
# {"flights": [{"departure_time": "2024-07-20T08:30", "destination": "JFK"}, {"departure_time": "2024-07-21T14:00", "destination": "LAX"}, {"departure_time": "2024-07-22T16:45", "destination": "ORD"}, {"departure_time": "2024-07-23T09:15", "destination": "SFO"}]}

response_object = polyllm.structured_output_to_object(response, FlightList)
print(response_object.flights[0].destination)

# Prints:
# JFK
```

### Langchain

```python
from polyllm.langchain import LCPolyLLM

llm = LCPolyLLM(model="openai/gpt-4")
response = llm.invoke("What is your name?")
print(response)

# Prints:
# As an artificial intelligence, I don't have a personal name. You can simply refer to me as OpenAI.

from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages(
    [("system", "you are a bot"), ("human", "{input}")]
)
chain = prompt | llm
response = chain.invoke(input="What are you?")
print(response)

# Prints:
# Bot: I am an artificial intelligence assistant designed to help answer questions and provide information.
```
