Metadata-Version: 2.4
Name: llm-compare
Version: 0.1.0
Summary: LLM plugin to compare responses from multiple language models
Project-URL: Homepage, https://github.com/irthomasthomas/llm-compare
Project-URL: Changelog, https://github.com/irthomasthomas/llm-compare/releases
Project-URL: Issues, https://github.com/irthomasthomas/llm-compare/issues
Author-email: Thomas Hughes <irthomasthomas@gmail.com>
License: Apache-2.0
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.7
Requires-Dist: click
Requires-Dist: llm
Requires-Dist: sqlite-utils
Description-Content-Type: text/markdown

# llm-compare

[![PyPI](https://img.shields.io/pypi/v/llm-compare.svg)](https://pypi.org/project/llm-compare/)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/irthomasthomas/llm-compare/blob/main/LICENSE)

LLM plugin to compare responses from multiple language models and evaluate their performance.

## Installation

Install this plugin in the same environment as [LLM](https://llm.datasette.io/):

```bash
llm install llm-compare
```

## Usage

The plugin adds a `compare` command that sends the same prompt to multiple models and compares their responses:

```bash
llm compare "What are the key considerations for AI safety?"
```

This will:
1. Send the prompt to multiple models in parallel
2. Gather their responses
3. Use an evaluator model to analyze and compare the responses
4. Generate a synthesis of the best elements from each response
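
The steps above can be sketched in Python. This is an illustrative sketch only, not the plugin's actual implementation: `fan_out`, `build_evaluation_prompt`, and the `ModelFn` type are hypothetical names, and the models are plain callables so the example runs without API keys. In practice each callable might wrap a call such as `llm.get_model(name).prompt(text).text()` from the LLM Python API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical type: any callable that takes a prompt and returns response text.
ModelFn = Callable[[str], str]


def fan_out(prompt: str, models: Dict[str, ModelFn]) -> Dict[str, str]:
    """Send the same prompt to every model concurrently and collect the replies."""
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in models.items()}
        # result() blocks until each call finishes and re-raises any exception.
        return {name: future.result() for name, future in futures.items()}


def build_evaluation_prompt(prompt: str, responses: Dict[str, str]) -> str:
    """Assemble one text block that an evaluator model could be asked to compare."""
    parts = [f"Original prompt:\n{prompt}\n"]
    for name, text in responses.items():
        parts.append(f"Response from {name}:\n{text}\n")
    parts.append("Compare these responses, score each one, and synthesize the best answer.")
    return "\n".join(parts)


if __name__ == "__main__":
    # Stand-in models (assumptions for illustration, not real model IDs).
    stubs = {
        "model-a": lambda p: f"A says: {p.upper()}",
        "model-b": lambda p: f"B says: {p[::-1]}",
    }
    replies = fan_out("hello", stubs)
    print(build_evaluation_prompt("hello", replies))
```

Threads suit this kind of work because each model call spends most of its time waiting on the network.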

### Options

- `-m, --models`: Models to include in comparison (can specify multiple)
- `--evaluator`: Model to use as evaluator (default: gpt-4)
- `--score-threshold`: Minimum score threshold (default: 0.7)
- `--output`: Save full results to a JSON file
- `--stdin/--no-stdin`: Read additional input from stdin (default: enabled)
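
As a rough illustration of how a minimum score threshold could be applied, the sketch below keeps only the models whose score meets the cutoff. It assumes scores are floats between 0 and 1 and that the comparison is inclusive; neither detail is confirmed by this README, and `passing_responses` is a hypothetical helper rather than part of the plugin.

```python
from typing import Dict

DEFAULT_THRESHOLD = 0.7  # mirrors the documented --score-threshold default


def passing_responses(
    scores: Dict[str, float], threshold: float = DEFAULT_THRESHOLD
) -> Dict[str, float]:
    """Keep models whose score is at or above the threshold, best first."""
    # Assumption: a score equal to the threshold counts as passing.
    kept = {name: score for name, score in scores.items() if score >= threshold}
    return dict(sorted(kept.items(), key=lambda item: item[1], reverse=True))
```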

### Examples

Basic usage with default models:
```bash
llm compare "Explain quantum computing"
```

Using specific models:
```bash
llm compare "Explain blockchain technology" \
  -m gpt-4 \
  -m claude-3-sonnet \
  -m gemini-pro \
  --evaluator claude-3-opus-20240229
```

Piping additional input from a file via stdin:
```bash
cat prompt.txt | llm compare "Analysis request:"
```

Saving full results:
```bash
llm compare "Complex technical question" \
  -m gpt-4 \
  -m claude-3-opus \
  --output results.json
```
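
The layout of the saved JSON file is not documented here, so the sketch below makes no assumptions about its fields. It only reports the top-level keys and the type of each value, which can be a reasonable first step before writing code against the file. `summarize_json` is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Any, Dict


def summarize_json(path: str) -> Dict[str, str]:
    """Map each top-level key to the type name of its value."""
    data: Any = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        # Not an object at the top level: report the container type only.
        return {"<root>": type(data).__name__}
    return {key: type(value).__name__ for key, value in data.items()}


if __name__ == "__main__":
    # Assumes results.json was produced by a run that used --output results.json.
    if Path("results.json").exists():
        for key, kind in summarize_json("results.json").items():
            print(f"{key}: {kind}")
```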

### Output Format

The command outputs:
1. Individual responses from each model
2. Detailed comparison and analysis
3. Scores for each response
4. Key points of agreement and disagreement
5. A synthesized best answer

## Development

To set up this plugin locally:

```bash
git clone https://github.com/irthomasthomas/llm-compare
cd llm-compare
python -m venv venv
source venv/bin/activate
pip install -e '.[test]'
```

Run tests:
```bash
pytest
```

## License

This project is licensed under the Apache License 2.0.