Metadata-Version: 2.4
Name: prompteval-core
Version: 0.1.0
Summary: Semantic Testing for LLMs - Test your AI outputs with semantic similarity validation
Home-page: https://github.com/prompteval/prompteval-sdk
Author: PromptEval Team
Author-email: PromptEval Team <hello@getprompteval.com>
License: PromptEval
Project-URL: Homepage, https://getprompteval.com
Project-URL: Documentation, https://getprompteval.com/docs
Project-URL: Repository, https://github.com/prompteval/prompteval-sdk
Project-URL: Issues, https://github.com/prompteval/prompteval-sdk/issues
Keywords: llm,testing,semantic,ai,nlp,validation,prompt,gpt,openai,anthropic
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Testing
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.25.0
Requires-Dist: pyyaml>=5.4.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# 🧪 PromptEval

**Semantic Testing for LLMs** - Test your AI outputs with semantic similarity validation.

[![PyPI version](https://badge.fury.io/py/prompteval.svg)](https://badge.fury.io/py/prompteval)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## 🚀 Quick Start

### Installation

```bash
pip install prompteval
```

### CLI Usage

```bash
# Run tests
prompteval run adapter.yml --tests tests.yml --api-key $PROMPTEVAL_API_KEY

# Validate YAML files
prompteval validate tests.yml

# Generate HTML report
prompteval report results.json --output report.html

# Check license/quota
prompteval licenses --api-key $PROMPTEVAL_API_KEY
```

### Python SDK

```python
from prompteval import PromptEval

# Initialize client
client = PromptEval(api_key="pe_xxxxx")

# Run tests from files
result = client.run_from_files(
    adapter_path="adapter.yml",
    tests_path="tests.yml"
)

# Check results
print(f"Success rate: {result.success_rate}%")
print(f"Passed: {result.total_passed}/{result.total_tests}")

if not result.success:
    for test in result.failed_tests:
        print(f"❌ {test.id_test}: {test.similarity:.1%} similarity")
```

## 📋 Configuration Files

### Adapter YAML

Define your LLM endpoint configuration:

```yaml
name: my-llm-api
description: My LLM Testing

endpoint:
  url: https://api.example.com/v1/chat
  method: POST
  timeout: 30

request:
  headers:
    Content-Type: application/json
    Authorization: Bearer ${ENV.API_KEY}
  
  template:
    prompt: "{{PROMPT}}"
    max_tokens: 150

response:
  type: json
  path: choices.0.message.content

validation:
  ml_threshold: 0.75
  use_semantic: true
```

### Test Cases YAML

Define your test cases:

```yaml
tests:
  - id: TEST-001
    description: Basic greeting test
    context:
      PROMPT: "Say hello"
    expected: "Hello! How can I help you today?"
    variants:
      - "Hi there! How can I assist you?"
      - "Hello! What can I do for you?"
    threshold: 0.70
    tags:
      - greeting
      - basic

  - id: TEST-002
    description: Math question
    context:
      PROMPT: "What is 2+2?"
    expected: "4"
    threshold: 0.90
    tags:
      - math
```

## 🔧 SDK Reference

### PromptEval Client

```python
from prompteval import PromptEval

client = PromptEval(
    api_key="pe_xxxxx",           # Required
    base_url="https://...",       # Optional (default: production)
    timeout=300                   # Optional (default: 300s)
)
```

### Running Tests

```python
# From files
result = client.run_from_files("adapter.yml", "tests.yml")

# From dictionaries
result = client.run(
    adapter={"name": "test", "endpoint": {...}},
    tests=[{"id": "T1", "expected": "..."}]
)

# From raw YAML
result = client.run_from_yaml(yaml_string)
```

### EvalResult Object

```python
result.success          # bool - True if all tests passed
result.success_rate     # float - Percentage of passed tests
result.total_tests      # int - Total number of tests
result.total_passed     # int - Number of passed tests
result.total_failed     # int - Number of failed tests
result.total_errors     # int - Number of errors
result.duration_ms      # float - Total duration in milliseconds
result.test_results     # List[TestResult] - All test results
result.failed_tests     # List[TestResult] - Only failed tests
result.passed_tests     # List[TestResult] - Only passed tests
```

### TestResult Object

```python
test.id_test            # str - Test identifier
test.description        # str - Test description
test.passed             # bool - Whether test passed
test.expected           # str - Expected result
test.actual             # str - Actual result
test.similarity         # float - Semantic similarity (0-1)
test.threshold          # float - Required threshold
test.duration_ms        # float - Test duration
test.error              # str - Error message if any
```

### Account Methods

```python
# Get licenses
licenses = client.get_licenses()
for lic in licenses:
    print(f"{lic.plan}: {lic.tests_remaining} tests remaining")

# Get usage
usage = client.get_usage(license_id)
print(f"Tests this month: {usage['tests_this_month']}")

# Get API keys
keys = client.get_api_keys(license_id)
```

## 🔑 Environment Variables

```bash
# Set API key (recommended)
export PROMPTEVAL_API_KEY=pe_xxxxx

# Then use without --api-key flag
prompteval run adapter.yml --tests tests.yml
```

## 🎯 CI/CD Integration

### GitHub Actions

```yaml
name: LLM Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      
      - name: Install PromptEval
        run: pip install prompteval
      
      - name: Run Tests
        env:
          PROMPTEVAL_API_KEY: ${{ secrets.PROMPTEVAL_API_KEY }}
        run: |
          prompteval run adapter.yml --tests tests.yml --output results.json
          prompteval report results.json --output report.html
      
      - name: Upload Report
        uses: actions/upload-artifact@v4
        with:
          name: test-report
          path: report.html
```

## 📊 Semantic Validation

PromptEval uses sentence transformers to compute semantic similarity between expected and actual outputs. This allows flexible matching that understands meaning, not just exact text.

**Example:**
- Expected: `"The capital of France is Paris"`
- Actual: `"Paris is the capital city of France"`
- Similarity: **94%** ✅

Threshold configuration:
- `0.90+` - Very strict (nearly exact match)
- `0.75-0.89` - Strict (same meaning, different words)
- `0.60-0.74` - Moderate (similar concept)
- `<0.60` - Loose (related topic)

## 🔗 Links

- **Website:** https://getprompteval.com
- **Documentation:** https://docs.getprompteval.com
- **API Reference:** https://api.getprompteval.com/docs
- **GitHub:** https://github.com/prompteval/prompteval-sdk

## 📄 License

MIT License - see [LICENSE](LICENSE) file for details.
