Metadata-Version: 2.4
Name: prompteval-core
Version: 0.1.4
Summary: Semantic Testing for LLMs - Test your AI outputs with semantic similarity validation
Home-page: https://github.com/prompteval/prompteval-sdk
Author: PromptEval Team
Author-email: PromptEval Team <hello@getprompteval.com>
License: PromptEval
Project-URL: Homepage, https://getprompteval.com
Project-URL: Documentation, https://getprompteval.com/docs
Project-URL: Repository, https://github.com/manuelledezma687/prompteval
Project-URL: Issues, https://github.com/manuelledezma687/prompteval/issues
Keywords: llm,testing,semantic,ai,nlp,validation,prompt,gpt,openai,anthropic
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Testing
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: requests>=2.25.0
Requires-Dist: pyyaml>=5.4.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Dynamic: author
Dynamic: home-page
Dynamic: requires-python

# 🧪 PromptEval

**Semantic Testing for LLMs** - Test your AI outputs with semantic similarity validation.

![Version](https://img.shields.io/badge/version-0.1.4-blue)
![Python](https://img.shields.io/badge/python-3.8+-green)
![License](https://img.shields.io/badge/license-PromptEval-yellow)

## 🚀 Quick Start

### Installation

```bash
pip install prompteval-core
```

### CLI Usage

```bash
# Run tests
prompteval run adapter.yml --tests tests.yml --api-key $PROMPTEVAL_API_KEY

# Validate YAML files
prompteval validate tests.yml

# Generate HTML report
prompteval report results.json --output report.html

# Check license/quota
prompteval licenses --api-key $PROMPTEVAL_API_KEY
```

### Python SDK

```python
from prompteval import PromptEval

# Initialize client
client = PromptEval(api_key="pe_xxxxx")

# Run tests from files
result = client.run_from_files(
    adapter_path="adapter.yml",
    tests_path="tests.yml"
)

# Check results
print(f"Success rate: {result.success_rate}%")
print(f"Passed: {result.total_passed}/{result.total_tests}")

if not result.success:
    for test in result.failed_tests:
        print(f"❌ {test.id_test}: {test.similarity:.1%} similarity")
```

## 📋 Configuration Files

### Adapter YAML

Define your LLM endpoint configuration:

```yaml
name: fitness-llm-api
description: Fitness LLM Testing

# Endpoint
endpoint:
  url: https://fitness-llm.com/v1/chat
  method: POST
  timeout: 10

# Request template
request:
  headers:
    Content-Type: application/json
  
  template:
    prompt: "{{PROMPT}}"
    type: "{{TYPE}}"
    max_tokens: 150

# Response extraction
response:
  type: json
  path: choices.0.message.content

# Validation
validation:
  ml_threshold: 0.75
  use_semantic: true

execution:
  parallel_limit: 10
  batch_delay: 0.3
  output_dir: ./output
```
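
How the service resolves the `{{PROMPT}}` / `{{TYPE}}` placeholders and the `response.path` value is not spelled out here. The following is a minimal sketch, assuming plain string substitution for placeholders and a dotted path in which numeric segments index into lists. `render_template` and `extract_path` are hypothetical helper names, not part of the SDK, and the sample response is made up to match the path above.

```python
import json
from typing import Any


def render_template(template: Any, context: dict) -> Any:
    """Recursively replace {{KEY}} placeholders in strings with context values."""
    if isinstance(template, dict):
        return {k: render_template(v, context) for k, v in template.items()}
    if isinstance(template, list):
        return [render_template(v, context) for v in template]
    if isinstance(template, str):
        for key, value in context.items():
            template = template.replace("{{" + key + "}}", str(value))
        return template
    return template  # numbers, booleans and None pass through unchanged


def extract_path(payload: Any, path: str) -> Any:
    """Walk a dotted path; a segment applied to a list is treated as an index."""
    current = payload
    for segment in path.split("."):
        if isinstance(current, list):
            current = current[int(segment)]
        else:
            current = current[segment]
    return current


# Request body built from the adapter's template and a test's context.
template = {"prompt": "{{PROMPT}}", "type": "{{TYPE}}", "max_tokens": 150}
context = {"PROMPT": "Ask about training duration", "TYPE": "duration"}
body = render_template(template, context)
print(json.dumps(body))

# Hypothetical response shaped like the path configured in the adapter.
response = {"choices": [{"message": {"content": "How long have you trained?"}}]}
print(extract_path(response, "choices.0.message.content"))
```

Running a check like this locally against a captured response from your endpoint can help confirm that `response.path` points at the text you intend to validate.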

### Test Cases YAML

Define your test cases:

```yaml
tests:
  - name: duration_basic
    id: FIT-001
    description: Question about training duration
    prompt: "Ask about training duration"
    context:
      PROMPT: "Ask about training duration"
      TYPE: "duration"
    expected: "Desde cuando estas entrenando este ejercicio o rutina"
    variants:
      - "Hace cuanto tiempo empezaste con este entrenamiento"
      - "Cuanto tiempo llevas entrenando este ejercicio"
      - "Por favor, dime cuándo empezaste con este entrenamiento."
      - "Cuéntame, ¿desde cuándo entrenas así?"
    threshold: 0.70
    tags:
      - duration
```
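
One plausible reading of `expected`, `variants`, and `threshold` is that the actual output is scored against every accepted phrasing and the best score is compared with the threshold. The sketch below illustrates that reading only. It assumes that a per-test `threshold` overrides the adapter's `ml_threshold`, and it uses `difflib` as a purely lexical stand-in scorer, because the real semantic scoring is not available offline. `evaluate` and `lexical_similarity` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import Callable


def lexical_similarity(a: str, b: str) -> float:
    """Stand-in scorer based on character overlap. This is NOT semantic similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def evaluate(
    test: dict,
    actual: str,
    default_threshold: float = 0.75,  # assumed fallback, mirroring ml_threshold
    scorer: Callable[[str, str], float] = lexical_similarity,
) -> dict:
    """Score `actual` against expected + variants and keep the best match."""
    candidates = [test["expected"], *test.get("variants", [])]
    threshold = test.get("threshold", default_threshold)
    best_score, best_match = max((scorer(actual, c), c) for c in candidates)
    return {
        "passed": best_score >= threshold,
        "similarity": best_score,
        "matched": best_match,
        "threshold": threshold,
    }


test_case = {
    "id": "FIT-001",
    "expected": "Desde cuando estas entrenando este ejercicio o rutina",
    "variants": ["Cuanto tiempo llevas entrenando este ejercicio"],
    "threshold": 0.70,
}
result = evaluate(test_case, "Cuanto tiempo llevas entrenando este ejercicio")
print(result["passed"], result["matched"])
```

Passing the scorer in as a parameter keeps the sketch independent of any particular embedding model, so a semantic scorer can be substituted without changing the matching logic.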

## 🔧 SDK Reference

### PromptEval Client

```python
from prompteval import PromptEval

client = PromptEval(
    api_key="pe_xxxxx",           # Required
    base_url="https://...",       # Optional (default: production)
    timeout=300                   # Optional (default: 300s)
)
```

### Running Tests

```python
# From files
result = client.run_from_files("adapter.yml", "tests.yml")

# From dictionaries
result = client.run(
    adapter={"name": "test", "endpoint": {...}},
    tests=[{"id": "T1", "expected": "..."}]
)

# From raw YAML
result = client.run_from_yaml(yaml_string)
```

### EvalResult Object

```python
result.success          # bool - True if all tests passed
result.success_rate     # float - Percentage of passed tests
result.total_tests      # int - Total number of tests
result.total_passed     # int - Number of passed tests
result.total_failed     # int - Number of failed tests
result.total_errors     # int - Number of errors
result.duration_ms      # float - Total duration in milliseconds
result.test_results     # List[TestResult] - All test results
result.failed_tests     # List[TestResult] - Only failed tests
result.passed_tests     # List[TestResult] - Only passed tests
```

### TestResult Object

```python
test.id_test            # str - Test identifier
test.description        # str - Test description
test.passed             # bool - Whether test passed
test.expected           # str - Expected result
test.actual             # str - Actual result
test.similarity         # float - Semantic similarity (0-1)
test.threshold          # float - Required threshold
test.duration_ms        # float - Test duration
test.error              # str - Error message if any
```
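
The fields above are enough to build a small triage report. The following is a sketch that lists failed tests ordered by how far they fell short of their threshold. `StubTestResult` is a local stand-in that mirrors a few of the documented fields so the example runs without the SDK, and `triage` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StubTestResult:
    """Local stand-in mirroring a subset of the documented TestResult fields."""
    id_test: str
    passed: bool
    similarity: float
    threshold: float
    error: Optional[str] = None


def triage(results: List[StubTestResult]) -> List[str]:
    """Return one line per failed test, largest shortfall first."""
    failed = [r for r in results if not r.passed]
    failed.sort(key=lambda r: r.threshold - r.similarity, reverse=True)
    lines = []
    for r in failed:
        if r.error:
            lines.append(f"{r.id_test}: error: {r.error}")
        else:
            lines.append(f"{r.id_test}: {r.similarity:.1%} < {r.threshold:.1%}")
    return lines


sample = [
    StubTestResult("FIT-001", True, 0.91, 0.70),
    StubTestResult("FIT-002", False, 0.60, 0.70),
    StubTestResult("FIT-003", False, 0.20, 0.75),
]
for line in triage(sample):
    print(line)
```

With real results, the same function could presumably be applied to `result.failed_tests`, since it only reads attributes listed in the reference above.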

### Account Methods

```python
# Get licenses
licenses = client.get_licenses()
for lic in licenses:
    print(f"{lic.plan}: {lic.tests_remaining} tests remaining")

# Get usage
usage = client.get_usage(license_id)
print(f"Tests this month: {usage['tests_this_month']}")

# Get API keys
keys = client.get_api_keys(license_id)
```

## 🔑 Environment Variables

```bash
# Set API key (recommended)
export PROMPTEVAL_API_KEY=pe_xxxxx

# Then use without --api-key flag
prompteval run adapter.yml --tests tests.yml
```
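
The example above covers the CLI. Whether the Python client also reads `PROMPTEVAL_API_KEY` on its own is not stated here, so a script can resolve the key explicitly and pass it in. The following is a minimal sketch with a hypothetical helper, `resolve_api_key`.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Prefer an explicitly passed key, then fall back to PROMPTEVAL_API_KEY."""
    key = explicit or env.get("PROMPTEVAL_API_KEY", "")
    if not key:
        raise RuntimeError("Set PROMPTEVAL_API_KEY or pass an API key explicitly")
    return key


# Intended use with the client from the SDK Reference section:
#   client = PromptEval(api_key=resolve_api_key())
print(resolve_api_key(env={"PROMPTEVAL_API_KEY": "pe_xxxxx"}))
```

Failing early with a clear message avoids sending unauthenticated requests when the variable is missing, for example in a CI job where the secret was not configured.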

## 🎯 CI/CD Integration

### GitHub Actions

```yaml
name: LLM Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      
      - name: Install PromptEval
        run: pip install prompteval-core
      
      - name: Run Tests
        env:
          PROMPTEVAL_API_KEY: ${{ secrets.PROMPTEVAL_API_KEY }}
        run: |
          prompteval run adapter.yml --tests tests.yml --output results.json
          prompteval report results.json --output report.html
      
      - name: Upload Report
        uses: actions/upload-artifact@v4
        with:
          name: test-report
          path: report.html
```

## 📊 Semantic Validation

PromptEval uses sentence transformers to compute the semantic similarity between expected and actual outputs. A test can therefore pass when the output conveys the expected meaning, even if the wording differs.

**Example:**
- Expected: `"The capital of France is Paris"`
- Actual: `"Paris is the capital city of France"`
- Similarity: **94%** ✅

Threshold configuration:
- `0.90+` - Very strict (nearly exact match)
- `0.75-0.89` - Strict (same meaning, different words)
- `0.60-0.74` - Moderate (similar concept)
- `<0.60` - Loose (related topic)
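
To make the bands above concrete, the sketch below maps a threshold to its band and shows cosine similarity, a common way to compare sentence embeddings. That PromptEval uses cosine similarity specifically is an assumption, the toy vectors are invented for illustration, and both function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def strictness(threshold: float) -> str:
    """Map a threshold to the band names used in the list above."""
    if threshold >= 0.90:
        return "very strict"
    if threshold >= 0.75:
        return "strict"
    if threshold >= 0.60:
        return "moderate"
    return "loose"


# Toy 2-dimensional "embeddings"; real sentence embeddings have hundreds of dimensions.
score = cosine_similarity([1.0, 1.0], [1.0, 0.0])
print(round(score, 3), strictness(0.70))
```

A reasonable way to pick a threshold is to score a handful of known-good and known-bad outputs first and choose a value that separates them.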

## 🔗 Links

- **Website:** https://getprompteval.com
- **Documentation:** https://docs.getprompteval.com
- **API Reference:** https://api.getprompteval.com/docs
- **GitHub:** https://github.com/prompteval/prompteval-sdk

## 📄 License

Copyright (c) 2026 PromptEval. All Rights Reserved.

This source code is proprietary and confidential. 
Unauthorized copying, distribution, or use is strictly prohibited.

---

<p align="center">
  Made with ❤️ by the PromptEval Team
</p>
