Metadata-Version: 2.4
Name: daef
Version: 0.1.1
Summary: Domain-Aware Evaluation Framework — intelligent LLM evaluation with domain-specific metrics
License: MIT License
        
        Copyright (c) 2025 Pranav
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/pranavbhimte/daef
Project-URL: Issues, https://github.com/pranavbhimte/daef/issues
Keywords: llm,evaluation,ai,nlp,rag,metrics,google-adk
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: google-adk>=1.0.0
Requires-Dist: pydantic>=2.0
Requires-Dist: python-dotenv>=1.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.24; extra == "dev"
Dynamic: license-file

# DAEF — Domain-Aware Evaluation Framework

Intelligent LLM evaluation that adapts to your domain. Instead of applying generic metrics, DAEF researches your domain, identifies industry-specific requirements, and generates tailored evaluation criteria through a multi-agent pipeline built on Google's Agent Development Kit (ADK).

## Installation

```bash
pip install daef
```

## Quick Start

```python
import asyncio
from daef import DAEFClient, EvaluationRequest

client = DAEFClient(api_key="YOUR_GOOGLE_API_KEY")

request = EvaluationRequest(
    domain="Healthcare",
    task_description="Medical Q&A chatbot answering patient questions",
    task_type="single_call",
    focus_areas=["Security and Guardrails", "Content Generation Quality"],
    prompt="What is the recommended dosage of ibuprofen for adults?",
    llm_output="The standard adult dose of ibuprofen is 200-400mg every 4-6 hours...",
)

result = asyncio.run(client.evaluate(request))
print(f"Overall Score: {result.overall_score}/100")
for metric in result.metrics:
    print(f"  {metric.metric_name}: {metric.score}/100")
```

## Features

- **Domain-aware**: Researches domain-specific standards (Healthcare, Legal, Finance, Education, and more)
- **Adaptive metrics**: Selects from 30+ built-in metrics and supports custom metrics
- **Task-type support**: RAG (`"rag"`), fine-tuning (`"tuning"`), and single LLM call (`"single_call"`)
- **Version comparison**: Compare two evaluation results to understand regressions and improvements
- **Async-first**: Built on asyncio for high-throughput pipelines
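
Because evaluation calls are awaitable, several requests can be evaluated in one event loop. The following is a minimal sketch of bounded concurrency using only the standard library. `fake_evaluate` and `evaluate_many` are hypothetical names introduced for illustration and are not part of DAEF; the stand-in coroutine only simulates latency.

```python
import asyncio


async def fake_evaluate(request: dict) -> dict:
    """Stand-in for an awaitable evaluation call; not part of DAEF."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"domain": request["domain"], "overall_score": 80}


async def evaluate_many(requests, evaluate, max_concurrency=4):
    """Run evaluate() over requests with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(request):
        async with semaphore:
            return await evaluate(request)

    # gather() returns results in the same order as the input requests
    return await asyncio.gather(*(run_one(r) for r in requests))


requests = [{"domain": d} for d in ("Healthcare", "Legal", "Finance")]
results = asyncio.run(evaluate_many(requests, fake_evaluate, max_concurrency=2))
print([r["domain"] for r in results])
```

With DAEF installed, `client.evaluate` could presumably take the place of `fake_evaluate`, since the Quick Start shows it being awaited. The concurrency limit is a guess and should be tuned to your API quota.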

## Configuration

Provide your Google API key either through the `GOOGLE_API_KEY` environment variable or as the `api_key` constructor argument:

```bash
export GOOGLE_API_KEY="your-key"
```

```python
from daef import DAEFClient

# Or pass the key directly to the constructor
client = DAEFClient(api_key="your-key")
```
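
If the key comes from the environment, it can help to fail early with a clear message when it is missing. The helper below is a small sketch using only the standard library. `require_api_key` is a hypothetical name and not something DAEF provides, and the demo passes an explicit mapping so the example runs without a real key.

```python
import os


def require_api_key(env=os.environ, var_name="GOOGLE_API_KEY"):
    """Return the key from the environment, or raise a clear error if it is unset."""
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it or pass api_key explicitly")
    return key


# Demonstration with an explicit mapping so the example runs without a real key
demo_env = {"GOOGLE_API_KEY": "demo-key"}
api_key = require_api_key(demo_env)
print(api_key)
# client = DAEFClient(api_key=require_api_key())  # requires daef and a real key
```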

## EvaluationRequest Fields

| Field | Type | Required | Description |
|---|---|---|---|
| `domain` | str | Yes | Domain (e.g. "Healthcare", "Finance") |
| `task_description` | str | Yes | What the LLM task does |
| `task_type` | str | Yes | `"rag"`, `"tuning"`, or `"single_call"` |
| `focus_areas` | list[str] | No | Up to 3 priority areas |
| `prompt` | str | No | The input prompt sent to the LLM |
| `llm_output` | str | No | The LLM's response to evaluate |
| `context_data` | str | No | Retrieved context (for RAG) |
| `mandatory_metrics` | list[str] | No | Metrics that must be included |
| `avoided_metrics` | list[str] | No | Metrics to exclude |
| `custom_metrics` | list[str] | No | User-defined metric names |
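
As an illustration of how these fields fit together for a RAG task, the sketch below assembles the keyword arguments as a plain dictionary and checks them against the constraints listed in the table. `check_request_fields` is a hypothetical helper, not a DAEF function, and the sample values are invented. DAEF may perform its own, different validation.

```python
ALLOWED_TASK_TYPES = {"rag", "tuning", "single_call"}
MAX_FOCUS_AREAS = 3


def check_request_fields(fields: dict) -> list[str]:
    """Return a list of problems found; an empty list means the fields look consistent."""
    problems = []
    for name in ("domain", "task_description", "task_type"):
        if not fields.get(name):
            problems.append(f"missing required field: {name}")
    task_type = fields.get("task_type")
    if task_type and task_type not in ALLOWED_TASK_TYPES:
        problems.append(f"unknown task_type: {task_type!r}")
    if len(fields.get("focus_areas", [])) > MAX_FOCUS_AREAS:
        problems.append("focus_areas allows up to 3 entries")
    return problems


# Invented sample values for a RAG evaluation; field names come from the table
rag_fields = {
    "domain": "Finance",
    "task_description": "Answer questions about quarterly filings using retrieved passages",
    "task_type": "rag",
    "focus_areas": ["Content Generation Quality"],
    "prompt": "What was the reported revenue for Q2?",
    "context_data": "Q2 revenue was $4.2M, up 8% from Q1.",
    "llm_output": "Revenue for Q2 was $4.2M.",
}
print(check_request_fields(rag_fields))
# request = EvaluationRequest(**rag_fields)  # with daef installed
```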

## Version Comparison

```python
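import asyncio

# `client` is a DAEFClient; request_v1 and request_v2 are EvaluationRequest
# objects describing the two versions under test.
result_v1 = asyncio.run(client.evaluate(request_v1))
result_v2 = asyncio.run(client.evaluate(request_v2))

# compare() is a coroutine, like evaluate()
comparison = asyncio.run(client.compare(result_v1, result_v2))
print(f"Overall change: {comparison.overall_change}")
print(f"Summary: {comparison.summary}")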
```
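
For a per-metric view, the two results can also be diffed by hand using the `metrics`, `metric_name`, and `score` attributes shown in the Quick Start. This is a rough sketch with stand-in objects, not DAEF's own comparison logic. `metric_deltas` is a hypothetical helper, and the metric names and scores are made up.

```python
from types import SimpleNamespace


def metric_deltas(old_result, new_result):
    """Map each metric name present in both results to (new score - old score)."""
    old_scores = {m.metric_name: m.score for m in old_result.metrics}
    return {
        m.metric_name: m.score - old_scores[m.metric_name]
        for m in new_result.metrics
        if m.metric_name in old_scores
    }


# Stand-in results with made-up metric names and scores, for illustration only
v1 = SimpleNamespace(metrics=[
    SimpleNamespace(metric_name="accuracy", score=70),
    SimpleNamespace(metric_name="safety", score=90),
])
v2 = SimpleNamespace(metrics=[
    SimpleNamespace(metric_name="accuracy", score=82),
    SimpleNamespace(metric_name="safety", score=85),
])
deltas = metric_deltas(v1, v2)
print(deltas)
```

Only metrics present in both results are compared. Because metrics are selected adaptively, two runs may not share every metric, and listing the ones you care about in `mandatory_metrics` is one way to keep them comparable.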

## License

MIT
