Metadata-Version: 2.4
Name: aumos-agent-eval
Version: 0.2.0
Summary: Framework for evaluating AI agents across multiple quality dimensions
Project-URL: Homepage, https://github.com/aumos-ai/agent-eval
Project-URL: Documentation, https://github.com/aumos-ai/agent-eval#readme
Project-URL: Repository, https://github.com/aumos-ai/agent-eval
Project-URL: Issues, https://github.com/aumos-ai/agent-eval/issues
Author: AumOS Contributors
License-Expression: Apache-2.0
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: click>=8.0
Requires-Dist: pydantic>=2.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0
Provides-Extra: agentcore
Requires-Dist: aumos-agentcore-sdk>=0.1.0; extra == 'agentcore'
Provides-Extra: all-frameworks
Requires-Dist: anthropic>=0.30.0; extra == 'all-frameworks'
Requires-Dist: crewai>=0.1.0; extra == 'all-frameworks'
Requires-Dist: langchain-core>=0.1.0; extra == 'all-frameworks'
Requires-Dist: microsoft-agents>=0.1.0; extra == 'all-frameworks'
Requires-Dist: openai-agents>=0.1.0; extra == 'all-frameworks'
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.30.0; extra == 'anthropic'
Provides-Extra: crewai
Requires-Dist: crewai>=0.1.0; extra == 'crewai'
Provides-Extra: deepeval
Requires-Dist: deepeval>=1.0; extra == 'deepeval'
Provides-Extra: dev
Requires-Dist: mypy>=1.8; extra == 'dev'
Requires-Dist: pip-audit; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest-timeout>=2.3; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.3; extra == 'dev'
Provides-Extra: langchain
Requires-Dist: langchain-core>=0.1.0; extra == 'langchain'
Provides-Extra: microsoft
Requires-Dist: microsoft-agents>=0.1.0; extra == 'microsoft'
Provides-Extra: openai-agents
Requires-Dist: openai-agents>=0.1.0; extra == 'openai-agents'
Description-Content-Type: text/markdown

# agent-eval

Framework for evaluating AI agents across multiple quality dimensions

[![CI](https://github.com/aumos-ai/agent-eval/actions/workflows/ci.yaml/badge.svg)](https://github.com/aumos-ai/agent-eval/actions/workflows/ci.yaml)
[![PyPI version](https://img.shields.io/pypi/v/agent-eval.svg)](https://pypi.org/project/agent-eval/)
[![Python versions](https://img.shields.io/pypi/pyversions/agent-eval.svg)](https://pypi.org/project/agent-eval/)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)

Part of the [AumOS](https://github.com/aumos-ai) open-source agent infrastructure portfolio.

---

## Features

- `EvalRunner` orchestrates multi-run evaluation with configurable concurrency, per-case timeouts, retry logic, and fail-fast mode
- Six evaluation dimensions — `ACCURACY`, `LATENCY`, `COST`, `SAFETY`, `FORMAT`, and `CUSTOM` — each producing a normalized `[0.0, 1.0]` score with a pass/fail determination
- `BenchmarkSuite` YAML loader with per-case expected outputs, latency budgets, and cost caps; `SuiteBuilder` for programmatic construction
- LLM-judge evaluator alongside deterministic accuracy, latency, cost, and format evaluators — all implementing the `Evaluator` ABC
- Agent adapters for LangChain, CrewAI, AutoGen, OpenAI Agents, and plain callables so any agent can be wrapped without code changes
- Reporting in JSON, Markdown, HTML, and rich console output with per-dimension aggregate statistics
- Quality gates (`ThresholdGate`, `CompositeGate`) that turn evaluation results into CI pass/fail signals

## Quick Start

Install from PyPI:

```bash
pip install agent-eval
```

Verify the installation:

```bash
agent-eval version
```

Basic usage:

```python
import agent_eval

# See examples/01_quickstart.py for a working example
```

## Documentation

- [Architecture](docs/architecture.md)
- [Contributing](CONTRIBUTING.md)
- [Changelog](CHANGELOG.md)
- [Examples](examples/README.md)

## Enterprise Upgrade

For production deployments requiring SLA-backed support and advanced
integrations, contact the maintainers or see the commercial extensions documentation.

## Contributing

Contributions are welcome. Please read [CONTRIBUTING.md](CONTRIBUTING.md)
before opening a pull request.

## License

Apache 2.0 — see [LICENSE](LICENSE) for full terms.

---

Part of [AumOS](https://github.com/aumos-ai) — open-source agent infrastructure.
