Metadata-Version: 2.4
Name: agentuq
Version: 0.1.0
Summary: Single-pass runtime reliability instrumentation for LLM agents using token logprobs.
Project-URL: Homepage, https://github.com/antoinenguyen27/agentUQ
Project-URL: Documentation, https://github.com/antoinenguyen27/agentUQ/tree/main/docs
Project-URL: Repository, https://github.com/antoinenguyen27/agentUQ
Project-URL: Issues, https://github.com/antoinenguyen27/agentUQ/issues
Author: AgentUQ OSS Contributors
License: MIT
License-File: LICENSE.txt
Keywords: agent-reliability,ai-agents,llm,logprobs,observability,uncertainty
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: pydantic<3,>=2.8
Requires-Dist: typing-extensions<5,>=4.10
Provides-Extra: dev
Requires-Dist: build<2,>=1.2; extra == 'dev'
Requires-Dist: pytest<9,>=8.2; extra == 'dev'
Requires-Dist: twine<7,>=5; extra == 'dev'
Provides-Extra: rich
Requires-Dist: rich<15,>=13.7; extra == 'rich'
Description-Content-Type: text/markdown

# AgentUQ

Single-pass runtime reliability gate for LLM agents using token logprobs.

AgentUQ turns provider-native token logprobs into localized runtime decisions for agent steps. It does not claim to know whether an output is true. It tells you where a generation looked brittle or ambiguous and whether the workflow should continue, annotate the trace, regenerate a risky span, retry the step, dry-run verify, ask for confirmation, or block execution.
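
To make "brittle or ambiguous" concrete, here is a minimal sketch of three token-level signals that can be derived from logprobs alone: surprisal, the margin between the top two candidates, and entropy over the returned top-k. This is an illustration of the general idea, not AgentUQ's internal scoring; the function name `token_signals` and the example probabilities are assumptions made for this sketch.

```python
import math


def token_signals(chosen_logprob, top_logprobs):
    """Return (surprisal, margin, entropy) for one generated token.

    Hypothetical helper for illustration only, not part of the agentuq API.
    chosen_logprob: natural-log probability of the emitted token.
    top_logprobs: natural-log probabilities of the top-k candidates.
    """
    # Surprisal: high when the model gave the emitted token low probability.
    surprisal = -chosen_logprob

    # Margin: gap between the best and second-best candidate (small = ambiguous).
    ranked = sorted(top_logprobs, reverse=True)
    margin = ranked[0] - ranked[1] if len(ranked) > 1 else float("inf")

    # Entropy over the top-k candidates, renormalized because top-k is truncated.
    probs = [math.exp(lp) for lp in ranked]
    total = sum(probs)
    entropy = -sum((p / total) * math.log(p / total) for p in probs if p > 0)
    return surprisal, margin, entropy


# Made-up distributions: one confident token, one near coin-flip token.
confident = token_signals(math.log(0.98), [math.log(0.98), math.log(0.01), math.log(0.01)])
ambiguous = token_signals(math.log(0.50), [math.log(0.50), math.log(0.45), math.log(0.05)])

print("confident:", confident)
print("ambiguous:", ambiguous)
```

The ambiguous token shows higher surprisal, a smaller margin, and higher entropy than the confident one. None of these signals says whether the token is correct, only how settled the model was when it emitted it.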

## Why teams use it

- Catch brittle action-bearing spans before execution: SQL clauses, tool arguments, selectors, URLs, paths, shell flags, and JSON leaves
- Localize risk to the exact span that matters instead of treating the whole response as one opaque score
- Spend expensive verification selectively by using AgentUQ as the first-pass gate
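
The span-level gating described in the list above can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's implementation: the `Span` class, the worst-token scoring rule, the thresholds, and the action names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical container: a labeled slice of the output and its token logprobs."""
    label: str
    logprobs: list  # natural-log probability of each emitted token


def span_risk(span):
    """Score a span by its worst token so one brittle token is not averaged away."""
    return max((-lp for lp in span.logprobs), default=0.0)


def choose_action(risk, action_bearing):
    """Map a risk score to an action. Thresholds are illustrative, not tuned."""
    if risk < 1.0:
        return "continue"
    if not action_bearing:
        # Risky prose is recorded but does not stop the workflow.
        return "annotate_trace"
    return "ask_confirmation" if risk < 3.0 else "block_execution"


# Made-up example: fluent prose plus a SQL clause containing one shaky token.
spans = [
    Span("prose", [-0.05, -0.10, -0.02]),
    Span("sql_where_clause", [-0.03, -2.4, -0.08]),
]
decisions = {s.label: choose_action(span_risk(s), s.label != "prose") for s in spans}
print(decisions)
```

Scoring each span separately is what lets a cheap first-pass gate escalate only the SQL clause while the surrounding prose passes through untouched.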

## Install

```bash
pip install agentuq
```

For the OpenAI example below, also install the provider SDK:

```bash
pip install openai
```

For local development and contributions:

```bash
python -m venv .venv
. .venv/bin/activate
pip install -e ".[dev]"
```

Examples below assume the public package and import namespace `agentuq`.

## Integration status

The OpenAI Responses API is the stable integration path in the current docs. Every other documented provider, gateway, and framework integration is in preview, including OpenAI Chat Completions, OpenRouter, LiteLLM, Gemini, Fireworks, Together, LangChain, LangGraph, and the OpenAI Agents SDK.

## Minimal loop

```python
from openai import OpenAI

from agentuq import Analyzer, UQConfig
from agentuq.adapters.openai_responses import OpenAIResponsesAdapter

# Define the request parameters once so the adapter receives exactly what was sent.
request_params = {
    "model": "gpt-4.1-mini",
    "include": ["message.output_text.logprobs"],
    "top_logprobs": 5,
    "temperature": 0.0,
    "top_p": 1.0,
}

client = OpenAI()
response = client.responses.create(
    input="Return the single word Paris.",
    **request_params,
)

adapter = OpenAIResponsesAdapter()
analyzer = Analyzer(UQConfig(policy="balanced", tolerance="strict"))

# Capture the response together with the request parameters.
record = adapter.capture(response, request_params)

# Analyze the step alongside the adapter's capability report.
result = analyzer.analyze_step(
    record,
    adapter.capability_report(response, request_params),
)

print(result.pretty())
```

## Documentation

The web docs are built with Docusaurus from the canonical Markdown in [`docs/`](docs) and the site app in [`website/`](website).

- Start here: [docs/index.mdx](docs/index.mdx)
- Get started: [docs/get-started/index.md](docs/get-started/index.md)
- Provider and framework quickstarts: [docs/quickstarts/index.md](docs/quickstarts/index.md)
- Concepts: [docs/concepts/index.md](docs/concepts/index.md)
- API reference: [docs/concepts/public_api.md](docs/concepts/public_api.md)
- Maintainers: [docs/maintainers/index.md](docs/maintainers/index.md)
- Contributing: [CONTRIBUTING.md](CONTRIBUTING.md)

## Repo layout

- [`src/agentuq`](src/agentuq): library code
- [`examples`](examples): usage examples
- [`tests`](tests): offline, contract, and optional live tests
- [`docs`](docs): canonical documentation content
- [`website`](website): Docusaurus site and Vercel-facing app

## Testing

By default, pytest runs only the offline tests:

```bash
python -m pytest
```

Live smoke checks are manual and opt-in:

```bash
AGENTUQ_RUN_LIVE=1 python -m pytest -m live
```
