Metadata-Version: 2.4
Name: synth-agent-sdk
Version: 1.4.1
Summary: Autonomous agents, engineered. A Python SDK for building production-grade AI agents and multi-agent systems.
License-Expression: MIT
Requires-Python: >=3.10
Requires-Dist: click>=8.0
Requires-Dist: httpx>=0.27
Requires-Dist: prompt-toolkit>=3.0
Requires-Dist: pydantic>=2.0
Requires-Dist: rich>=13.0
Requires-Dist: typing-extensions>=4.0
Provides-Extra: agentcore
Requires-Dist: bedrock-agentcore-starter-toolkit>=0.1.0; extra == 'agentcore'
Requires-Dist: bedrock-agentcore>=0.1.0; extra == 'agentcore'
Requires-Dist: boto3>=1.35; extra == 'agentcore'
Requires-Dist: playwright>=1.40; extra == 'agentcore'
Requires-Dist: pyjwt>=2.8; extra == 'agentcore'
Requires-Dist: requests>=2.31; extra == 'agentcore'
Provides-Extra: all
Requires-Dist: anthropic>=0.39; extra == 'all'
Requires-Dist: boto3>=1.35; extra == 'all'
Requires-Dist: google-genai>=1.0; extra == 'all'
Requires-Dist: mcp>=1.0; extra == 'all'
Requires-Dist: ollama>=0.4; extra == 'all'
Requires-Dist: openai>=1.0; extra == 'all'
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.39; extra == 'anthropic'
Provides-Extra: aws
Requires-Dist: bedrock-agentcore-starter-toolkit>=0.1.0; extra == 'aws'
Requires-Dist: bedrock-agentcore>=0.1.0; extra == 'aws'
Requires-Dist: boto3>=1.35; extra == 'aws'
Requires-Dist: playwright>=1.40; extra == 'aws'
Requires-Dist: pyjwt>=2.8; extra == 'aws'
Requires-Dist: requests>=2.31; extra == 'aws'
Provides-Extra: bedrock
Requires-Dist: boto3>=1.35; extra == 'bedrock'
Provides-Extra: cdk
Requires-Dist: aws-cdk-lib>=2.100.0; extra == 'cdk'
Requires-Dist: constructs>=10.0.0; extra == 'cdk'
Provides-Extra: google
Requires-Dist: google-genai>=1.0; extra == 'google'
Provides-Extra: mcp
Requires-Dist: mcp>=1.0; extra == 'mcp'
Provides-Extra: ollama
Requires-Dist: ollama>=0.4; extra == 'ollama'
Provides-Extra: openai
Requires-Dist: openai>=1.0; extra == 'openai'
Provides-Extra: quickstart
Requires-Dist: anthropic>=0.39; extra == 'quickstart'
Requires-Dist: openai>=1.0; extra == 'quickstart'
Provides-Extra: testing
Provides-Extra: ui
Requires-Dist: fastapi>=0.115; extra == 'ui'
Requires-Dist: uvicorn>=0.30; extra == 'ui'
Description-Content-Type: text/markdown

# Synth

> Autonomous agents, engineered.

**Version:** 1.4.0 — Advanced features release | [PyPI](https://pypi.org/project/synth-agent-sdk/) | [Changelog](CHANGELOG.md)

A Python SDK for building production-grade AI agents and multi-agent systems. From a 3-line single agent to complex, stateful, resumable multi-agent graphs — with model-agnostic provider support, streaming, observability, evaluation, and guardrails out of the box.

Synth 1.0 is the first fully stable, production-ready release. The public API is stable and will follow semantic versioning from this point forward.

---

## Table of Contents

1. [What is Synth?](#what-is-synth)
2. [Installation](#installation)
3. [Quick Start](#quick-start)
4. [Core Concepts](#core-concepts)
5. [Creating an Agent](#creating-an-agent)
6. [Tools](#tools)
7. [Built-in Tools](#built-in-tools)
8. [Agent-as-Tool Composition](#agent-as-tool-composition)
9. [MCP Integration](#mcp-integration)
10. [Tool Middleware](#tool-middleware)
11. [Dependency Injection (RunContext)](#dependency-injection-runcontext)
12. [Running Your Agent](#running-your-agent)
13. [Streaming](#streaming)
14. [Streaming Structured Output](#streaming-structured-output)
15. [Model Providers](#model-providers)
16. [Provider Fallback Chains](#provider-fallback-chains)
17. [Memory](#memory)
18. [Conversation Management](#conversation-management)
19. [Guards](#guards)
20. [Guard Composition](#guard-composition)
21. [Structured Output](#structured-output)
22. [Pipelines](#pipelines)
23. [Graphs](#graphs)
24. [Graph Debugging](#graph-debugging)
25. [Graph Parallel Execution](#graph-parallel-execution)
26. [Human-in-the-Loop](#human-in-the-loop)
27. [Agent Teams](#agent-teams)
28. [Tracing](#tracing)
29. [Trace-to-Eval Pipeline](#trace-to-eval-pipeline)
30. [Checkpointing](#checkpointing)
31. [Evaluation](#evaluation)
32. [Testing Infrastructure](#testing-infrastructure)
33. [CLI Commands](#cli-commands)
34. [Testing Dashboard](#synth-create-ui)
35. [Deploying to AWS AgentCore](#deploying-to-aws-agentcore)
36. [AgentCore Evaluations](#agentcore-evaluations)
37. [Error Handling](#error-handling)
38. [Environment Variables](#environment-variables)
39. [FAQ](#faq)

---

## What is Synth?

Synth is a Python library for building AI-powered agents. An agent uses a large language model (Claude, GPT, Gemini, etc.) to understand instructions, make decisions, and take actions — calling functions, searching databases, generating reports, or coordinating with other agents.

Synth handles the plumbing (provider communication, conversation management, retries, cost tracking) so you focus on what your agent actually does.

---

## Installation

Requires Python 3.10+.

```bash
pip install synth-agent-sdk[anthropic]     # Anthropic Claude (recommended)
```

Other options:

```bash
pip install synth-agent-sdk[quickstart]    # Claude + GPT (tutorials/demos)
pip install synth-agent-sdk[openai]        # OpenAI GPT
pip install synth-agent-sdk[google]        # Google Gemini
pip install synth-agent-sdk[ollama]        # Local Ollama models
pip install synth-agent-sdk[bedrock]       # AWS Bedrock
pip install synth-agent-sdk[agentcore]     # AWS AgentCore deployment
pip install synth-agent-sdk[mcp]           # MCP (Model Context Protocol) integration
pip install synth-agent-sdk[ui]            # Browser testing dashboard
pip install synth-agent-sdk[all]           # All providers + MCP
```

> **Important:** The package name is `synth-agent-sdk`, not `synth`. Running `pip install synth` installs an unrelated C++ template engine that will fail to build. Always use `synth-agent-sdk`.

### Recommended: Install in a Virtual Environment

```bash
# macOS / Linux
python3 -m venv .venv
source .venv/bin/activate

# Windows
python -m venv .venv
.venv\Scripts\activate
```

Then install:

```bash
pip install synth-agent-sdk[anthropic]
```

### macOS Notes

**Apple Silicon (M1/M2/M3/M4):** If you install the `bedrock` or `agentcore` extras, the `botocore[crt]` dependency pulls in `awscrt`, a compiled C extension. If the build fails:

1. Make sure Xcode Command Line Tools are installed:
   ```bash
   xcode-select --install
   ```
2. If using pyenv, ensure your Python was built with the correct architecture:
   ```bash
   python3 -c "import platform; print(platform.machine())"
   # Should print "arm64" on Apple Silicon
   ```
3. If the `awscrt` wheel still fails, install without CRT (slightly slower S3 transfers but fully functional):
   ```bash
   pip install botocore boto3
   pip install synth-agent-sdk[agentcore] --no-deps
   pip install synth-agent-sdk
   ```

**Homebrew Python:** If you use Homebrew's Python, create a venv first — installing packages globally into Homebrew Python is [externally managed](https://peps.python.org/pep-0668/) and will be rejected by pip.

### Global Install with pipx

If you want the `synth` CLI available globally without activating a venv each time, use [pipx](https://pipx.pypa.io/):

```bash
# Install pipx if you don't have it
# macOS
brew install pipx
pipx ensurepath

# Linux / Windows
pip install --user pipx
pipx ensurepath
```

Then install Synth:

```bash
pipx install synth-agent-sdk[anthropic]
```

To add extra providers to an existing pipx install:

```bash
pipx inject synth-agent-sdk anthropic openai       # add provider SDKs
pipx inject synth-agent-sdk boto3 'botocore[crt]'   # add Bedrock/AWS support
```

This gives you the `synth` CLI globally (`synth init`, `synth dev`, `synth doctor`, etc.) while keeping dependencies isolated. For project work that imports `from synth import Agent`, you'll still want a venv with `pip install synth-agent-sdk` so your project can access the library.

Set your API key:

```bash
export ANTHROPIC_API_KEY="your-key-here"   # Claude
export OPENAI_API_KEY="your-key-here"      # GPT
export GOOGLE_API_KEY="your-key-here"      # Gemini
# AWS Bedrock uses standard IAM credentials — no Synth-specific key needed
```

Verify your setup:

```bash
synth doctor
```

---

## Quick Start

The fastest way to get going is `synth init`, which scaffolds a complete project interactively:

```bash
mkdir my-agent && cd my-agent
synth init
```

This walks you through provider selection, model choice, tools, and features — then generates a ready-to-run project:

```
  SYNTH INIT
  Interactive project setup

  Project type (single, multi) [single]:
  Project name [my-agent]:
  Description [An AI agent built with SynthAgentSDK]:

  Available providers:
    anthropic              Anthropic Claude
    openai                 OpenAI GPT
    google                 Google Gemini
    ollama                 Local Ollama
    bedrock                AWS Bedrock
    agentcore              AWS AgentCore

  Provider [anthropic]:
  Model [claude-sonnet-4-5]:
  Agent instructions [You are a helpful assistant.]:

  ...tool wizard, MCP wizard, feature toggles...

  Summary:
    Name:         my-agent
    Provider:     Anthropic Claude
    Model:        claude-sonnet-4-5
    Features:     memory, guards
    Files:        agent.py, README.md, synth.toml

  Create project? [Y/n]:

  How would you like to test?
    ui                     Launch the browser-based testing dashboard
    cli                    Open the interactive CLI shell

  Testing mode [cli]:
```

Once generated, run your agent:

```bash
synth dev agent.py          # Interactive REPL with streaming + trace UI
synth run agent.py "Hello"  # One-shot execution
```

For multi-agent projects, select `multi` at the project type prompt to configure multiple agents with orchestration (Pipeline, Graph, AgentTeam, or Human-in-the-Loop).

Or skip the wizard and write an agent directly:

```python
from synth import Agent

agent = Agent(model="claude-sonnet-4-5", instructions="You are a helpful assistant.")
result = agent.run("What is the capital of France?")
print(result.text)
# => "The capital of France is Paris."
```

---

## Core Concepts

| Concept | What It Is |
|---------|-----------|
| `Agent` | The main building block. Wraps an AI model with tools, memory, and guards. |
| `Tool` | A Python function your agent can call. |
| `ToolKit` | A bundle of related tools. |
| `AgentTool` | Wraps an Agent as a tool for another Agent (hierarchical composition). |
| `MCPClient` | Discovers and registers tools from MCP servers. |
| `BuiltinTool` | Pre-built tools for file I/O, shell, HTTP, and web search. |
| `BaseToolMiddleware` | Hooks that wrap every tool call (caching, logging, rate limiting). |
| `RunContext` | Typed dependency injection container for tools. |
| `RunResult` | Returned by `agent.run()` — text, token usage, cost, latency, trace. |
| `Memory` | Lets your agent remember previous conversations. |
| `ConversationManager` | Automatic context window management (sliding window / summarize). |
| `Guard` | A safety rule applied to input or output. |
| `Pipeline` | Chains agents sequentially. |
| `Graph` | A workflow with branching, loops, parallel execution, and conditional logic. |
| `AgentTeam` | Multiple agents coordinated by an orchestrator. |
| `Trace` | A detailed record of everything that happened during a run. |
| `TraceToEval` | Converts production traces into evaluation datasets. |
| `Checkpoint` | A saved snapshot of a run's state for resumption. |
| `TestModel` | Deterministic mock provider for unit testing agents. |
| `FunctionModel` | Custom test provider driven by a user function. |
| `VCRRecorder` | Records and replays real LLM interactions for integration tests. |
| `NodeExecution` | Debug record of a single graph node execution. |
| `PartialOutputEvent` | Stream event for incrementally validated structured output fields. |

---

## Creating an Agent

```python
from synth import Agent, Guard, Memory

agent = Agent(
    model="claude-sonnet-4-5",        # AI model to use
    instructions="You are helpful.",   # System prompt
    tools=[my_tool, my_toolkit],      # Optional tools
    memory=Memory.thread(),           # Optional memory
    guards=[Guard.no_pii_output()],   # Optional safety rules
    output_schema=MyModel,            # Optional Pydantic schema
    max_retries=3,                    # Retry on transient errors
    retry_backoff=1.0,                # Base delay between retries (seconds)
    deps=my_dependencies,             # Optional dependency injection
    tool_middleware=[CachingMiddleware(ttl_seconds=300)],  # Optional middleware
    fallback=["gpt-4o", "claude-haiku-3-5"],              # Optional fallback chain
    parallel_guards=True,             # Evaluate guards concurrently
)
```

All parameters except `model` are optional. Default model is `claude-sonnet-4-5`.

---

## Tools

Tools are Python functions your agent can call. Mark them with `@tool` — Synth auto-generates JSON schemas from type hints and docstrings.

```python
from synth import tool

@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"The weather in {city} is sunny, 72°F."

agent = Agent(
    model="claude-sonnet-4-5",
    instructions="You are a weather assistant.",
    tools=[get_weather],
)
```

Rules: every parameter needs a type annotation, and the function needs a docstring. Missing either raises `ToolDefinitionError` immediately.

Group related tools with `ToolKit`:

```python
from synth import ToolKit

math_tools = ToolKit([add, multiply, divide])
agent = Agent(model="gpt-4o", tools=[math_tools, get_weather])
```

Inspect tool calls after a run:

```python
for tc in result.tool_calls:
    print(f"{tc.name}({tc.args}) → {tc.result}  [{tc.latency_ms:.1f}ms]")
```

---

## Built-in Tools

Synth ships with commonly needed tools ready to use — file I/O, shell commands, HTTP requests, and web search:

```python
from synth.tools.builtins import BuiltinTool, read_file, write_file, http_request

# Use individual tools
agent = Agent(model="claude-sonnet-4-5", tools=[read_file, write_file])

# Or bundle all built-in tools with safe defaults
agent = Agent(model="claude-sonnet-4-5", tools=[BuiltinTool.all()])
```

Security defaults: shell is disabled, HTTPS is enforced, file paths are validated against traversal attacks.

```python
# Enable shell with explicit opt-in
kit = BuiltinTool.all(allow_shell=True, allowed_dir="/workspace")

# Configure individually
shell_tool = BuiltinTool.shell(allowed=True, timeout=60)
```

| Tool | Description | Default |
|------|-------------|---------|
| `read_file(path)` | Read file contents | Path traversal protected |
| `write_file(path, content)` | Write file, create dirs | Path traversal protected |
| `shell(command)` | Execute shell command | Disabled by default |
| `http_request(url, method, body)` | HTTP request | HTTPS enforced |
| `web_search(query, limit)` | Web search via Brave/SerpAPI/Tavily | Auto-detects API key |

---

## Agent-as-Tool Composition

Use one Agent as a tool for another, enabling hierarchical delegation:

```python
from synth import Agent, AgentTool

researcher = Agent(model="claude-sonnet-4-5", instructions="You research topics thoroughly.")
writer = Agent(model="claude-sonnet-4-5", instructions="You write clear articles.")

# The writer can delegate research to the researcher
editor = Agent(
    model="claude-sonnet-4-5",
    instructions="You coordinate research and writing.",
    tools=[
        AgentTool(researcher, name="research", description="Research a topic"),
        AgentTool(writer, name="write", description="Write an article"),
    ],
)

result = editor.run("Write an article about quantum computing.")
```

The child agent's `RunResult` (cost, tokens, tool calls) is accessible via the parent's trace.

---

## MCP Integration

Connect to Model Context Protocol servers to dynamically discover and use external tools:

```python
from synth import Agent, MCPClient

# HTTP/SSE transport
mcp = MCPClient("https://mcp.example.com/tools")
await mcp.connect()

# Or stdio transport
mcp = MCPClient(["npx", "my-mcp-server"])
await mcp.connect()

# Use discovered tools in an agent
agent = Agent(model="claude-sonnet-4-5", tools=[mcp])
```

MCP tools are validated against their declared JSON schemas before forwarding. Per-tool timeout defaults to 30 seconds.

```bash
pip install synth-agent-sdk[mcp]  # Install the optional MCP dependency
```

---

## Tool Middleware

Wrap every tool invocation with cross-cutting concerns — caching, logging, rate limiting:

```python
from synth import Agent, BaseToolMiddleware
from synth.tools.middleware import CachingMiddleware, LoggingMiddleware

agent = Agent(
    model="claude-sonnet-4-5",
    tools=[my_tool],
    tool_middleware=[
        LoggingMiddleware(level="INFO"),      # Log tool calls
        CachingMiddleware(ttl_seconds=300),   # Cache results for 5 min
    ],
)
```

Middleware executes in declaration order (first in list wraps outermost). Write custom middleware:

```python
class RateLimitMiddleware(BaseToolMiddleware):
    async def call(self, name, args, next_fn):
        await self.check_rate_limit()
        result = await next_fn(name, args)
        return result
```

---

## Dependency Injection (RunContext)

Pass database connections, HTTP clients, or config objects into tools without globals:

```python
from dataclasses import dataclass
from synth import Agent, RunContext, tool

@dataclass
class Deps:
    db_url: str
    api_client: object

@tool
def lookup_user(user_id: str, ctx: RunContext[Deps]) -> str:
    """Look up a user by ID."""
    db = ctx.deps.db_url  # Access injected dependencies
    return f"User {user_id} found at {db}"

agent = Agent(
    model="claude-sonnet-4-5",
    tools=[lookup_user],
    deps=Deps(db_url="postgres://...", api_client=my_client),
)
```

`RunContext` also carries `run_id`, `thread_id`, and `retry_count` metadata. Tools that don't declare a `RunContext` parameter work unchanged.

---

## Running Your Agent

**Synchronous:**

```python
result = agent.run("Explain quantum computing in simple terms.")
print(result.text)        # Response text
print(result.tokens)      # TokenUsage(input, output, total)
print(result.cost)        # Estimated cost in USD
print(result.latency_ms)  # Latency in milliseconds
print(result.tool_calls)  # Tools that were called
print(result.trace)       # Full execution trace
print(result.output)      # Parsed structured output (if output_schema set)
```

**Asynchronous:**

```python
import asyncio

async def main():
    result = await agent.arun("What is 2 + 2?")
    print(result.text)

asyncio.run(main())
```

---

## Streaming

```python
from synth import TokenEvent, ToolCallEvent, ToolResultEvent, DoneEvent, ErrorEvent

for event in agent.stream("Write a short poem about coding."):
    if isinstance(event, TokenEvent):
        print(event.text, end="", flush=True)
    elif isinstance(event, ToolCallEvent):
        print(f"\n[Calling: {event.name}]")
    elif isinstance(event, DoneEvent):
        print(f"\n\nTokens: {event.result.tokens.total_tokens}")
```

Async streaming:

```python
async for event in agent.astream("Write a haiku."):
    if isinstance(event, TokenEvent):
        print(event.text, end="", flush=True)
```

| Event | When |
|-------|------|
| `TokenEvent` | Model produced a text token |
| `ToolCallEvent` | Model decided to call a tool |
| `ToolResultEvent` | Tool finished executing |
| `ThinkingEvent` | Model produced a reasoning token |
| `DoneEvent` | Stream completed — contains full `RunResult` |
| `ErrorEvent` | Something went wrong |

---

## Streaming Structured Output

When using `output_schema` with streaming, Synth emits `PartialOutputEvent` as individual fields are validated:

```python
from synth import Agent, PartialOutputEvent, TokenEvent, DoneEvent
from pydantic import BaseModel

class Analysis(BaseModel):
    sentiment: str
    confidence: float
    summary: str

agent = Agent(model="claude-sonnet-4-5", output_schema=Analysis)

async for event in agent.astream("Analyze this review: Great product!"):
    if isinstance(event, TokenEvent):
        print(event.text, end="")
    elif isinstance(event, PartialOutputEvent):
        print(f"\n  ✓ {event.field_name}: {event.field_value}")
    elif isinstance(event, DoneEvent):
        analysis = event.result.output  # Fully validated Analysis instance
        print(f"\nSentiment: {analysis.sentiment}")
```

The final `DoneEvent.result.output` always contains the fully validated Pydantic model, identical to the non-streaming path. If validation fails, the same retry logic applies.

---

## Model Providers

Switch providers by changing the `model` string — no other code changes needed.

| Provider | Model String Examples | Extra | API Key |
|----------|----------------------|-------|---------|
| Anthropic | `"claude-sonnet-4-5"`, `"claude-haiku-3-5"` | `synth[anthropic]` | `ANTHROPIC_API_KEY` |
| OpenAI | `"gpt-4o"`, `"gpt-4o-mini"` | `synth[openai]` | `OPENAI_API_KEY` |
| Google | `"gemini-2.0-flash"` | `synth[google]` | `GOOGLE_API_KEY` |
| Ollama | `"ollama/llama3"`, `"ollama/mistral"` | `synth[ollama]` | None (local) |
| AWS Bedrock | `"bedrock/claude-sonnet-4-5"` | `synth[bedrock]` | AWS IAM |

Custom endpoint:

```python
agent = Agent(model="my-model", base_url="https://my-proxy.example.com/v1")
```

---

## Provider Fallback Chains

Automatically try alternative models when the primary fails:

```python
agent = Agent(
    model="claude-sonnet-4-5",
    fallback=["gpt-4o", "claude-haiku-3-5"],
    max_retries=3,
)
```

When the primary model fails after all retries, Synth iterates through the fallback list. Each fallback gets its own full retry cycle. Fallback transitions are recorded in the trace as `"fallback"` spans.

```python
result = agent.run("Hello")
for span in result.trace.spans:
    if span.type == "fallback":
        print(f"Fell back from {span.metadata['failed_model']} → {span.metadata['next_model']}")
```

Fallback works with both `run()`/`arun()` and `stream()`/`astream()`.

---

## Memory

By default each `run()` is stateless. Add memory to persist conversations.

**Thread memory** (in-process, fast):

```python
agent = Agent(model="claude-sonnet-4-5", memory=Memory.thread())

agent.run("My name is Alice.", thread_id="user-123")
result = agent.run("What's my name?", thread_id="user-123")
print(result.text)  # "Your name is Alice."
```

**Persistent memory** (Redis, survives restarts):

```python
agent = Agent(model="gpt-4o", memory=Memory.persistent("redis://localhost:6379"))
```

**Semantic memory** (vector embeddings, retrieves most relevant context):

```python
agent = Agent(model="gemini-2.0-flash", memory=Memory.semantic(embedder=my_embedder_fn))
```

---

## Conversation Management

Automatically manage context window size on long-running conversations:

```python
# Sliding window — keep the most recent 50 messages
agent = Agent(
    model="claude-sonnet-4-5",
    memory=Memory.managed(strategy="sliding_window", max_messages=50),
)

# Summarize — compress older messages when token count exceeds threshold
agent = Agent(
    model="claude-sonnet-4-5",
    memory=Memory.managed(
        strategy="summarize",
        model="claude-haiku-3-5",  # Lightweight model for summarization
        max_tokens=80_000,
    ),
)
```

`ConversationManager` wraps any memory backend transparently. Summaries are inserted as system-level context messages (not fabricated user messages) to prevent prompt injection.

---

## Guards

Declarative safety rules applied automatically to every run.

```python
from synth import Guard

agent = Agent(
    model="claude-sonnet-4-5",
    guards=[
        Guard.no_pii_output(),             # Block PII in responses
        Guard.max_cost(dollars=0.50),       # Stop if cost exceeds $0.50
        Guard.no_tool_calls(["delete_*"]), # Block tools matching glob
        Guard.custom(my_check_fn),          # Your own check function
    ],
)
```

Guards run in order. First failure stops execution and raises `GuardViolationError`.

---

## Guard Composition

Combine guards with logical operators and add rate limiting:

```python
from synth import Guard

agent = Agent(
    model="claude-sonnet-4-5",
    guards=[
        Guard.all(                          # All must pass (AND)
            Guard.no_pii_output(),
            Guard.max_cost(dollars=1.00),
        ),
        Guard.any(                          # At least one must pass (OR)
            Guard.custom(check_allowlist),
            Guard.custom(check_admin),
        ),
        Guard.rate_limit(calls_per_minute=30),  # Sliding window rate limit
    ],
    parallel_guards=True,  # Evaluate independent guards concurrently
)
```

`Guard.all()` short-circuits on first failure. `Guard.any()` short-circuits on first success. When `parallel_guards=True`, top-level guards run via `asyncio.gather()` for reduced latency.

---

## Structured Output

Get typed Pydantic objects back instead of raw text:

```python
from pydantic import BaseModel

class MovieReview(BaseModel):
    title: str
    rating: float
    summary: str
    recommended: bool

agent = Agent(
    model="claude-sonnet-4-5",
    instructions="You are a movie critic.",
    output_schema=MovieReview,
)

result = agent.run("Review the movie Inception.")
review = result.output  # MovieReview instance

print(review.title)        # "Inception"
print(review.rating)       # 9.2
print(review.recommended)  # True
```

If parsing fails, Synth retries with a corrective prompt up to `max_retries` times.

---

## Pipelines

Chain agents sequentially — output of each becomes input of the next:

```python
from synth import Pipeline

researcher = Agent(model="claude-sonnet-4-5", instructions="You research topics.")
writer = Agent(model="claude-sonnet-4-5", instructions="You write clear articles.")
editor = Agent(model="claude-sonnet-4-5", instructions="You edit for clarity.")

pipeline = Pipeline([researcher, writer, editor])
result = pipeline.run("The history of the internet")
```

Run stages in parallel with `ParallelGroup`:

```python
from synth.orchestration.pipeline import ParallelGroup

pipeline = Pipeline([
    writer,
    ParallelGroup([fact_checker, style_checker]),  # Run concurrently
    editor,
])
```

Stream with stage labels:

```python
for stage_event in pipeline.stream("Write about AI"):
    print(f"[{stage_event.stage_name}] {stage_event.event}")
```

---

## Graphs

Directed-graph workflows with branching, loops, and conditional logic:

```python
from synth import Graph, node

graph = Graph()

@node(graph)
def classify(state):
    state["priority"] = "high" if "urgent" in state["text"].lower() else "low"
    return state

@node(graph)
def handle_urgent(state):
    state["response"] = "Escalating immediately."
    return state

@node(graph)
def handle_normal(state):
    state["response"] = "We'll respond within 24 hours."
    return state

graph.set_entry("classify")
graph.add_edge("classify", "handle_urgent", when=lambda s: s["priority"] == "high")
graph.add_edge("classify", "handle_normal", when=lambda s: s["priority"] == "low")
graph.add_edge("handle_urgent", Graph.END)
graph.add_edge("handle_normal", Graph.END)

result = graph.run({"text": "This is urgent! Server is down!"})
print(result.output["response"])
```

Loops are supported. Synth enforces `max_iterations=100` by default to prevent infinite loops.

Visualize your graph:

```python
print(graph.visualise())  # Outputs a Mermaid diagram
```

---

## Graph Debugging

Inspect state transitions and trace node execution:

```python
result = await graph.arun({"text": "help!"}, debug=True)

# Execution history with input/output state, latency, timestamps
for node_exec in graph.history():
    print(f"{node_exec.node_name}: {node_exec.latency_ms:.1f}ms")
    print(f"  In:  {node_exec.input_state}")
    print(f"  Out: {node_exec.output_state}")
```

With `debug=True`, the graph emits detailed `DEBUG`-level log messages for node entry/exit, edge evaluation, checkpoint saves, and pause events. The `visualise()` method styles the entry node with a double border and deduplicates END nodes.

---

## Graph Parallel Execution

When multiple unconditional edges fan out from a single node, Synth automatically executes the target nodes concurrently:

```python
graph = Graph()

@node(graph)
def start(state):
    return state

@node(graph)
def fetch_prices(state):
    state["prices"] = get_prices()
    return state

@node(graph)
def fetch_reviews(state):
    state["reviews"] = get_reviews()
    return state

@node(graph)
def merge(state):
    return state

# Fan-out: both fetch nodes run concurrently
graph.add_edge("start", "fetch_prices")
graph.add_edge("start", "fetch_reviews")
graph.add_edge("fetch_prices", "merge")
graph.add_edge("fetch_reviews", "merge")
graph.add_edge("merge", Graph.END)
graph.set_entry("start")
```

Each concurrent node receives a deep-copied state — mutations in one node don't affect others. Results are merged with a shallow dictionary merge by default, or a custom merge function:

```python
graph.with_parallel(merge_fn=lambda states: {k: v for s in states for k, v in s.items()})
```

If any concurrent node raises, all others are cancelled and the error propagates as `GraphRoutingError`.

---

## Human-in-the-Loop

Pause a graph at specific nodes for human review before continuing:

```python
graph.with_human_in_the_loop(pause_at=["draft_email"], timeout=3600)
graph.with_checkpointing()

result = graph.run({"customer": "Alice"}, run_id="email-001")
# result is a PausedRun — inspect result.state["draft"] here

final = graph.resume("email-001", human_input="Looks good, send it.")
```

---

## Agent Teams

Coordinate multiple specialized agents under an orchestrator:

```python
from synth import AgentTeam

team = AgentTeam(
    orchestrator="claude-sonnet-4-5",
    agents=[researcher, writer, analyst],
    strategy="auto",   # orchestrator decides who does what
)

result = team.run("Write a report on renewable energy trends.")
print(result.answer)
print(result.contributions)   # Each agent's individual contribution
print(result.total_cost)
```

Use `strategy="parallel"` to run all agents concurrently.

---

## Tracing

Every run automatically records a detailed trace:

```python
result = agent.run("Summarize this document.")
trace = result.trace

print(f"Tokens: {trace.total_tokens}")
print(f"Cost: ${trace.total_cost:.4f}")
print(f"Latency: {trace.total_latency_ms:.1f}ms")

result.trace.show()                    # Open visual timeline in browser
path = result.trace.export()           # Export as OpenTelemetry JSON
```

Auto-forward all traces to an OTel collector:

```bash
export SYNTH_TRACE_ENDPOINT="https://my-otel-collector.example.com/v1/traces"
```

---

## Trace-to-Eval Pipeline

Convert production traces into evaluation datasets for continuous quality improvement:

```python
from synth.eval import TraceToEval

# Collect traces from production runs
traces = [result1.trace, result2.trace, result3.trace]

# Filter and convert to eval cases
pipeline = (
    TraceToEval(traces)
    .filter(min_latency_ms=100, has_tool_calls=True)
    .filter(custom=lambda t: t.total_tokens > 50)
)

# Create an Eval pre-populated with cases from traces
evaluation = pipeline.to_eval(agent=my_agent)
report = evaluation.run()

# Or export as a JSON dataset for sharing
pipeline.export("eval_dataset.json")
```

The `labeler` parameter overrides expected values (default is the actual output for regression testing):

```python
evaluation = pipeline.to_eval(
    agent=my_agent,
    labeler=lambda prompt, output: "expected_value",
)

---

## Checkpointing

Save and resume graph execution state:

```python
graph.with_checkpointing()
result = graph.run(initial_state, run_id="my-run-001")

# Later, even in a different process
result = graph.resume("my-run-001")
```

Redis backend for distributed systems:

```python
from synth.checkpointing.redis import RedisCheckpointStore

graph.with_checkpointing(store=RedisCheckpointStore("redis://localhost:6379"))
```

---

## Evaluation

Run structured tests against your agent:

```python
from synth import Eval

evaluation = Eval(agent=agent)
evaluation.add_case(input="Capital of France?", expected="Paris")
evaluation.add_case(input="Capital of Japan?", expected="Tokyo")

report = evaluation.run()
print(f"Score: {report.overall_score}")

for case in report.cases:
    status = "PASS" if case.passed else "FAIL"
    print(f"  [{status}] {case.input} → {case.actual}")
```

Custom checker:

```python
def contains_keyword(output: str, expected: str) -> float:
    return 1.0 if expected.lower() in output.lower() else 0.0

evaluation.add_case(input="Explain photosynthesis.", expected="chlorophyll", checker=contains_keyword)
```

---

## Testing Infrastructure

Synth provides three testing tools at different abstraction levels — no API keys needed.

**TestModel** — deterministic canned responses for fast unit tests:

```python
from synth.testing import TestModel

agent = Agent(model=TestModel(responses=["Hello!", "Goodbye!"]))
result = agent.run("Hi")       # Returns "Hello!"
result = agent.run("Bye")      # Returns "Goodbye!"
result = agent.run("Again")    # Cycles back to "Hello!"
```

**FunctionModel** — custom test logic with full message access:

```python
from synth.testing import FunctionModel

def my_logic(messages):
    if "weather" in messages[-1]["content"]:
        return "Sunny, 72°F"
    return "I don't know"

agent = Agent(model=FunctionModel(fn=my_logic))
```

**VCRRecorder** — record real LLM interactions and replay them deterministically:

```python
from synth.testing import VCRRecorder

# Record mode — makes real API calls, saves to file
with VCRRecorder("tests/cassettes/greeting.json", record=True):
    result = agent.run("Hello")

# Replay mode — no network calls, deterministic
with VCRRecorder("tests/cassettes/greeting.json"):
    result = agent.run("Hello")  # Returns recorded response
```

All three are importable from `synth.testing`. `TestModel` is also available via the `"test"` model string prefix.

---

## CLI Commands

Run `synth` with no arguments to launch the interactive shell:

```bash
synth
```

```
synth> run agent.py "Hello"
synth> create agent my-bot
synth> doctor
synth> exit
```

All commands also work directly:

```bash
synth init                                  # Interactive project setup wizard
synth create agent my-bot                   # Scaffold an agent project
synth create agent my-bot -p openai         # Skip prompt, use OpenAI
synth create agentcore my-service           # AWS AgentCore project
synth create team my-team                   # Multi-agent team + pipeline
synth create tool my-tools                  # Standalone tools file
synth create mcp my-server                  # MCP server with FastMCP
synth create ui my-ui                       # Local browser testing dashboard
synth dev my_agent.py                       # Rich terminal UI with hot-reload
synth run my_agent.py "prompt"              # Execute agent, print result
synth bench my_agent.py "prompt" --runs 20  # Benchmark latency/cost
synth eval my_agent.py --dataset cases.json # Run evaluation suite
synth trace <run_id>                        # Open trace in browser
synth deploy --target agentcore             # Deploy to AWS AgentCore
synth deploy --target agentcore --dry-run   # Validate without deploying
synth ui my_agent.py                        # Launch browser testing UI
synth edit agent agent.py                   # Modify existing agent config
synth doctor                                # Check env, credentials, deps
synth info --extra anthropic                # Show package info
synth help                                  # Quick reference card
```

### `synth init`

The fastest way to start a new project. Walks you through:

1. **Project type** — single agent or multi-agent
2. Project name and description
3. Provider selection (anthropic, openai, google, ollama, bedrock, agentcore)
4. Model selection (region-aware for AgentCore with Bedrock model catalog)
5. Agent instructions
6. **Tool Wizard** — pick pre-built tools or scaffold custom `@tool` stubs
7. **MCP Wizard** — pick pre-built MCP servers or scaffold custom `@mcp.tool()` stubs
8. Feature toggles (memory, guards, structured output, eval, deploy)
9. Credential check (AgentCore only)
10. Summary and confirmation
11. Project generation
12. Optional "Deploy now?" prompt (AgentCore only)
13. **Testing mode** — launch the browser UI dashboard or the interactive CLI

#### Multi-Agent Projects

When you select `multi` at the project type prompt, the wizard guides you through:

- **Shared configuration** — after naming the project, you're asked whether to use the same provider/model and tools for all agents. If yes, these are collected once upfront and applied to every agent, dramatically reducing setup time for teams where all agents share infrastructure
- **Agent count** (minimum 2) with per-agent configuration (name, description, instructions — plus provider/model/tools if not shared)
- **Agent name sanitization** — names like "Molly Mikes" or "Cash Carter" are automatically converted to valid Python identifiers (`molly_mikes`, `cash_carter`) for filenames and code, with the original name preserved in docstrings and display
- **Orchestration pattern selection** with descriptions:
  - **Pipeline** — linear sequential chaining, each agent receives the previous agent's output
  - **Graph** — directed graph with conditional edges, branching, and loops
  - **AgentTeam** — orchestrator routes tasks to specialized agents (auto or parallel strategy)
  - **Human-in-the-Loop** — graph with pause/resume checkpoints for human review
- Pattern-specific configuration (execution order, edges, strategy, pause nodes, etc.)
- Feature selection, summary, and project generation

Generated multi-agent project structure:

```
my-project/
├── agent_molly_mikes.py   # Individual agent files (sanitized names)
├── agent_rex_routes.py
├── main.py                # Orchestration wiring (Pipeline/Graph/Team/HITL)
├── tools_molly_mikes.py   # Per-agent tool files (if configured)
├── README.md
├── synth.toml
└── ui/                    # Testing dashboard (if UI mode selected)
    ├── server.py
    └── static/
```

#### Single-Agent Projects

Generated project structure:

```
my-agent/
├── agent.py           # Your agent with selected provider, tools, and features
├── README.md          # Project-specific docs with run instructions
├── synth.toml         # Project configuration
├── tools.py           # Custom tool stubs (if tools selected)
├── mcp_server.py      # MCP server stubs (if MCP selected)
├── eval_dataset.json  # Evaluation cases (if eval selected)
├── eval_config.json   # AgentCore Evaluations config (AgentCore + eval only)
├── agentcore.yaml     # AWS config (AgentCore projects only)
└── .env.template      # Environment variable template (AgentCore only)
```

The testing UI (if selected) is scaffolded at the workspace root, shared across all agents:

```
workspace/
├── my-agent/          # Agent project
├── another-agent/     # Another agent project
└── ui/                # Shared testing dashboard
    ├── server.py
    └── static/
```

For AgentCore projects, `synth init` also:
- Auto-detects AWS credentials (env vars → `~/.aws/credentials` → AWS Toolkit profiles)
- Prompts for target AWS region (default: `us-east-1`)
- Shows Bedrock models available in that region
- Writes `aws_region`, `model_id`, `cris_enabled`, and `aws_profile` to `agentcore.yaml`

Common patterns:

```bash
synth init                          # Full interactive wizard
synth init && cd my-agent && synth dev agent.py   # Init + start developing
```

### `synth dev`

Rich terminal UI for interactive development:

```bash
synth dev my_agent.py
```

When run without a file argument, `synth dev` scans the workspace for agent files and presents an interactive picker. For agents with an `agentcore.yaml`, it checks live deployment status against the AWS account and shows color-coded badges (active, creating, failed). If the selected agent isn't deployed yet, you'll be prompted to deploy before opening the REPL.

Features: streaming token-by-token output, tool call visualization, slash commands (`/tools`, `/reload`, `/trace`, `/export`, `/clear`, `/cost`, `/quit`), markdown rendering, status bar with live cost/token tracking.

### `synth ui`

Launch the browser-based testing UI for any agent file:

```bash
synth ui my_agent.py
```

When run without a file argument, `synth ui` uses the same agent discovery logic as `synth dev`. The command launches the UI server as a subprocess using the SDK's own Python interpreter, so it works correctly even when installed via pipx. The agent file path is passed via the `SYNTH_AGENT_FILE` environment variable.

### `synth create ui`

Scaffold a full-featured browser-based testing dashboard:

```bash
synth create ui my-dashboard
cd my-dashboard
pip install uvicorn fastapi
python server.py
# Open http://localhost:8420
```

The dashboard includes:

- **Streaming chat** with SSE, thinking block support, and markdown rendering
- **Real-time flow visualization** — the Flow tab renders a live node graph as the agent executes, showing the full path from prompt → agent → tool calls → output. Each node is clickable to inspect trace data, arguments, results, token usage, and cost in a slide-in detail panel. Supports multi-agent delegation chains for Team, Pipeline, and Graph orchestration
- **Multi-agent collaboration view** — for AgentTeam, Pipeline, and Graph projects, the UI shows real-time delegation cards as each agent runs, with tool calls, output previews, latency, and cost per agent. A swimlane panel on the final response shows all agent contributions at a glance. The server auto-detects `team`, `pipeline`, or `graph` exports from your `main.py`
- **Conversation management** with persistence, multiple threads, and export
- **Telemetry panel** with per-response and session-level tokens, cost, latency, and cost-per-turn sparkline
- **Tool playground** to test individual tools with custom arguments
- **Prompt library** with versioning, notes, and variable injection (`{{variable}}` syntax)
- **A/B testing** to compare two prompt variants side-by-side with diff view
- **Eval runner** with keyword scoring, LLM judge, golden baselines, and regression detection
- **Session replay** with timeline view, token usage heatmap, and anomaly detection (slow, expensive, or short responses)
- **Scenario builder** for scripted multi-turn conversations
- **AgentCore Evaluations** panel showing evaluator scores, config status, and on-demand evaluation (when configured)
- **Hot-reload** to pick up agent changes without restarting the server

The UI is also scaffolded automatically when you choose `ui` as the testing mode during `synth init`. The UI is created once at the workspace root and shared across all agents — subsequent `synth init` runs detect the existing UI and reuse it. If the server is already running, you'll just see the URL. UI dependencies (`uvicorn`, `fastapi`) are auto-installed if missing.

### `synth deploy`

Guided deployment wizard:

```bash
synth deploy --target agentcore my_agent.py
synth deploy --target agentcore --dry-run my_agent.py  # Stages 1–4 only
```

Stages: credential validation → dependency check → file validation → manifest generation → artifact packaging → deployment readiness → AgentCore API submission. Each prints `[  OK  ]` or `[FAIL]` with a corrective suggestion on failure.

The readiness stage reports on auth method, memory backend, guards, tools, search API keys, and target region/model — with warnings for any missing components.

### `synth edit agent`

Interactively modify an existing agent without editing files manually:

```bash
synth edit agent agent.py
```

Menu options: (a) instructions, (b) model, (c) tools, (d) MCP servers. Shows a diff before writing. Uses atomic temp-file rename to prevent corruption.

### `synth doctor`

```bash
synth doctor
```

Checks: Python version, core dependencies, provider API keys, `SYNTH_TRACE_ENDPOINT` format, optional provider packages, and (when `agentcore.yaml` is present) AgentCore config fields (`aws_region`, `model_id`, `cris_enabled`, `aws_profile`).

### `synth bench`

```bash
synth bench my_agent.py "Hello" --runs 20 --warmup 2
```

Reports p50/p95/p99 latency, average tokens, cost per run, and success rate.

---

## Deploying to AWS AgentCore

### Prerequisites

Install the AgentCore extra:

```bash
pip install synth-agent-sdk[agentcore]
```

You also need working AWS credentials on your machine. Set them up using one of these methods:

**Option A — AWS CLI (recommended for most users):**

```bash
# Install the AWS CLI
# macOS
brew install awscli

# Windows
winget install Amazon.AWSCLI

# Linux
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip && sudo ./aws/install

# Then configure your credentials
aws configure
# Enter your Access Key ID, Secret Access Key, default region, and output format
```

**Option B — AWS IAM Identity Center (SSO):**

```bash
aws configure sso
# Follow the prompts to set up SSO with your organization's identity provider
aws sso login --profile your-profile
```

**Option C — AWS Toolkit for VS Code / JetBrains:**

If you use an IDE with the [AWS Toolkit](https://aws.amazon.com/developer/tools/#IDE_and_IDE_Toolkits) extension, it manages credentials through its own auth flow (Builder ID or IAM Identity Center). Synth picks up these credentials automatically via the shared AWS credential chain.

**Verify your credentials:**

```bash
aws sts get-caller-identity
# Should print your account ID, user ARN, and user ID

synth doctor
# Checks AWS credentials and AgentCore config
```

> For AgentCore deployments, your IAM role needs permissions for Bedrock model invocation and AgentCore API access. Check with your AWS administrator if `synth deploy` fails with access denied errors.

### Wrapping Your Agent

```python
from synth import Agent
from synth.deploy.agentcore import agentcore_handler

agent = Agent(
    model="bedrock/claude-sonnet-4-5",
    instructions="You are a customer support agent.",
    tools=[lookup_order, check_inventory],
)

app = agentcore_handler(agent)
```

### Deploy

```bash
synth deploy --target agentcore --dry-run   # Validate first
synth deploy --target agentcore             # Deploy
```

The packager automatically excludes `.env` files, credential files, and `.synth/checkpoints/` from the artifact. It also scans `agentcore.yaml` for accidental credential patterns and aborts if any are found.

### Environment Variables in the Container

`synth deploy` reads the `environment:` section of `agentcore.yaml` and passes each entry to the container via `agentcore launch --env KEY=VALUE`. This is the right place for **non-sensitive** config like feature flags or log levels.

```yaml
# agentcore.yaml
environment:
  SYNTH_NO_BANNER: "1"
  LOG_LEVEL: "INFO"
```

**API keys and secrets must not go in `agentcore.yaml`.** The deploy wizard filters out any key whose name contains `key`, `secret`, `token`, `password`, or similar patterns — they are never passed via `--env` to avoid exposure in process listings.

Instead, store secrets in AWS Secrets Manager or SSM Parameter Store and fetch them at agent startup:

```python
from synth.deploy.agentcore import get_ssm_parameter

# In your agent file — fetched at runtime inside the container
TAVILY_API_KEY = get_ssm_parameter("/myapp/prod/TAVILY_API_KEY", decrypt=True)

agent = Agent(
    model="bedrock/claude-sonnet-4-5",
    tools=[web_search],
)
```

The readiness stage (`synth deploy`) will warn you if a search API key is found only in your local `.env` and remind you to move it to Secrets Manager before the container can use it.

### Secure User Identity

```python
from synth.deploy.agentcore import extract_user_id

user_id = extract_user_id(context)  # Extracts from signed JWT in RequestContext
```

### Gateway MCP Client

```python
from synth.deploy.agentcore import create_gateway_client

client = create_gateway_client(
    gateway_url="https://my-gateway.example.com",
    client_id_param="/myapp/gateway/client_id",
    client_secret_param="/myapp/gateway/client_secret",
)
mcp_client = client.as_mcp_client()
```

### Code Interpreter

```python
from synth.deploy.agentcore import CodeInterpreterTools

ci = CodeInterpreterTools()
result = ci.execute_python("import math; print(math.sqrt(144))")
print(result)  # "12.0"
```

### Browser Tool

Search the web and navigate pages using AgentCore's managed Chrome browser — no third-party API keys needed:

```python
from synth.deploy.agentcore import BrowserTools
from synth import tool

browser = BrowserTools(region="us-west-2")

@tool
def search_web(query: str) -> str:
    """Search the web for information."""
    return browser.search(query)

@tool
def browse_page(url: str) -> str:
    """Navigate to a URL and extract its content."""
    return browser.navigate(url)

agent = Agent(model="bedrock/claude-sonnet-4-5", tools=[search_web, browse_page])
```

> **Note:** `search_web` uses lightweight HTTP requests (no browser needed). `browse_page` tries HTTP first and falls back to Playwright for JavaScript-heavy pages. Playwright is installed with `pip install synth-agent-sdk[aws]`, but you also need browser binaries: `playwright install chromium`.

### Built-in Web Search (API-based)

For lighter-weight search without a browser session, use the built-in `web_search` tool with a search API key:

```python
from synth.tools import web_search

agent = Agent(model="claude-sonnet-4-5", tools=[web_search])
```

Supports `BRAVE_API_KEY`, `SERPAPI_API_KEY`, or `TAVILY_API_KEY` — auto-detects whichever is set.

For AgentCore deployments, store the key in AWS Secrets Manager or SSM and fetch it at startup (see [Environment Variables in the Container](#environment-variables-in-the-container)).

### AgentCore Memory

Memory is automatically configured when deploying to AgentCore. The adapter wraps your agent with `AgentCoreMemory`, which stores and retrieves conversation history via the AgentCore events API. No manual setup required — just ensure `AGENTCORE_MEMORY_ENDPOINT` and `AGENTCORE_MEMORY_ID` are set in your deployment environment.

```python
# Memory works automatically in AgentCore deployments.
# For explicit configuration:
from synth.deploy.agentcore import AgentCoreMemory

agent = Agent(
    model="bedrock/claude-sonnet-4-5",
    memory=AgentCoreMemory(memory_id="mem-abc123"),
)
```

### SSM Config

```python
from synth.deploy.agentcore import get_ssm_parameter

db_url = get_ssm_parameter("/myapp/prod/db_url")
api_key = get_ssm_parameter("/myapp/prod/api_key", decrypt=True)
```

---

## AgentCore Evaluations

Synth integrates with AgentCore's Evaluations service for continuous agent quality monitoring. When you run `synth init` with the AgentCore provider and enable the "eval" feature, the wizard generates everything you need.

### What Gets Generated

- `eval_config.json` — Online evaluation configuration with three built-in evaluators (Helpfulness, Correctness, GoalSuccessRate) at a 1.0 sampling rate
- `agentcore.yaml` — Updated with an `evaluations` section and the required IAM permissions
- `eval_dataset.json` — Local evaluation dataset (also available for non-AgentCore providers)
- Agent code comment referencing the eval config

### Built-in Evaluators

| Evaluator | Level | What It Measures |
|-----------|-------|-----------------|
| `Builtin.Helpfulness` | TRACE | Whether the agent's response is helpful and relevant |
| `Builtin.Correctness` | TRACE | Factual accuracy of the agent's response |
| `Builtin.GoalSuccessRate` | SESSION | Whether the agent achieved the user's goal |

### Dashboard Integration

When evaluations are configured, the Dashboard's AgentCore tab shows an Evaluations sub-section with:

- Summary table of most recent evaluator scores (scores below 0.5 are flagged)
- Online evaluation config status (active/disabled, sampling rate, evaluator list)
- "Run Evaluation" button for on-demand evaluation against the most recent session

### API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/agentcore/evaluations` | GET | Fetch evaluation scores |
| `/api/agentcore/evaluations/run` | POST | Trigger on-demand evaluation |
| `/api/agentcore/evaluations/config` | GET | Get evaluation config status |

All evaluation endpoints apply credential scrubbing to response data.

---

## Error Handling

All Synth errors inherit from `SynthError` and include `component` and `suggestion` fields.

| Error | When |
|-------|------|
| `SynthConfigError` | Missing API key, invalid model, missing provider package |
| `ToolDefinitionError` | `@tool` missing type annotations or docstring |
| `ToolExecutionError` | Tool function raised an exception |
| `GuardViolationError` | A guard check failed |
| `CostLimitError` | Cost guard limit exceeded |
| `RateLimitViolationError` | Rate limit guard threshold exceeded |
| `SynthParseError` | Structured output couldn't be parsed after retries |
| `GraphRoutingError` | No edge condition matched at a graph node |
| `GraphLoopError` | Graph exceeded `max_iterations` |
| `RunNotFoundError` | No checkpoint found for the given `run_id` |
| `PipelineError` | A pipeline stage failed |
| `MCPConnectionError` | Failed to connect to an MCP server |
| `MCPToolError` | An MCP tool invocation failed |
| `VCRMismatchError` | VCR replay diverged from recorded conversation |

```python
from synth.errors import SynthConfigError, ToolExecutionError, GuardViolationError

try:
    result = agent.run("Do something risky.")
except GuardViolationError as e:
    print(f"Guard '{e.guard_name}' blocked: {e.remediation}")
except ToolExecutionError as e:
    print(f"Tool '{e.tool_name}' failed: {e.original_error}")
except SynthConfigError as e:
    print(f"Config issue in {e.component}: {e.suggestion}")
```

---

## Environment Variables

| Variable | Purpose | Required? |
|----------|---------|-----------|
| `ANTHROPIC_API_KEY` | Anthropic Claude API key | Only for `claude-*` models |
| `OPENAI_API_KEY` | OpenAI GPT API key | Only for `gpt-*` models |
| `GOOGLE_API_KEY` | Google Gemini API key | Only for `gemini-*` models |
| `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` | AWS credentials for Bedrock | Only for `bedrock/*` (or use IAM) |
| `SYNTH_TRACE_ENDPOINT` | HTTPS URL of an OTel collector | No |
| `SYNTH_NO_BANNER` | Set to `1` to skip the boot sequence | No |
| `NO_COLOR` | Disable colored terminal output | No |

---

## FAQ

**Do I need an API key?**
Yes, for cloud models. Ollama runs locally and needs no key.

**Can I use Synth in Jupyter?**
Yes. Synth detects an existing event loop and handles it automatically.

**How do I switch models?**
Change the `model` string. Install the matching extra and set the API key.

**What if the provider is down?**
Synth retries on HTTP 429 and 5xx with exponential backoff. Configure with `max_retries` and `retry_backoff`. For automatic failover, use `fallback=["gpt-4o", "claude-haiku-3-5"]` to try alternative models.

**Can I use multiple models in one app?**
Yes. Each `Agent` has its own model. Use `synth init` with the `multi` project type to scaffold a multi-agent project with orchestration built in. Use `AgentTool` to compose agents hierarchically.

**How do I test my agent without API keys?**
Use `TestModel` for deterministic unit tests, `FunctionModel` for custom test logic, or `VCRRecorder` to replay recorded interactions. All available from `synth.testing`.

**How do I connect to MCP servers?**
Use `MCPClient` with a URL or command: `mcp = MCPClient("https://mcp.example.com"); await mcp.connect(); agent = Agent(tools=[mcp])`. Install with `pip install synth-agent-sdk[mcp]`.

**How do I run graph nodes in parallel?**
Add multiple unconditional edges from the same source node. Synth auto-detects the fan-out and runs targets concurrently with isolated state copies.

**How do I test my agent in a browser?**
Run `synth create ui my-dashboard` or choose `ui` as the testing mode during `synth init`. This gives you a full dashboard with streaming chat, telemetry, prompt library, A/B testing, evals, session replay, and scenario builder at `http://localhost:8420`. For multi-agent projects, the dashboard auto-detects your orchestration pattern and shows real-time agent delegation with per-agent tool calls, output, and cost.

**How do I debug what my agent is doing?**
Use `result.trace.show()` for a visual timeline, or `synth dev my_agent.py` for an interactive terminal UI with `/trace` command.

**Is my data secure?**
Synth never logs or serializes API keys. Guards run before side-effecting operations. Checkpoints use JSON only. All provider calls use HTTPS.

**What are the core dependencies?**
`pydantic`, `httpx`, `click`, `typing-extensions`, `rich`, `prompt-toolkit`. Provider SDKs are optional extras.

---

## License

MIT
