Metadata-Version: 2.4
Name: hookedllm
Version: 0.2.1
Summary: Async-first, scoped hook system for LLM observability
Author-email: Michael Karotsieris <michael.karotsieris@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/CepstrumLabs/hookedllm
Project-URL: Documentation, https://cepstrumlabs.github.io/hookedllm/
Project-URL: Repository, https://github.com/CepstrumLabs/hookedllm
Project-URL: Issues, https://github.com/CepstrumLabs/hookedllm/issues
Keywords: llm,hooks,observability,openai,monitoring,evaluation
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: openai
Requires-Dist: openai>=1.0.0; extra == "openai"
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.18.0; extra == "anthropic"
Provides-Extra: config
Requires-Dist: pyyaml>=6.0; extra == "config"
Provides-Extra: all
Requires-Dist: openai>=1.0.0; extra == "all"
Requires-Dist: anthropic>=0.18.0; extra == "all"
Requires-Dist: pyyaml>=6.0; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: types-pyyaml>=6.0.12.20250915; extra == "dev"
Provides-Extra: docs
Requires-Dist: mkdocs>=1.5.0; extra == "docs"
Requires-Dist: mkdocs-material>=9.0.0; extra == "docs"
Requires-Dist: mkdocstrings[python]>=0.23.0; extra == "docs"
Requires-Dist: mkdocs-gen-files>=0.5.0; extra == "docs"
Dynamic: license-file

# HookedLLM

**Async-first, scoped hook system for LLM observability with SOLID/DI architecture**

[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Documentation](https://img.shields.io/badge/docs-latest-blue)](https://cepstrumlabs.github.io/hookedllm/)

HookedLLM provides transparent observability for LLM calls through a powerful hook system. Add evaluation, logging, metrics, and custom behaviors to your LLM applications without modifying core application logic.

## ✨ Key Features

- **🎯 Scoped Isolation**: Named scopes prevent hook interference across application contexts
- **🔧 SOLID/DI Compliant**: Full dependency injection support for testing and customization
- **📦 Minimal Surface**: Single import, simple API: `import hookedllm`
- **⚡ Async-First**: Built for modern async LLM SDKs
- **🎨 Type-Safe**: Full type hints and IDE autocomplete support
- **🛡️ Resilient**: Hook failures never break your LLM calls
- **🔀 Conditional Execution**: Run hooks only when rules match (model, tags, metadata)
- **⚙️ Config or Code**: Define hooks programmatically or via YAML

## 🚀 Quick Start

### Installation

```bash
# Core package (zero dependencies)
pip install hookedllm

# With OpenAI support (quotes stop shells such as zsh from globbing the brackets)
pip install "hookedllm[openai]"

# With Anthropic/Claude support
pip install "hookedllm[anthropic]"

# With both OpenAI and Anthropic support
pip install "hookedllm[openai,anthropic]"

# With all optional dependencies (OpenAI, Anthropic, config support)
pip install "hookedllm[all]"
```

### Basic Usage

**With OpenAI:**
```python
import asyncio

import hookedllm
from openai import AsyncOpenAI

# Define a simple hook
async def log_usage(call_input, call_output, context):
    print(f"Model: {call_input.model}")
    # Guard against a missing usage payload before reading token counts
    if call_output.usage:
        print(f"Tokens: {call_output.usage.get('total_tokens', 0)}")

# Register hook to a scope
hookedllm.scope("evaluation").after(log_usage)

# Wrap your client with the scope
client = hookedllm.wrap(AsyncOpenAI(), scope="evaluation")

async def main():
    # Use normally - hooks execute automatically!
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}]
    )

# `await` needs a running event loop, so drive the call with asyncio.run
asyncio.run(main())
```

**With Anthropic/Claude:**
```python
import asyncio

import hookedllm
from anthropic import AsyncAnthropic

# Same hook works for both providers!
async def log_usage(call_input, call_output, context):
    print(f"Provider: {context.provider}, Model: {call_input.model}")
    if call_output.usage:
        total = call_output.usage.get("total_tokens", 0)
        print(f"Tokens: {total}")

# Register hook
hookedllm.scope("evaluation").after(log_usage)

# Wrap Anthropic client - automatic provider detection!
client = hookedllm.wrap(AsyncAnthropic(), scope="evaluation")

async def main():
    # Use normally - hooks execute automatically!
    response = await client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello!"}],
        metadata={"hookedllm_tags": ["example"]}  # Note: Anthropic uses metadata, not extra_body
    )

# `await` needs a running event loop, so drive the call with asyncio.run
asyncio.run(main())
```

## 📖 Examples

Explore the [`examples/`](examples/) directory for complete, runnable demonstrations:

### Getting Started
- **[`simple_demo.py`](examples/simple_demo.py)** - Your first hookedllm program
  - Complete working example with real LLM calls
  - Automatic metrics tracking with `MetricsHook`
  - Response evaluation with `EvaluationHook`
  - Perfect starting point for new users

- **[`basic_usage.py`](examples/basic_usage.py)** - Core concepts walkthrough
  - Simple hook registration
  - Scoped vs global hooks
  - Conditional rules with `when`
  - Multiple scope usage

### Advanced Features
- **[`global_hooks_demo.py`](examples/global_hooks_demo.py)** - Global hooks in action
  - 5 different LLM calls with global before/after hooks
  - Shows all data provided by the framework
  - Demonstrates hook execution flow
  - Metrics aggregation across calls

- **[`scopes_demo.py`](examples/scopes_demo.py)** - Scope isolation deep dive
  - Prevents hook interference across contexts
  - Development vs production vs evaluation scopes
  - Multi-scope client usage
  - Real-world use case examples

- **[`evaluation_and_metrics.py`](examples/evaluation_and_metrics.py)** - Built-in helpers
  - Using `MetricsHook` for automatic tracking
  - Using `EvaluationHook` for quality scoring
  - Conditional evaluation (only for specific models)
  - Multiple scope combinations

### Integrations
- **[`integrations/langfuse_integration.py`](examples/integrations/langfuse_integration.py)** - Langfuse observability
  - Automatic trace and generation tracking
  - Token usage and cost monitoring
  - Error tracking with full context
  - Metadata enrichment

- **[`integrations/opentelemetry_integration.py`](examples/integrations/opentelemetry_integration.py)** - OpenTelemetry tracing
  - Distributed tracing for LLM calls
  - Semantic conventions for LLM observability
  - Span creation with attributes and events
  - Integration with existing OTel infrastructure

### Running the Examples

```bash
# Install with OpenAI support (quotes stop shells such as zsh from globbing the brackets)
pip install -e ".[openai]"

# Or install with Anthropic support
pip install -e ".[anthropic]"

# Or install with both
pip install -e ".[openai,anthropic]"

# Set your API keys
export OPENAI_API_KEY=your-key-here
export ANTHROPIC_API_KEY=your-key-here

# Run any example
python examples/simple_demo.py
python examples/scopes_demo.py
python examples/anthropic_simple_example.py  # Anthropic example
python examples/integrations/langfuse_integration.py
```

Each example includes:
- ✅ Complete, runnable code
- 📝 Detailed inline comments
- 🚀 Setup instructions
- 💡 Real-world use cases
- 🎯 Best practices

## 📚 Core Concepts

### Scopes

Scopes isolate hooks to specific parts of your application:

```python
# Evaluation scope
hookedllm.scope("evaluation").after(evaluate_response)
hookedllm.scope("evaluation").after(calculate_metrics)

# Production scope
hookedllm.scope("production").after(production_logger)
hookedllm.scope("production").error(alert_on_error)

# Clients opt into scopes
eval_client = hookedllm.wrap(AsyncOpenAI(), scope="evaluation")
prod_client = hookedllm.wrap(AsyncOpenAI(), scope="production")

# Each client only runs its scope's hooks - no interference!
```

### Hook Types

Four hook types cover the entire call lifecycle:

```python
# Before: runs before LLM call
async def before_hook(call_input, context):
    context.metadata["user_id"] = "abc123"

# After: runs after successful call
async def after_hook(call_input, call_output, context):
    print(f"Response: {call_output.text}")

# Error: runs on failure
async def error_hook(call_input, error, context):
    print(f"Error: {error}")

# Finally: always runs with complete result
async def finally_hook(result):
    print(f"Took {result.elapsed_ms}ms")

hookedllm.before(before_hook)
hookedllm.after(after_hook)
hookedllm.error(error_hook)
hookedllm.finally_(finally_hook)
```
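
To make the ordering concrete, here is a minimal, self-contained sketch of how a wrapper could drive the four hook types around a call. It is not HookedLLM's implementation: `run_with_hooks`, `CallResult`, and the fake calls are hypothetical names used only for illustration, and the exact ordering is an assumption based on the descriptions above. Hook error isolation is left out to keep it short.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CallResult:
    """Hypothetical stand-in for the object a finally hook receives."""
    output: Optional[Any] = None
    error: Optional[BaseException] = None
    elapsed_ms: float = 0.0


async def run_with_hooks(call, call_input, context, *,
                         before=(), after=(), error=(), finally_=()):
    """Assumed ordering: before -> call -> (after | error) -> finally."""
    result = CallResult()
    start = time.perf_counter()
    for hook in before:
        await hook(call_input, context)
    try:
        output = await call(call_input)
    except Exception as exc:
        result.error = exc
        for hook in error:
            await hook(call_input, exc, context)
        raise  # the caller still sees the original failure
    else:
        result.output = output
        for hook in after:
            await hook(call_input, output, context)
        return output
    finally:
        # Runs on both paths, after the after/error hooks
        result.elapsed_ms = (time.perf_counter() - start) * 1000
        for hook in finally_:
            await hook(result)


async def demo():
    events = []

    async def before_hook(call_input, context):
        events.append("before")

    async def after_hook(call_input, call_output, context):
        events.append("after")

    async def error_hook(call_input, error, context):
        events.append("error")

    async def finally_hook(result):
        events.append("finally")

    async def ok_call(call_input):
        return "hello"

    async def failing_call(call_input):
        raise RuntimeError("provider unavailable")

    hooks = dict(before=[before_hook], after=[after_hook],
                 error=[error_hook], finally_=[finally_hook])
    await run_with_hooks(ok_call, {}, {}, **hooks)
    try:
        await run_with_hooks(failing_call, {}, {}, **hooks)
    except RuntimeError:
        pass
    return events


print(asyncio.run(demo()))
```

The successful call records `before`, `after`, `finally`; the failing call records `before`, `error`, `finally`.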

### Conditional Rules

Execute hooks only when conditions match:

```python
# Only for GPT-4
hookedllm.scope("evaluation").after(
    expensive_eval,
    when=hookedllm.when.model("gpt-4")
)

# Only in production
hookedllm.after(
    prod_logger,
    when=hookedllm.when.tag("production")
)

# Complex rules with composition
hookedllm.after(
    my_hook,
    when=(
        hookedllm.when.model("gpt-4") &
        hookedllm.when.tag("production") &
        ~hookedllm.when.tag("test")
    )
)

# Custom predicates
hookedllm.after(
    premium_hook,
    when=lambda call_input, ctx: ctx.metadata.get("tier") == "premium"
)
```
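
The `&` and `~` operators in the composed rule can be understood through Python's operator overloading. The sketch below is one way such rules could be built, not HookedLLM's actual code: the `Rule` class and the `model`/`tag` helpers are hypothetical, and exact-match semantics for model names and a `tags` list on the context are assumptions.

```python
from types import SimpleNamespace


class Rule:
    """Hypothetical composable predicate over (call_input, context)."""

    def __init__(self, predicate):
        self._predicate = predicate

    def __call__(self, call_input, context):
        return bool(self._predicate(call_input, context))

    def __and__(self, other):
        # Both sides must match
        return Rule(lambda i, c: self(i, c) and other(i, c))

    def __or__(self, other):
        # Either side may match
        return Rule(lambda i, c: self(i, c) or other(i, c))

    def __invert__(self):
        # Negate the wrapped predicate
        return Rule(lambda i, c: not self(i, c))


def model(name):
    # Assumption: exact match on the model name
    return Rule(lambda i, c: i.model == name)


def tag(name):
    # Assumption: the context exposes tags as a list of strings
    return Rule(lambda i, c: name in c.tags)


rule = model("gpt-4") & tag("production") & ~tag("test")

call_input = SimpleNamespace(model="gpt-4")
print(rule(call_input, SimpleNamespace(tags=["production"])))          # True
print(rule(call_input, SimpleNamespace(tags=["production", "test"])))  # False
```

Because `~` binds more tightly than `&`, `~tag("test")` is negated first and then combined left to right, so the parentheses in the composed example are only needed for line wrapping.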

### Global + Scoped Hooks

Combine global hooks (run everywhere) with scoped hooks:

```python
# Global hook - runs for ALL clients
hookedllm.finally_(track_all_metrics)

# Scoped hooks - only for specific clients
hookedllm.scope("evaluation").after(evaluate)
hookedllm.scope("production").error(alert)

# Evaluation client gets: track_all_metrics + evaluate
eval_client = hookedllm.wrap(AsyncOpenAI(), scope="evaluation")

# Production client gets: track_all_metrics + alert
prod_client = hookedllm.wrap(AsyncOpenAI(), scope="production")
```

### Multiple Scopes

Clients can use multiple scopes:

```python
hookedllm.scope("logging").finally_(log_call)
hookedllm.scope("metrics").finally_(track_metrics)
hookedllm.scope("evaluation").after(evaluate)

# Client with all three scopes
client = hookedllm.wrap(
    AsyncOpenAI(),
    scope=["logging", "metrics", "evaluation"]
)

# Runs: log_call + track_metrics + evaluate
```
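
As a rough mental model of how global and scoped hooks combine, the sketch below resolves the hook list for a client from a small in-memory registry. `HookRegistry` is a hypothetical class written for this illustration, and running global hooks first, then scopes in the order listed, is an assumption rather than documented behavior.

```python
from collections import defaultdict


class HookRegistry:
    """Hypothetical registry: one hook list per scope, plus a global list."""

    GLOBAL = "__global__"

    def __init__(self):
        self._hooks = defaultdict(list)

    def register(self, hook, scope=None):
        # No scope means the hook is global
        self._hooks[scope or self.GLOBAL].append(hook)

    def resolve(self, scopes=None):
        # Accept None, a single scope name, or a list of names
        if scopes is None:
            names = []
        elif isinstance(scopes, str):
            names = [scopes]
        else:
            names = list(scopes)
        resolved = list(self._hooks.get(self.GLOBAL, []))
        for name in names:
            resolved.extend(self._hooks.get(name, []))
        return resolved


def track_all_metrics(result): ...
def log_call(result): ...
def evaluate(call_input, call_output, context): ...
def alert(call_input, error, context): ...


registry = HookRegistry()
registry.register(track_all_metrics)              # global
registry.register(log_call, scope="logging")
registry.register(evaluate, scope="evaluation")
registry.register(alert, scope="production")

names = [h.__name__ for h in registry.resolve(["logging", "evaluation"])]
print(names)
```

A client wrapped with `["logging", "evaluation"]` never sees `alert`, which is the isolation property scopes are meant to provide.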

## 🧪 Testing with Dependency Injection

HookedLLM is fully testable through dependency injection:

```python
import hookedllm
from unittest.mock import Mock

def test_hook_execution():
    # Create mock dependencies
    mock_registry = Mock(spec=hookedllm.ScopeRegistry)
    mock_executor = Mock(spec=hookedllm.HookExecutor)
    
    # Configure mocks
    mock_scope = Mock()
    mock_registry.get_scopes_for_client.return_value = [mock_scope]
    
    # Create context with mocks
    ctx = hookedllm.create_context(
        registry=mock_registry,
        executor=mock_executor
    )
    
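    # Test (my_hook and FakeClient are placeholders you define in your test suite)
    ctx.scope("test").after(my_hook)
    client = ctx.wrap(FakeClient(), scope="test")

    # Make a call through `client` here so the hooks fire, then assert
    assert mock_executor.execute_after.called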
```

## 🏗️ Architecture

HookedLLM follows SOLID principles with full dependency injection:

- **Single Responsibility**: Separate storage, execution, and registry
- **Open/Closed**: Extend via hooks and rules without modifying core
- **Liskov Substitution**: Any implementation of protocols works
- **Interface Segregation**: Focused, minimal interfaces
- **Dependency Inversion**: Depends on Protocol abstractions

See [`ARCHITECTURE.md`](ARCHITECTURE.md) for detailed design documentation.
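
For readers unfamiliar with Protocol-based dependency inversion, here is a small standalone sketch of the idea. The names `Executor`, `SequentialExecutor`, `RecordingExecutor`, and `Pipeline` are hypothetical, and the `execute_after` signature is an assumption; only the method name is borrowed from the test example above.

```python
import asyncio
from typing import Any, Protocol, Sequence, runtime_checkable


@runtime_checkable
class Executor(Protocol):
    """Abstraction the core depends on (signature is an assumption)."""

    async def execute_after(self, hooks: Sequence[Any], call_input: Any,
                            call_output: Any, context: Any) -> None: ...


class SequentialExecutor:
    """Runs hooks one after another."""

    async def execute_after(self, hooks, call_input, call_output, context):
        for hook in hooks:
            await hook(call_input, call_output, context)


class RecordingExecutor:
    """Test double: records calls instead of running hooks."""

    def __init__(self):
        self.calls = []

    async def execute_after(self, hooks, call_input, call_output, context):
        self.calls.append((call_input, call_output))


class Pipeline:
    """Depends only on the Executor protocol, never on a concrete class."""

    def __init__(self, executor: Executor):
        self._executor = executor

    async def finish(self, hooks, call_input, call_output, context=None):
        await self._executor.execute_after(hooks, call_input, call_output, context)


recorder = RecordingExecutor()
asyncio.run(Pipeline(recorder).finish([], "prompt", "answer"))
print(recorder.calls)
```

Neither executor inherits from `Executor`; matching the method is enough, which is what lets a test swap in a recording double without touching the core.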

## 📖 Advanced Usage

### Custom Error Handling

```python
import logging

import hookedllm
from openai import AsyncOpenAI

logger = logging.getLogger(__name__)

def my_error_handler(error, context):
    # Custom handling for hook errors
    logger.error("Hook failed in %s: %s", context, error)

executor = hookedllm.DefaultHookExecutor(
    error_handler=my_error_handler,
    logger=logger
)

ctx = hookedllm.create_context(executor=executor)
client = ctx.wrap(AsyncOpenAI())
```
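
The resilience guarantee (a failing hook never breaks the LLM call) boils down to catching exceptions per hook and handing them to the error handler. The sketch below shows that pattern in isolation; `SafeExecutor` is a hypothetical class, not `DefaultHookExecutor`, and passing the hook's name as the `context` argument is an assumption.

```python
import asyncio
import logging

logger = logging.getLogger("hook_sketch")


class SafeExecutor:
    """Hypothetical executor that isolates hook failures."""

    def __init__(self, error_handler=None):
        self._error_handler = error_handler

    async def run(self, hooks, *args):
        for hook in hooks:
            try:
                await hook(*args)
            except Exception as exc:
                # Report the failure and keep going with the next hook
                name = getattr(hook, "__name__", repr(hook))
                if self._error_handler is not None:
                    self._error_handler(exc, name)
                else:
                    logger.exception("Hook %s failed", name)


async def demo():
    seen, failures = [], []

    async def broken_hook(value):
        raise ValueError("bad hook")

    async def healthy_hook(value):
        seen.append(value)

    executor = SafeExecutor(
        error_handler=lambda exc, ctx: failures.append((ctx, str(exc)))
    )
    await executor.run([broken_hook, healthy_hook], "payload")
    return seen, failures


print(asyncio.run(demo()))
```

The broken hook is reported to the handler, and the healthy hook still runs with the same arguments.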

### Evaluation Hook Example

```python
async def evaluate_response(call_input, call_output, context):
    """Evaluate LLM responses for quality."""
    # Build evaluation prompt
    eval_prompt = f"""
    Evaluate this response for clarity and accuracy:
    
    Query: {call_input.messages[-1].content}
    Response: {call_output.text}
    
    Return JSON: {{"clarity": 0-1, "accuracy": 0-1}}
    """
    
    # Use separate evaluator client (no hooks to avoid recursion)
    evaluator = AsyncOpenAI()
    eval_result = await evaluator.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": eval_prompt}]
    )
    
    # Store evaluation in metadata
    context.metadata["evaluation"] = eval_result.choices[0].message.content

# Register to evaluation scope
hookedllm.scope("evaluation").after(evaluate_response)
```

### Metrics Collection

```python
metrics = {"calls": 0, "tokens": 0, "errors": 0}

async def track_metrics(result):
    """Track aggregated metrics."""
    metrics["calls"] += 1
    
    if result.error:
        metrics["errors"] += 1
    
    if result.output and result.output.usage:
        metrics["tokens"] += result.output.usage.get("total_tokens", 0)

hookedllm.finally_(track_metrics)
```
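
A module-level dict is awkward to reset between tests. One alternative, sketched below, keeps the counters on an object with an async `__call__`. `CallCounter` is a hypothetical helper (not the built-in `MetricsHook`), and registering an instance with `hookedllm.finally_` assumes that any async callable is accepted as a hook. The fake results mirror the `error`, `output`, and `usage` attributes used above.

```python
import asyncio
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class CallCounter:
    """Hypothetical finally hook that keeps its own counters."""
    calls: int = 0
    tokens: int = 0
    errors: int = 0

    async def __call__(self, result):
        self.calls += 1
        if result.error:
            self.errors += 1
        if result.output and result.output.usage:
            self.tokens += result.output.usage.get("total_tokens", 0)


async def demo():
    counter = CallCounter()
    # Fake results standing in for what a finally hook would receive
    ok = SimpleNamespace(error=None,
                         output=SimpleNamespace(usage={"total_tokens": 42}))
    failed = SimpleNamespace(error=RuntimeError("timeout"), output=None)
    await counter(ok)
    await counter(failed)
    return counter


print(asyncio.run(demo()))
```

Each test can create a fresh `CallCounter`, so counts from one test never leak into another.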

### Tags and Metadata

Pass tags and metadata to enable conditional hooks:

**OpenAI** (uses `extra_body`):
```python
response = await client.chat.completions.create(
    model="gpt-4",
    messages=[...],
    extra_body={
        "hookedllm_tags": ["production", "critical"],
        "hookedllm_metadata": {
            "user_id": "abc123",
            "user_tier": "premium"
        }
    }
)
```

**Anthropic** (uses `metadata`):
```python
response = await client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=1024,  # required by the Anthropic Messages API
    messages=[...],
    metadata={
        "hookedllm_tags": ["production", "critical"],
        "hookedllm_metadata": {
            "user_id": "abc123",
            "user_tier": "premium"
        }
    }
)
```
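
Conceptually, a wrapper has to read the `hookedllm_tags` and `hookedllm_metadata` keys out of the carrier argument before hooks run. The sketch below shows one way that separation could work; `split_hook_fields` is a hypothetical helper, and whether HookedLLM strips these keys before forwarding the request is an assumption, not something this README states.

```python
def split_hook_fields(kwargs, carrier="extra_body"):
    """Return (tags, metadata, cleaned_kwargs) without mutating the input.

    `carrier` is "extra_body" for OpenAI-style calls and "metadata" for
    Anthropic-style calls, matching the two examples above.
    """
    cleaned = dict(kwargs)
    container = dict(cleaned.get(carrier) or {})
    tags = list(container.pop("hookedllm_tags", []))
    metadata = dict(container.pop("hookedllm_metadata", {}))
    if container:
        # Keep whatever else the caller put in the carrier
        cleaned[carrier] = container
    else:
        cleaned.pop(carrier, None)
    return tags, metadata, cleaned


request = {
    "model": "gpt-4",
    "extra_body": {
        "hookedllm_tags": ["production", "critical"],
        "hookedllm_metadata": {"user_id": "abc123", "user_tier": "premium"},
    },
}

tags, metadata, forwarded = split_hook_fields(request)
print(tags, metadata, forwarded)
```

Passing `carrier="metadata"` applies the same split to the Anthropic-style call.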

## 🤝 Contributing

Contributions welcome! Please see our [Contributing Guidelines](CONTRIBUTING.md) and [Code of Conduct](CODE_OF_CONDUCT.md).

## 📄 License

MIT License - see [LICENSE](LICENSE) file for details.

## 🔒 Security

Please see [SECURITY.md](SECURITY.md) for security policy and reporting vulnerabilities.

## 🙏 Acknowledgments

Built with inspiration from middleware patterns, aspect-oriented programming, and functional composition principles.
