Metadata-Version: 2.4
Name: llm-json-streaming
Version: 0.1.0
Summary: A unified interface for streaming structured JSON from OpenAI, Anthropic, and Google Gemini.
Project-URL: Homepage, https://github.com/yourusername/llm-json-streaming
Project-URL: Bug Tracker, https://github.com/yourusername/llm-json-streaming/issues
Project-URL: Documentation, https://github.com/yourusername/llm-json-streaming#readme
Project-URL: Source Code, https://github.com/yourusername/llm-json-streaming
Author-email: Daniel Wu <edanielwu@gmail.com>
License-File: LICENSE
Keywords: anthropic,gemini,json,llm,openai,pydantic,streaming,structured-outputs
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Requires-Dist: anthropic
Requires-Dist: google-genai
Requires-Dist: json-repair>=0.30.0
Requires-Dist: openai>=2.0.0
Requires-Dist: pydantic>=2.0
Requires-Dist: python-dotenv
Provides-Extra: dev
Requires-Dist: black>=23.0.0; extra == 'dev'
Requires-Dist: isort>=5.0.0; extra == 'dev'
Requires-Dist: mypy>=1.0.0; extra == 'dev'
Requires-Dist: pre-commit>=3.0.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Description-Content-Type: text/markdown

# LLM JSON Streaming

A unified Python library for streaming structured JSON outputs from OpenAI, Anthropic (Claude), and Google Gemini.

This library abstracts the differences between providers' structured output APIs and provides a consistent interface to stream JSON data and parsed Pydantic objects.

## Features

- **Unified Interface**: Use a single API to interact with OpenAI, Anthropic, and Google Gemini.
- **JSON Streaming**: Access raw JSON chunks as they are generated (`delta`).
- **Structured Outputs**: Enforce schema validation using Pydantic models.
- **Partial Parsing**: Access accumulated JSON strings during streaming.
- **Claude Structured Outputs**: Automatically upgrades Claude Sonnet 4.5 / Opus 4.1 requests to Anthropic's JSON outputs for guaranteed schemas.
- **Claude Prefill Strategy**: Older Claude models avoid tool calls entirely—schema-aware prefilling keeps responses JSON-only while still streaming deltas. Includes JSON repair for partial object support.
- **Google Gemini Support**: Native structured outputs with JSON repair for enhanced partial object support.

## Installation

This project is managed with `uv`.

```bash
# Clone the repository
git clone https://github.com/yourusername/llm-json-streaming.git
cd llm-json-streaming

# Install dependencies
uv sync
```

## Configuration

Set your API keys in a `.env` file:

```ini
OPENAI_API_KEY=your_openai_api_key
OPENAI_BASE_URL=https://api.openai.com/v1

ANTHROPIC_API_KEY=your_anthropic_api_key
ANTHROPIC_BASE_URL=https://api.anthropic.com

GEMINI_API_KEY=your_gemini_api_key
GOOGLE_BASE_URL=https://generativelanguage.googleapis.com  # Optional
```

## Usage

Define your output schema using Pydantic and pass it to the provider.

```python
import asyncio
from pydantic import BaseModel
from llm_json_streaming import create_provider

# 1. Define your schema
class UserProfile(BaseModel):
    name: str
    age: int
    bio: str

async def main():
    # 2. Initialize provider using the factory
    # Available: "openai", "anthropic", "claude", "google"
    # Ensure environment variables are set, or pass api_key="..."
    try:
        # For Anthropic, you can optionally specify mode:
        # provider = create_provider("anthropic", mode="structured")  # Force structured outputs
        # provider = create_provider("anthropic", mode="prefill")     # Force prefill mode
        # provider = create_provider("anthropic", mode="auto")        # Auto-detect (default)
        provider = create_provider("openai")
    except ValueError as e:
        print(e)
        return

    prompt = "Generate a profile for a fictional software engineer."

    # 3. Stream results
    print("Streaming JSON...")
    try:
        async for chunk in provider.stream_json(prompt, UserProfile):
            # Real-time partial parsed object (recommended for streaming updates)
            if "partial_object" in chunk:
                # Display the current best partial/complete parsed object
                user_profile = chunk["partial_object"]
                print(f"\rCurrent: {user_profile.name if user_profile.name else '...'}, {user_profile.age if user_profile.age else '?'}", end="", flush=True)

            # Raw text delta (for character-by-character display)
            if "delta" in chunk:
                print(chunk["delta"], end="", flush=True)

            # Final parsed object (complete and validated)
            if "final_object" in chunk:
                user_profile = chunk["final_object"]
                print(f"\n\nComplete: {user_profile.name}, {user_profile.age}")
    except Exception as e:
        print(f"\nError during streaming: {e}")

if __name__ == "__main__":
    asyncio.run(main())
```

## Streaming Interface

The `stream_json()` method yields dictionaries with different types of content during streaming:

### Chunk Fields

- **`partial_object`**: The current best parsed object. Available from the beginning of streaming in all modes:
  - **Early stage**: Returns partial dictionaries for incomplete JSON
  - **Later stage**: Returns validated Pydantic model instances for complete/repairable JSON
- **`delta`**: Raw text characters as they are generated by the LLM.
- **`final_object`**: The complete, validated Pydantic object when streaming finishes.
- **`partial_json`**: The current accumulated JSON text string.
- **`final_json`**: The complete JSON text string when streaming finishes.

### Recommended Usage Pattern

```python
async for chunk in provider.stream_json(prompt, UserProfile):
    # Use partial_object for real-time updates (recommended)
    if "partial_object" in chunk:
        user_profile = chunk["partial_object"]
        # Available from the beginning - starts as dict, becomes Pydantic object
        # Handle both types gracefully for consistent UI updates
        if hasattr(user_profile, 'model_dump'):
            # Pydantic model (complete/repairable JSON)
            name = user_profile.name or "..."
        else:
            # Dictionary (incomplete JSON)
            name = user_profile.get('name', "...")

        update_ui(name)  # Update UI with current best data

    # Use final_object for the final result
    if "final_object" in chunk:
        final_profile = chunk["final_object"]
        # Process the complete validated object
        save_result(final_profile)
```

## Supported Providers & Models

| Provider | Default Model | Method Used |
|----------|---------------|-------------|
| OpenAI   | `gpt-4o-2024-08-06` | `response_format` (Structured Outputs) via `beta.chat.completions` |
| Anthropic   | `claude-3-5-sonnet-20240620` (auto-switches to Structured Outputs for `claude-sonnet-4.5*` / `claude-opus-4.1*`) | Prefill JSON streaming for legacy models, Structured Outputs (`output_format` + beta header) for Sonnet 4.5 / Opus 4.1 |
| Google   | `gemini-2.5-flash` | `response_mime_type="application/json"` with structured outputs via Google GenAI SDK |

### Anthropic Mode Configuration

You can configure which strategy Anthropic models use through multiple methods:

#### Method 1: Constructor Mode (Recommended)

```python
from llm_json_streaming import create_provider

# Force structured outputs mode
provider = create_provider("anthropic", mode="structured")

# Force prefill mode
provider = create_provider("anthropic", mode="prefill")

# Auto-detection based on model (default)
provider = create_provider("anthropic", mode="auto")
```

#### Method 2: Method Parameter Override

```python
# Temporary override per request
async for chunk in provider.stream_json(prompt, UserProfile,
                                       model="claude-3-5-sonnet-20240620",
                                       use_structured_outputs=True):
    # Uses structured outputs regardless of auto-detection
```

#### Mode Priority

1. **Constructor mode** (`mode=` parameter) - Highest priority
2. **Method parameter** (`use_structured_outputs=`) - Medium priority
3. **Auto-detection** - Based on model capabilities - Lowest priority

### Anthropic Structured Outputs

Claude Sonnet 4.5 and Claude Opus 4.1 support Anthropic's structured output beta.
When using structured mode, chunks include partial JSON text and final Pydantic objects automatically.

### Anthropic Prefill Mode

All other Claude models receive schema-derived instructions and an assistant prefill (e.g., `{` or `{"field":`) so they skip generic preambles and stream JSON directly—no tool definitions or tool-use deltas required.

Enhanced with multi-level partial object support:
- **Real-time partial objects**: Available from the first token, even with incomplete JSON
- **Progressive improvement**: Starts with partial dictionaries, upgrades to Pydantic objects when JSON becomes complete
- **JSON repair**: Automatically fixes incomplete JSON to enable better partial parsing
- **Consistent interface**: Behaves like structured outputs while maintaining backward compatibility

### Google Gemini Support

Google Gemini models use the Google GenAI SDK with native structured outputs:

```python
from llm_json_streaming import create_provider

provider = create_provider("google")
async for chunk in provider.stream_json(prompt, UserProfile, model="gemini-2.5-flash"):
    # Handle streaming chunks
    if "partial_object" in chunk:
        print(chunk["partial_object"])
```

**Key Features:**
- **Native Structured Outputs**: Uses `response_mime_type="application/json"` for guaranteed JSON responses
- **JSON Repair**: Automatic repair of incomplete JSON for enhanced partial object support
- **Schema Validation**: Direct Pydantic schema integration for type-safe responses
- **Streaming**: Real-time partial objects with progressive enhancement

**Configuration:**
- Set `GEMINI_API_KEY` environment variable (required)
- Optionally set `GOOGLE_BASE_URL` for custom endpoints
- Default model: `gemini-2.5-flash`

## Testing

To run the tests with `uv`:

```bash
uv run pytest
```

## Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

## License

[MIT](LICENSE)
