Metadata-Version: 2.4
Name: thinllm
Version: 0.1.5a1
Summary: A thin, unified wrapper for LLM interactions with support for multiple providers
Author-email: Ujjwal Kumar <kujjwal02@gmail.com>
License-File: LICENSE
Requires-Python: >=3.10
Requires-Dist: jiter>=0.8.2
Requires-Dist: pydantic>=2.12.5
Description-Content-Type: text/markdown

# ThinLLM

A thin, unified wrapper for LLM interactions with support for multiple providers (OpenAI, Azure OpenAI, Anthropic, AWS Bedrock, and Gemini).

> **⚠️ Under Development**: This project is currently under active development. APIs may change, and some features may be incomplete or subject to modification.

## Table of Contents

- [Features](#features)
- [Why ThinLLM?](#why-thinllm)
- [Key Concepts](#key-concepts)
- [Installation](#installation)
- [Quick Start](#quick-start)
  - [Basic Usage](#basic-usage)
  - [Streaming Responses](#streaming-responses)
  - [Structured Output](#structured-output-with-pydantic)
  - [Function Calling](#function-calling--tools)
- [Providers](#providers)
  - [OpenAI](#openai)
  - [Azure OpenAI](#azure-openai)
  - [Anthropic](#anthropic-claude)
  - [AWS Bedrock](#aws-bedrock-anthropic-models)
  - [Google Gemini](#google-gemini)
- [API Reference](#api-reference)
- [Examples](#examples)
- [Troubleshooting](#troubleshooting)
- [Roadmap](#roadmap)
- [Contributing](#contributing)
- [License](#license)

## Features

- **Single Function API**: One `llm()` function for all providers - no need to learn multiple APIs
- **Provider Support**: OpenAI, Azure OpenAI, Anthropic (Claude), AWS Bedrock, and Google Gemini
- **Streaming**: Full support for streaming responses
- **Structured Output**: Get Pydantic models directly from LLMs with type safety
- **Function Calling**: Tool/function calling with automatic serialization
- **Type Safety**: Full type hints and runtime validation with Pydantic

## Why ThinLLM?

**One function. All providers. Zero hassle.**

Building applications with multiple LLM providers means learning different APIs, handling different response formats, and managing provider-specific quirks. **ThinLLM** eliminates this complexity:

- **Single `llm()` Function**: One function for all your LLM needs - no need to learn provider-specific APIs
- **Write Once, Use Anywhere**: Same code works with OpenAI, Claude, Bedrock, and Gemini
- **Provider Agnostic**: Switch providers by changing just the config - no code refactoring needed
- **Type Safety**: Full Pydantic integration for validated, structured outputs
- **Minimal Overhead**: Thin wrapper that stays close to native provider APIs
- **Tested**: Backed by a comprehensive test suite

```python
from thinllm import LLMConfig, ModelParams, UserMessage, llm

messages = [UserMessage(content="What is the capital of France?")]

# Switch providers by updating the config
config = LLMConfig(
    provider="anthropic",       # Was "openai"
    model_id="claude-sonnet-4", # Was "gpt-4"
    params=ModelParams(temperature=0.7)
)
# Same llm() function, same messages, same code structure!
response = llm(config, messages)
```

## Key Concepts

**ThinLLM** is built around a single powerful principle: **one function for all your LLM needs**.

Instead of learning different APIs for OpenAI, Anthropic, Bedrock, and Gemini, you just use the `llm()` function. Change providers by updating the configuration (typically `provider` and `model_id`); everything else stays the same.

### Core Components

1. **`llm()` function**: The only function you need - handles all LLM interactions
2. **`LLMConfig`**: Configure which provider and model to use
3. **`ModelParams`**: Standard parameters (temperature, max_tokens, etc.) that work across all providers
4. **Messages**: Use `SystemMessage`, `UserMessage`, and `AIMessage` to build conversations
5. **Structured Output**: Pass a Pydantic model as `output_schema` to get validated, typed responses
6. **Tools**: Pass Python functions as `tools` parameter for function calling

## Installation

### Basic Installation

```bash
pip install thinllm
```

### Provider-Specific Dependencies

Install dependencies for the providers you want to use:

```bash
# For OpenAI
pip install "thinllm[openai]"

# For Azure OpenAI
pip install "thinllm[azure-openai]"

# For Anthropic (Claude)
pip install "thinllm[anthropic]"

# For AWS Bedrock with Anthropic models
pip install "thinllm[bedrock]"

# For Google Gemini
pip install "thinllm[gemini]"

# For all providers (quotes keep shells like zsh from expanding the brackets)
pip install "thinllm[all]"
```

### Environment Setup

Set your API keys as environment variables or use a `.env` file:

```bash
export OPENAI_API_KEY=your-openai-key-here
export ANTHROPIC_API_KEY=your-anthropic-key-here
export GEMINI_API_KEY=your-gemini-key-here

# Azure OpenAI credentials
export AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
export AZURE_OPENAI_API_KEY=your-azure-api-key  # Optional if using Entra ID

# AWS credentials (for Bedrock)
export AWS_ACCESS_KEY_ID=your-aws-access-key
export AWS_SECRET_ACCESS_KEY=your-aws-secret-key
export AWS_REGION=us-east-1
```

Or create a `.env` file in your project and load it with `python-dotenv` (`pip install python-dotenv`):

```env
OPENAI_API_KEY=your-openai-key-here
ANTHROPIC_API_KEY=your-anthropic-key-here
GEMINI_API_KEY=your-gemini-key-here
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_KEY=your-azure-api-key
AWS_ACCESS_KEY_ID=your-aws-access-key
AWS_SECRET_ACCESS_KEY=your-aws-secret-key
AWS_REGION=us-east-1
```

## Quick Start

### Basic Usage

Here's how to make your first LLM call:

```python
from thinllm import llm, LLMConfig, ModelParams, UserMessage
from dotenv import load_dotenv

# Load your API keys
load_dotenv()

# Configure your LLM
config = LLMConfig(
    provider="openai",  # or "azure_openai", "anthropic", "bedrock_anthropic", "gemini"
    model_id="gpt-4",
    params=ModelParams(temperature=0.7, max_output_tokens=1024)
)

# Ask a question
messages = [UserMessage(content="What is the capital of France?")]
response = llm(config, messages)

# Print the response
print(response.content[0].text)
# Example output (wording varies by model): The capital of France is Paris.
```

### Streaming Responses

Stream responses in real-time for better user experience:

```python
# Stream the response - each chunk contains the complete response up to that point
for chunk in llm(config, messages, stream=True):
    if chunk.content:
        print(chunk.content[0].text)
        # Note: chunk.content[0].text contains the full text generated so far, not just the diff
```
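Because each chunk carries the cumulative text, printing `chunk.content[0].text` directly repeats earlier output on every iteration. A small generator can turn the snapshots into incremental pieces. This is a sketch, not part of ThinLLM: `text_deltas` is a hypothetical helper, and it assumes the first content block of every chunk is the text block, as in the loop above.

```python
from collections.abc import Iterable, Iterator


def text_deltas(snapshots: Iterable[str]) -> Iterator[str]:
    """Yield only the newly added text from a stream of cumulative snapshots."""
    previous = ""
    for text in snapshots:
        if text.startswith(previous):
            # Normal case: the snapshot extends what has already been seen
            delta = text[len(previous):]
        else:
            # Defensive fallback: earlier text changed, so emit the snapshot whole
            delta = text
        if delta:
            yield delta
        previous = text


# Simulated cumulative snapshots, standing in for chunk.content[0].text values
snapshots = ["The", "The capital", "The capital of France is Paris."]
print("".join(text_deltas(snapshots)))
```

With ThinLLM, you would feed it `(chunk.content[0].text for chunk in llm(config, messages, stream=True) if chunk.content)` and print each piece with `print(piece, end="", flush=True)`.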

### Structured Output with Pydantic

Get validated, structured data directly from the LLM:

```python
from pydantic import BaseModel
from thinllm import SystemMessage, UserMessage, llm

class CalendarEvent(BaseModel):
    thought: str | None = None
    name: str | None = None
    date: str | None = None
    participants: list[str] | None = None

messages = [
    SystemMessage(content="Extract the event information."),
    UserMessage(content="Alice and Bob are going to a science fair on Friday."),
]

# Get structured output that matches your Pydantic model
response = llm(config, messages, output_schema=CalendarEvent)

print(f"Event: {response.name}")
print(f"Date: {response.date}")
print(f"Participants: {', '.join(response.participants or [])}")
```

### Function Calling / Tools

Enable the LLM to request function calls:

```python
from thinllm import SystemMessage, UserMessage, llm

# Define a simple function
def get_horoscope(sign: str) -> str:
    """Get the horoscope for a zodiac sign."""
    return f"{sign}: Next Tuesday you will befriend a baby otter."

messages = [
    SystemMessage(content="You are a helpful assistant."),
    UserMessage(content="What is my horoscope? I am an Aquarius."),
]

# The LLM will indicate which function to call
response = llm(config, messages, tools=[get_horoscope])

# Check if the LLM requested any tool calls
tool_calls = response.get_tool_call_contents()
if tool_calls:
    for tool_call in tool_calls:
        print(f"LLM requested function: {tool_call.name}")
        print(f"With arguments: {tool_call.input}")

    # Execute every requested tool (providers expect a result for each call)
    messages.append(response)
    messages.append(
        UserMessage(content=[tc.get_tool_result(tools=[get_horoscope]) for tc in tool_calls])
    )

    # Get the final response after tool execution
    final_response = llm(config, messages, tools=[get_horoscope])
    print(final_response.content[0].text)
```
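Providers describe a tool as a name, a description, and a JSON Schema for its parameters, which is what "automatic serialization" has to produce from a plain Python function. The sketch below derives that shape with the standard `inspect` and `typing` modules. `function_to_tool_schema` is hypothetical and maps only a few basic types; ThinLLM's own serializer may support more types and use a different layout.

```python
import inspect
from collections.abc import Callable
from typing import Any, get_type_hints

# Illustrative subset of Python annotations mapped to JSON Schema types
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_to_tool_schema(func: Callable[..., Any]) -> dict[str, Any]:
    """Hypothetical helper: describe a Python function as a JSON-Schema tool."""
    hints = get_type_hints(func)
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unmapped parameters fall back to "string"
        json_type = _JSON_TYPES.get(hints.get(name, str), "string")
        properties[name] = {"type": json_type}
        # Parameters without a default value are required
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def get_horoscope(sign: str) -> str:
    """Get the horoscope for a zodiac sign."""
    return f"{sign}: Next Tuesday you will befriend a baby otter."


print(function_to_tool_schema(get_horoscope))
```

This is also why a clear docstring and type hints on your tool functions matter: they are the only description the model sees.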

## Providers

### OpenAI

```python
from thinllm import LLMConfig, ModelParams

config = LLMConfig(
    provider="openai",
    model_id="gpt-4",
    params=ModelParams(
        temperature=0.7,
        max_output_tokens=4096
    )
)
```

**Supported Models**: `gpt-4`, `gpt-4-turbo`, `gpt-3.5-turbo`, `gpt-4o`, etc.

**Setup**: Set the `OPENAI_API_KEY` environment variable

### Azure OpenAI

Use OpenAI models through Azure's enterprise platform with enhanced security and compliance:

```python
from thinllm import Credentials, LLMConfig, ModelParams

# With API Key authentication
config = LLMConfig(
    provider="azure_openai",
    model_id="gpt-4o",  # Your deployment name in Azure
    params=ModelParams(
        temperature=0.7,
        max_output_tokens=4096
    ),
    credentials=Credentials(
        azure_endpoint="https://your-resource.openai.azure.com",
        api_key="your-azure-api-key"
    )
)

# With Microsoft Entra ID (Azure AD) authentication
config = LLMConfig(
    provider="azure_openai",
    model_id="gpt-4o",  # Your deployment name in Azure
    params=ModelParams(
        temperature=0.7,
        max_output_tokens=4096
    ),
    credentials=Credentials(
        azure_endpoint="https://your-resource.openai.azure.com",
        # No api_key - will use DefaultAzureCredential
    )
)
```

**Supported Models**: Any OpenAI model deployed in your Azure OpenAI resource
- `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `gpt-4`, `gpt-3.5-turbo`
- Note: Use your Azure deployment name as `model_id`

**Setup**:
1. Create an Azure OpenAI resource in the Azure Portal
2. Deploy a model and note the deployment name
3. Copy your endpoint URL from the resource
4. Choose authentication:
   - **API Key**: Copy a key from the "Keys and Endpoint" section in the Azure Portal
   - **Microsoft Entra ID**: Run `az login` or configure a managed identity

**Environment Variables**:
```bash
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
export AZURE_OPENAI_API_KEY="your-api-key"  # Optional if using Entra ID
```

**Authentication Options**:
- **API Key**: Simple, good for development and testing
- **Microsoft Entra ID**: Enterprise-grade, recommended for production
  - Uses `DefaultAzureCredential` from `azure-identity`
  - Supports: Azure CLI, managed identities, environment variables, and more

### Anthropic (Claude)

```python
config = LLMConfig(
    provider="anthropic",
    model_id="claude-sonnet-4",
    params=ModelParams(
        temperature=0.7,
        max_output_tokens=4096
    )
)
```

**Supported Models**: `claude-sonnet-4`, `claude-opus-4`, `claude-3-5-sonnet-20241022`, etc.

**Setup**: Set the `ANTHROPIC_API_KEY` environment variable

#### Anthropic-Specific Features

**Prompt Caching** (Reduce costs and latency):

Anthropic's [prompt caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching) allows you to cache frequently used content blocks (like system prompts, large documents, or context) to reduce costs and improve response times.

```python
from thinllm import (
    llm, LLMConfig, ModelParams, SystemMessage, UserMessage,
    InputTextBlock, ContentExtra, AnthropicCacheControl
)

# System prompt with caching
system_msg = SystemMessage(content=[
    InputTextBlock(text="You are a helpful assistant."),
    InputTextBlock(
        text="Here is a large document to reference: ...",
        extra=ContentExtra(
            # Cache with default TTL (5 minutes)
            anthropic_cache_control=AnthropicCacheControl()
        )
    )
])

# User message with cached context
user_msg = UserMessage(content=[
    InputTextBlock(
        text="Long context that will be reused across multiple requests...",
        extra=ContentExtra(
            # Cache for 1 hour
            anthropic_cache_control=AnthropicCacheControl(ttl="1h")
        )
    ),
    InputTextBlock(text="What is the main theme?")
])

config = LLMConfig(
    provider="anthropic",
    model_id="claude-sonnet-4",
    params=ModelParams(max_output_tokens=1024)
)

response = llm(config, [system_msg, user_msg])
```

**Cache Control Options:**
- `ttl`: Cache time-to-live, either `"5m"` (5 minutes) or `"1h"` (1 hour); omit it to use Anthropic's default of 5 minutes
- `enabled`: Set to `False` to explicitly disable caching (default: `True`)

**Best Practices:**
- Cache large, reusable context (system prompts, documents, examples)
- Use longer TTL (`"1h"`) for stable content
- Cache control applies only to Anthropic models (the `anthropic` and `bedrock_anthropic` providers); other providers ignore it
- Not supported for reasoning/thinking content blocks

**Extended Thinking with Interleaved Thinking**:

Anthropic's [extended thinking](https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking) allows Claude to reason before responding. With **interleaved thinking**, Claude can think between tool calls for more sophisticated multi-step reasoning.

```python
from thinllm import llm, LLMConfig, ModelParams, ThinkingConfig, UserMessage

def get_weather(location: str) -> str:
    """Get weather for a location."""
    return f"Weather in {location}: Sunny, 72°F"

def get_time(timezone: str) -> str:
    """Get time for a timezone."""
    return f"Time in {timezone}: 2:30 PM"

# Configure with interleaved thinking
config = LLMConfig(
    provider="anthropic",
    model_id="claude-sonnet-4-5",
    params=ModelParams(
        max_output_tokens=4096,
        temperature=1.0,  # Required for thinking
        thinking=ThinkingConfig(
            enabled=True,
            thinking_budget=10000,  # Can exceed max_output_tokens with interleaved
            anthropic_interleaved_thinking=True,  # Enable interleaved thinking
        ),
    ),
)

messages = [
    UserMessage(content="What's the weather in Paris, and what time is it there?")
]

# Claude will think between tool calls
response = llm(config, messages, tools=[get_weather, get_time])
```

**Key Features:**
- **Interleaved Thinking**: Claude can think between tool calls (set `anthropic_interleaved_thinking=True`)
- **Flexible Token Budget**: With interleaved thinking, `thinking_budget` can exceed `max_output_tokens`
- **Temperature Requirement**: Extended thinking requires `temperature=1.0`
- **Beta API**: Automatically uses the beta API when interleaved thinking is enabled
- **Bedrock Compatible**: Works with both Anthropic and Bedrock Anthropic providers

**When to Use:**
- Complex multi-step problems requiring planning
- Tool use scenarios with multiple steps
- Tasks that benefit from reflection between actions

See `examples/anthropic_interleaved_thinking_example.py` for comprehensive examples.

### AWS Bedrock (Anthropic Models)

Use Claude models through AWS Bedrock:

```python
config = LLMConfig(
    provider="bedrock_anthropic",
    model_id="us.anthropic.claude-sonnet-4-5-20250929-v1:0",
    params=ModelParams(
        temperature=0.0,
        max_output_tokens=1024
    )
)
```

**Supported Models**: Any Anthropic model available in AWS Bedrock
- `us.anthropic.claude-sonnet-4-5-20250929-v1:0`
- `global.anthropic.claude-sonnet-4-5-20250929-v1:0`
- And other Bedrock model IDs

**Setup**: Configure AWS credentials through environment variables or AWS CLI:
```env
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key
AWS_REGION=us-east-1  # or your preferred region
```

### Google Gemini

```python
config = LLMConfig(
    provider="gemini",
    model_id="gemini-2.5-flash",
    params=ModelParams(temperature=0.7)
)
```

**Supported Models**: `gemini-2.5-flash`, `gemini-2.5-pro`, `gemini-1.5-pro`, etc.

**Setup**: Set the `GEMINI_API_KEY` environment variable

### Provider Feature Comparison

| Feature | OpenAI | Azure OpenAI | Anthropic | Bedrock | Gemini |
|---------|--------|--------------|-----------|---------|--------|
| Basic Chat | ✅ | ✅ | ✅ | ✅ | ✅ |
| Streaming | ✅ | ✅ | ✅ | ✅ | ✅ |
| Structured Output | ✅ | ✅ | ✅ | ✅ | ✅ |
| Function Calling | ✅ | ✅ | ✅ | ✅ | ✅ |
| Vision (Images) | ✅ | ✅ | ✅ | ✅ | ✅ |
| Prompt Caching | ❌ | ❌ | ✅ | ✅ | ❌ |
| Thinking Mode | ❌ | ❌ | ✅ | ✅ | ✅ |
| Built-in Search | ❌ | ❌ | ❌ | ❌ | ✅ |
| Code Execution | ❌ | ❌ | ❌ | ❌ | ✅ |
| Entra ID Auth | ❌ | ✅ | ❌ | ❌ | ❌ |

#### Gemini-Specific Features

**Thinking Mode** (Extended Reasoning):
```python
config = LLMConfig(
    provider="gemini",
    model_id="gemini-2.5-pro",
    params=ModelParams(temperature=0.7),
    model_args={
        "thinking_budget": 2048,      # Allocate tokens for reasoning
        "include_thoughts": True,     # Include reasoning in response
    }
)
```

**Built-in Tools**:
```python
# Google Search
response = llm(config, messages, tools=[{"google_search": {}}])

# Code Execution
response = llm(config, messages, tools=[{"code_execution": {}}])

# URL Context
response = llm(config, messages, tools=[{"url_context": {}}])
```

## API Reference

### Main Functions

#### `llm()`

Unified interface for LLM interactions.

```python
def llm(
    llm_config: LLMConfig,
    messages: list[MessageType],
    *,
    output_schema: type[OutputSchemaType] | None = None,
    tools: list[Tool | Callable | dict] | None = None,
    stream: bool = False,
) -> AIMessage | OutputSchemaType | Generator[...]:
    """
    Make an LLM request.
    
    Args:
        llm_config: Configuration for the LLM
        messages: List of conversation messages
        output_schema: Optional Pydantic model for structured output
        tools: Optional list of tools/functions
        stream: Whether to stream the response
        
    Returns:
        An AIMessage; an instance of output_schema when one is given;
        or, with stream=True, a generator of partial responses
    """
```

### Configuration

#### `LLMConfig`

```python
class LLMConfig(BaseModel):
    provider: str                           # "openai", "azure_openai", "anthropic", "bedrock_anthropic", or "gemini"
    model_id: str                          # Model identifier (e.g., "gpt-4", "claude-sonnet-4")
    params: ModelParams | None = None      # Standard model parameters
    credentials: Credentials | None = None # Optional credentials (required for Azure OpenAI)
    model_args: dict[str, Any] = {}        # Provider-specific arguments
```

#### `ModelParams`

Standard parameters that work across all providers:

```python
class ModelParams(BaseModel):
    temperature: float | None = None           # Controls randomness (OpenAI: 0.0-2.0, Anthropic: 0.0-1.0)
    max_output_tokens: int | None = None       # Maximum tokens to generate
    top_p: float | None = None                 # Nucleus sampling parameter
    top_k: int | None = None                   # Top-k sampling parameter
    stop_sequences: list[str] | None = None    # Sequences where the model stops
    thinking: ThinkingConfig | None = None     # Extended thinking settings (see Anthropic-Specific Features)
```

### Messages

- `SystemMessage(content: str | list[ContentBlock])` - System instructions
- `UserMessage(content: str | list[ContentBlock])` - User input
- `AIMessage(content: str | list[ContentBlock])` - AI response

### Content Blocks

- `InputTextBlock` - Text input from user
- `OutputTextBlock` - Text output from AI
- `InputImageBlock` - Image input
- `ReasoningContent` - Reasoning/thinking content
- `ToolCallContent` - Tool call request
- `ToolResultContent` - Tool execution result

## Examples

See the [`examples/`](examples/) directory for complete examples:

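- **`azure_openai_example.py`**: Azure OpenAI with API Key and Microsoft Entra ID authentication
- **`anthropic_interleaved_thinking_example.py`**: Anthropic extended thinking with interleaved thinking between tool calls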
- **`gemini_example.py`**: Comprehensive Gemini provider examples with thinking mode
- **`bedrock_example.py`**: AWS Bedrock integration examples
- **`agent_example.py`**: Multi-turn conversations and tool usage patterns
- **`streamlit_streaming_chat.py`**: Interactive streaming chat interface
- **`streamlit_agent_chat.py`**: Web-based chat interface with debug view

### Running Examples

```bash
# Install example dependencies
pip install streamlit

# Run a specific example
python examples/bedrock_example.py

# Run the Streamlit chat interface
streamlit run examples/streamlit_streaming_chat.py
```

## Troubleshooting

### Common Issues

**Import Errors**
```bash
# Ensure you've installed the provider-specific dependencies
pip install "thinllm[anthropic]"  # or openai, azure-openai, gemini, bedrock
```

**API Key Issues**
```python
# Make sure your environment variables are set
from dotenv import load_dotenv
load_dotenv()

# Or set them directly
import os
os.environ["ANTHROPIC_API_KEY"] = "your-key"
```
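To see at a glance which variables are missing for a provider, a small standard-library check can help. This is a hypothetical helper built from the variable names in the Environment Setup section above. It only inspects environment variables, so credentials that come from an AWS CLI profile or from Microsoft Entra ID (`az login`, managed identity) are reported as missing even though they work.

```python
import os
from collections.abc import Mapping

# Variable names taken from the Environment Setup section of this README
REQUIRED_ENV: dict[str, list[str]] = {
    "openai": ["OPENAI_API_KEY"],
    "azure_openai": ["AZURE_OPENAI_ENDPOINT"],  # API key is optional with Entra ID
    "anthropic": ["ANTHROPIC_API_KEY"],
    "bedrock_anthropic": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"],
    "gemini": ["GEMINI_API_KEY"],
}


def missing_env_vars(provider: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the expected variables for `provider` that are unset or empty."""
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]


print(missing_env_vars("anthropic", {"ANTHROPIC_API_KEY": "your-key"}))
print(missing_env_vars("bedrock_anthropic", {"AWS_REGION": "us-east-1"}))
```

Call it after `load_dotenv()` (with no `env` argument) so values from your `.env` file are included.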

**AWS Bedrock Authentication**
```bash
# Configure AWS CLI (recommended)
aws configure

# Or set environment variables
export AWS_ACCESS_KEY_ID=your-key
export AWS_SECRET_ACCESS_KEY=your-secret
export AWS_REGION=us-east-1
```

**Model Not Found**
```python
# Bedrock model IDs carry a geographic prefix:
# "us.anthropic.claude-sonnet-4-5-20250929-v1:0"      # US cross-region inference profile
# "global.anthropic.claude-sonnet-4-5-20250929-v1:0"  # Global cross-region inference profile
```

## Roadmap

The following features are planned for ThinLLM. These are **up for grabs** - contributions are welcome!

### Core API Features

- [ ] **Async API Support**: Asynchronous API for non-blocking LLM interactions

### Provider Enhancements

- [ ] **Vertex AI Support**: Integration with Google Cloud's Vertex AI platform
- [x] **Anthropic Caching**: Prompt caching support for improved performance and cost efficiency
- [x] **Anthropic Extended Thinking Support**: Enhanced reasoning capabilities with interleaved thinking
- [x] **Anthropic Support for Thinking with Structured Output**: Using beta API where applicable

### Multimodal Capabilities

- [ ] **Image Generation Support**: Native support for image generation across providers

### Tool & Function Capabilities

- [ ] **Auto Function Call Support**: Automatic execution of function calls
- [ ] **MCP Support**: Model Context Protocol integration
- [ ] **Computer Use Support**: Anthropic's computer use capabilities

### Built-in Server Tools

- [ ] **Search Tool**: Web search capabilities
- [ ] **Fetch Tool**: URL fetching and content retrieval
- [ ] **Code Executor**: Safe code execution environment

### Provider-Specific Tools

#### Anthropic Tools
- [ ] **Tool Search Tool**: Anthropic's search tool integration
- [ ] **Memory Tool**: Persistent memory capabilities
- [ ] **Text Editor Tool**: Advanced text editing
- [ ] **Bash Tool**: Command-line execution

#### Gemini Tools
- [ ] **Google Maps Integration**: Location and mapping features
- [ ] **URL Context**: Enhanced URL processing

### Response Features

- [ ] **Citation Support**: Source attribution and citation tracking
- [ ] **Raw Response in AI Message**: Access to raw API responses
- [ ] **Stop Reason in AI Message**: Include the provider's stop reason in the `AIMessage`

### Additional Capabilities

- [ ] **Embedding Support**: Text embedding generation across providers
- [ ] **Observability Platforms**: 
  - [ ] Langfuse integration
  - [ ] Langsmith integration
  - [ ] Additional observability platforms

## Contributing

Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines on:

- Setting up the development environment
- Running tests
- Code quality standards
- Common development patterns
- Submitting pull requests

## License

MIT License - see [LICENSE](LICENSE) file for details

## Acknowledgments

- Built with [Pydantic](https://pydantic.dev/) for data validation
- Uses official SDKs: `openai`, `anthropic`, `google-genai`
