# DEVELOPER GUIDE for models directory

## Quick Summary
This directory contains concrete implementations of the `BaseLlm` interface, providing wrappers for various Large Language Model APIs. These classes translate the ADK's standard `LlmRequest` into provider-specific formats and parse responses back into standard `LlmResponse` objects.

## Files Overview
- `lite_llm.py` - LLM client using the `litellm` library to support hundreds of models from different providers
- `models_llm.txt` - Documentation file containing developer guide content

## Developer API Reference

### lite_llm.py
**Purpose:** Provides the `LiteLlm` class, a `BaseLlm` implementation that interfaces with hundreds of LLM models through the `litellm` library. Supports models from OpenAI, Anthropic, Vertex AI, and many other providers by simply changing the model string.

**Import:** `from solace_agent_mesh.agent.adk.models.lite_llm import LiteLlm`

**Classes:**
- `LiteLlm(model: str, **kwargs)` - Wrapper around `litellm` supporting any model it recognizes
  - `generate_content_async(llm_request: LlmRequest, stream: bool = False) -> AsyncGenerator[LlmResponse, None]` - Generates content asynchronously with optional streaming
  - `supported_models() -> list[str]` - Returns list of supported models (empty for LiteLlm due to dynamic model support)
  - `model: str` - The name of the LiteLlm model
  - `llm_client: LiteLLMClient` - The LLM client instance used for API calls

- `LiteLLMClient()` - Internal client providing completion methods for better testability
  - `acompletion(model, messages, tools, **kwargs) -> Union[ModelResponse, CustomStreamWrapper]` - Asynchronous completion call
  - `completion(model, messages, tools, stream=False, **kwargs) -> Union[ModelResponse, CustomStreamWrapper]` - Synchronous completion call

- `FunctionChunk(BaseModel)` - Represents a function call chunk in streaming responses
  - `id: Optional[str]` - Function call ID
  - `name: Optional[str]` - Function name
  - `args: Optional[str]` - Function arguments as JSON string
  - `index: Optional[int]` - Index of the function call

- `TextChunk(BaseModel)` - Represents a text chunk in streaming responses
  - `text: str` - The text content

- `UsageMetadataChunk(BaseModel)` - Represents token usage information
  - `prompt_tokens: int` - Number of tokens in the prompt
  - `completion_tokens: int` - Number of tokens in the completion
  - `total_tokens: int` - Total number of tokens used

**Functions:**
- `_content_to_message_param(content: types.Content) -> Union[Message, list[Message]]` - Converts ADK Content to litellm Message format
- `_get_content(parts: Iterable[types.Part]) -> Union[OpenAIMessageContent, str]` - Converts parts to litellm content format
- `_function_declaration_to_tool_param(function_declaration: types.FunctionDeclaration) -> dict` - Converts function declarations to OpenAPI spec format
- `_model_response_to_generate_content_response(response: ModelResponse) -> LlmResponse` - Converts litellm response to LlmResponse

**Usage Examples:**
```python
import asyncio
import os
from solace_agent_mesh.agent.adk.models.lite_llm import LiteLlm
from solace_agent_mesh.agent.adk.models.llm_request import LlmRequest, LlmConfig
from google.genai.types import Content, Part

# Set environment variables for your chosen provider
# For OpenAI:
# os.environ["OPENAI_API_KEY"] = "your-api-key"
# For Vertex AI:
# os.environ["VERTEXAI_PROJECT"] = "your-project-id"
# os.environ["VERTEXAI_LOCATION"] = "your-location"

async def main():
    # Initialize LiteLlm with a specific model
    llm = LiteLlm(
        model="gpt-4-turbo",
        temperature=0.7,
        max_completion_tokens=150
    )
    
    # Create a request
    request = LlmRequest(
        contents=[
            Content(
                role="user",
                parts=[Part.from_text("Explain quantum computing in simple terms")]
            )
        ],
        config=LlmConfig(
            temperature=0.5,
            max_output_tokens=200
        )
    )
    
    # Non-streaming generation
    print("=== Non-streaming ===")
    async for response in llm.generate_content_async(request, stream=False):
        print(f"Response: {response.text}")
        if response.usage_metadata:
            print(f"Tokens used: {response.usage_metadata.total_token_count}")
    
    # Streaming generation
    print("\n=== Streaming ===")
    async for response in llm.generate_content_async(request, stream=True):
        if response.text:
            print(response.text, end="", flush=True)
        if response.usage_metadata:
            print(f"\nTotal tokens: {response.usage_metadata.total_token_count}")

# Example with function calling
async def function_calling_example():
    from google.genai.types import FunctionDeclaration, Schema, Type, Tool
    
    # Define a function for the LLM to call
    get_weather_func = FunctionDeclaration(
        name="get_weather",
        description="Get current weather for a location",
        parameters=Schema(
            type=Type.OBJECT,
            properties={
                "location": Schema(type=Type.STRING, description="City name"),
                "unit": Schema(type=Type.STRING, description="Temperature unit")
            },
            required=["location"]
        )
    )
    
    llm = LiteLlm(model="gpt-4-turbo")
    
    request = LlmRequest(
        contents=[
            Content(
                role="user", 
                parts=[Part.from_text("What's the weather like in Tokyo?")]
            )
        ],
        config=LlmConfig(
            tools=[Tool(function_declarations=[get_weather_func])]
        )
    )
    
    async for response in llm.generate_content_async(request):
        if response.function_calls:
            for func_call in response.function_calls:
                print(f"Function called: {func_call.name}")
                print(f"Arguments: {func_call.args}")

if __name__ == "__main__":
    asyncio.run(main())
    # asyncio.run(function_calling_example())
```

# content_hash: 12789ad2e16cd9ea5a81abdd68258d9ef30520bed5c51ba8d00ea66014191964
