Metadata-Version: 2.4
Name: burt-logger
Version: 0.1.0
Summary: A lightweight Python package for collecting LLM traces
Home-page: https://github.com/trainburt/burt-logger-python
Author: Burt Team
Author-email: Bobby Zhong <bobby@trainburt.com>
License: MIT
Project-URL: Homepage, https://github.com/trainburt/burt-logger-python.git
Project-URL: Documentation, https://github.com/trainburt/burt-logger-python.git#readme
Project-URL: Repository, https://github.com/trainburt/burt-logger-python.git
Project-URL: Bug Reports, https://github.com/trainburt/burt-logger-python.git/issues
Keywords: llm,logging,training-data,machine-learning,ai
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Logging
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.25.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-mock>=3.10.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: flake8>=5.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: types-requests>=2.28.0; extra == "dev"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# Burt Logger

A lightweight, production-ready Python package for collecting LLM training data. Automatically pipe LLM request/response data to your backend for model fine-tuning and dataset creation.

## Features

✨ **Non-blocking & Asynchronous** - Uses background threads and queues so that logging does not block your application

🔄 **Intelligent Batching** - Automatically batches logs by size or time interval for optimal network efficiency

🛡️ **Production-Ready** - Thread-safe, graceful error handling, and automatic retry with exponential backoff

🚀 **Minimal Dependencies** - Requires only the `requests` library; everything else comes from the Python standard library

⚙️ **Highly Configurable** - Customize batch sizes, flush intervals, queue sizes, retry logic, and more

🔌 **Provider Agnostic** - Works with OpenAI, Anthropic, or any LLM provider

## Installation

```bash
pip install burt-logger
```

Or install from source:

```bash
git clone https://github.com/trainburt/burt-logger-python.git
cd burt-logger-python
pip install -e .
```

## Quick Start

```python
import openai

from burt_logger import LLMLogger

# Initialize the logger
logger = LLMLogger(
    endpoint="https://your-api.com/logs",
    api_key="your-api-key"
)

# Log your LLM requests and responses
response = openai.ChatCompletion.create(...)  # Your existing LLM call (legacy pre-1.0 OpenAI SDK)
usage = response["usage"]  # Token counts returned by the provider

logger.log(
    request={
        "model": "gpt-3.5-turbo",
        "messages": [...],
    },
    response={
        "content": response.choices[0].message.content,
        "usage": {
            "prompt_tokens": usage.get("prompt_tokens", 0),
            "completion_tokens": usage.get("completion_tokens", 0),
            "total_tokens": usage.get("total_tokens", 0),
        },
    }
)

# Gracefully shutdown (flushes remaining logs)
logger.shutdown()
```

That's it! The logger handles everything asynchronously in the background.

### Using Context Manager

The logger supports context managers for automatic cleanup:

```python
with LLMLogger(endpoint="...", api_key="...") as logger:
    # Your code here
    logger.log(request=..., response=...)
    # Automatic shutdown and flush on exit
```

## Configuration

The `LLMLogger` class accepts the following parameters:

| Parameter             | Type  | Default      | Description                                      |
| --------------------- | ----- | ------------ | ------------------------------------------------ |
| `endpoint`            | str   | **Required** | Backend API endpoint to send logs to             |
| `api_key`             | str   | **Required** | API key for authentication                       |
| `batch_size`          | int   | 10           | Number of logs to batch before sending           |
| `flush_interval`      | float | 5.0          | Seconds to wait before flushing incomplete batch |
| `max_queue_size`      | int   | 10000        | Maximum number of logs to queue                  |
| `max_retries`         | int   | 3            | Maximum number of retry attempts                 |
| `initial_retry_delay` | float | 1.0          | Initial delay for exponential backoff (seconds)  |
| `max_retry_delay`     | float | 60.0         | Maximum retry delay (seconds)                    |
| `timeout`             | float | 10.0         | HTTP request timeout (seconds)                   |
| `debug`               | bool  | False        | Enable debug logging                             |

### Example with Custom Configuration

```python
logger = LLMLogger(
    endpoint="https://your-api.com/logs",
    api_key="your-api-key",
    batch_size=20,           # Send in batches of 20
    flush_interval=10.0,     # Or every 10 seconds
    max_queue_size=50000,    # Large queue for high-volume apps
    max_retries=5,           # More retries for flaky networks
    debug=True,              # See what's happening
)
```

## API Reference

### `log(request, response, metadata=None)`

Log an LLM request/response pair.

**Parameters:**

-  `request` (dict): The LLM request data (prompt, model, parameters, etc.)
-  `response` (dict): The LLM response data (completion, tokens, etc.)
-  `metadata` (dict, optional): Additional metadata (user_id, session_id, etc.)

**Returns:**

-  `bool`: `True` if log was queued successfully, `False` if queue is full

**Example:**

```python
success = logger.log(
    request={"model": "gpt-4", "prompt": "..."},
    response={"completion": "...", "tokens": 150},
    metadata={"user_id": "123", "environment": "production"}
)
```
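
Because `log()` returns `False` when the queue is full, callers may want to count dropped entries instead of ignoring the return value. The following is a minimal sketch of one way to do that; `log_or_count` and `dropped` are hypothetical names and are not part of burt-logger. The helper only assumes that the object passed in exposes the `log(request, response, metadata)` method described above.

```python
from collections import Counter
from typing import Any, Dict, Optional

# Hypothetical drop counter (not part of burt-logger).
dropped = Counter()


def log_or_count(
    logger: Any,
    request: Dict[str, Any],
    response: Dict[str, Any],
    metadata: Optional[Dict[str, Any]] = None,
) -> bool:
    """Call logger.log() and record a drop when it returns False."""
    queued = logger.log(request=request, response=response, metadata=metadata)
    if not queued:
        # log() reported that the entry was not queued (queue full).
        dropped["queue_full"] += 1
    return queued
```

A growing `dropped["queue_full"]` count suggests that `max_queue_size` is too small for the current traffic.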

### `flush(timeout=None)`

Flush all queued logs and wait for them to be sent.

**Parameters:**

-  `timeout` (float, optional): Maximum time to wait in seconds. `None` means wait indefinitely.

**Example:**

```python
logger.flush(timeout=5.0)  # Wait up to 5 seconds
```

### `shutdown(timeout=10.0)`

Gracefully shutdown the logger, flushing all remaining logs.

**Parameters:**

-  `timeout` (float): Maximum time to wait for shutdown in seconds

**Example:**

```python
logger.shutdown(timeout=10.0)
```

### `get_stats()`

Get statistics about logger performance.

**Returns:**

-  `dict`: Dictionary containing statistics

**Example:**

```python
stats = logger.get_stats()
print(stats)
# {
#     'logs_queued': 150,
#     'logs_sent': 145,
#     'logs_failed': 5,
#     'batches_sent': 15,
#     'batches_failed': 1
# }
```
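
The raw counters are easier to alert on when turned into ratios. The sketch below derives failure rates from a dictionary shaped like the example output above; `failure_rates` is a hypothetical helper, and it assumes the counters are cumulative totals, which the example suggests but does not state.

```python
from typing import Dict


def failure_rates(stats: Dict[str, int]) -> Dict[str, float]:
    """Derive failure ratios from a get_stats()-style dictionary."""
    # Assumption: sent + failed covers every log or batch that finished.
    logs_done = stats["logs_sent"] + stats["logs_failed"]
    batches_done = stats["batches_sent"] + stats["batches_failed"]
    return {
        "log_failure_rate": stats["logs_failed"] / logs_done if logs_done else 0.0,
        "batch_failure_rate": (
            stats["batches_failed"] / batches_done if batches_done else 0.0
        ),
    }
```

With the example values above, 5 of 150 finished logs and 1 of 16 finished batches failed.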

## How It Works

1. **Queueing**: When you call `log()`, the entry is immediately added to a thread-safe queue and the method returns instantly (non-blocking)

2. **Batching**: A background worker thread monitors the queue and batches logs based on:

   -  Batch size (e.g., 10 logs)
   -  Time interval (e.g., every 5 seconds)

3. **Sending**: Batches are sent to your backend API via HTTP POST with proper authentication headers

4. **Retry Logic**: If sending fails:

   -  **5xx errors**: Retries with exponential backoff
   -  **429 (rate limit)**: Retries with exponential backoff
   -  **Other 4xx errors**: No retry (client error)
   -  **Network errors**: Retries with exponential backoff

5. **Shutdown**: On program exit or explicit shutdown, all remaining logs are flushed

## Backend API Expected Format

Your backend should expect POST requests with the following format:

**Headers:**

```
Content-Type: application/json
Authorization: Bearer <api_key>
```

**Payload:**

```json
{
  "logs": [
    {
      "request": { /* your request data */ },
      "response": { /* your response data */ },
      "metadata": { /* optional metadata */ },
      "timestamp": 1234567890.123
    },
    ...
  ],
  "timestamp": 1234567890.456
}
```
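
To test a backend without running the logger, a request in this format can be built by hand. The sketch below is one possible construction; `build_headers`, `make_entry`, and `build_payload` are hypothetical helpers. It assumes the timestamps are Unix epoch seconds and that missing metadata is sent as an empty object, neither of which is confirmed above.

```python
import json
import time
from typing import Any, Dict, List, Optional


def build_headers(api_key: str) -> Dict[str, str]:
    """Headers matching the format described above."""
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }


def make_entry(
    request: Dict[str, Any],
    response: Dict[str, Any],
    metadata: Optional[Dict[str, Any]] = None,
    timestamp: Optional[float] = None,
) -> Dict[str, Any]:
    """One element of the "logs" array."""
    return {
        "request": request,
        "response": response,
        "metadata": metadata or {},  # Assumption: empty object when omitted.
        # Assumption: Unix epoch seconds as a float.
        "timestamp": time.time() if timestamp is None else timestamp,
    }


def build_payload(
    entries: List[Dict[str, Any]], timestamp: Optional[float] = None
) -> str:
    """Serialize a batch of entries into the JSON body."""
    return json.dumps(
        {
            "logs": entries,
            "timestamp": time.time() if timestamp is None else timestamp,
        }
    )
```

The resulting string and headers can be sent with any HTTP client, for example `requests.post(endpoint, data=body, headers=headers)`.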

**Expected Response:**

-  Success: HTTP 200, 201, or 202
-  Server Error: HTTP 5xx (will retry)
-  Client Error: HTTP 4xx other than 429 (will not retry)
-  Rate Limited: HTTP 429 (will retry)
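
The status-code rules above, together with the retry parameters from the configuration table, can be written as a small sketch. `is_success`, `should_retry`, and `backoff_delays` are hypothetical names. The doubling factor is an assumption: the documentation says only that the backoff is exponential, and the library may use a different multiplier or add jitter.

```python
from typing import List


def is_success(status_code: int) -> bool:
    """HTTP 200, 201, or 202 count as success."""
    return status_code in (200, 201, 202)


def should_retry(status_code: int) -> bool:
    """Retry on 429 and 5xx; every other 4xx is a client error."""
    return status_code == 429 or 500 <= status_code <= 599


def backoff_delays(
    max_retries: int = 3,
    initial_retry_delay: float = 1.0,
    max_retry_delay: float = 60.0,
) -> List[float]:
    """Delay before each retry, doubling each time and capped at the maximum."""
    # Assumption: base-2 growth with no jitter.
    return [
        min(initial_retry_delay * 2 ** attempt, max_retry_delay)
        for attempt in range(max_retries)
    ]
```

Under these assumptions, the default settings wait 1, 2, and 4 seconds before giving up on a batch.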

## Error Handling

The logger is designed to be resilient and never crash your application:

-  **Queue Full**: If the queue is full, `log()` returns `False` and the log is dropped
-  **Network Errors**: Automatic retry with exponential backoff
-  **Backend Down**: Retries up to `max_retries` times, then drops the batch
-  **Thread Crashes**: The worker thread is monitored and restarted if needed

All errors are logged to Python's logging system. Enable debug mode to see detailed logs:

```python
logger = LLMLogger(..., debug=True)
```

## Testing

Run the test suite:

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# With coverage
pytest tests/ --cov=burt_logger --cov-report=html
```

## Development

```bash
# Clone the repository
git clone https://github.com/trainburt/burt-logger-python.git
cd burt-logger-python

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black burt_logger/ tests/

# Lint
flake8 burt_logger/ tests/
```

## Performance Considerations

-  **Non-blocking**: `log()` calls take ~0.001ms (just queue insertion)
-  **Memory**: Each log entry is ~1-5KB. Default max queue size is 10,000 logs = ~10-50MB
-  **Network**: Batching reduces network overhead. 1,000 logs/second with `batch_size=10` = 100 batches/second
-  **Threads**: Uses a single background worker thread
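
The memory and network figures above are simple arithmetic, which the following sketch reproduces so that it can be rerun for other settings. `queue_memory_mb` and `batches_per_second` are hypothetical helpers, and the 1-5KB entry size is the estimate quoted above rather than a measured value.

```python
def queue_memory_mb(max_queue_size: int = 10_000, entry_kb: float = 1.0) -> float:
    """Approximate memory held by a full queue, in MB (1 MB = 1000 KB)."""
    return max_queue_size * entry_kb / 1000


def batches_per_second(logs_per_second: float, batch_size: int = 10) -> float:
    """HTTP requests per second needed to keep up with a given log rate."""
    return logs_per_second / batch_size


# A full default queue at the low and high ends of the entry-size estimate.
low_mb = queue_memory_mb(10_000, entry_kb=1.0)
high_mb = queue_memory_mb(10_000, entry_kb=5.0)

# Request rate at 1,000 logs/second for two batch sizes.
default_rate = batches_per_second(1000, batch_size=10)
larger_batch_rate = batches_per_second(1000, batch_size=50)
```

Raising `batch_size` from 10 to 50 at the same log rate cuts the request rate by a factor of five.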

## Production Recommendations

1. **Set appropriate batch_size**: Larger batches are more efficient but increase memory usage

   ```python
   logger = LLMLogger(..., batch_size=50)  # For high-volume apps
   ```

2. **Monitor queue size**: If logs are being dropped, increase `max_queue_size` or reduce traffic

   ```python
   stats = logger.get_stats()
   if stats['logs_failed'] > 0:
       # Replace with your own alerting
       print(f"{stats['logs_failed']} logs failed to send")
   ```

3. **Use metadata**: Add user_id, session_id, etc. for better data analysis

   ```python
   logger.log(..., metadata={"user_id": user_id, "env": "prod"})
   ```

4. **Graceful shutdown**: Always call `shutdown()` or use the context manager

   ```python
   import atexit
   atexit.register(logger.shutdown)
   ```

## License

MIT License - see LICENSE file for details

## Support

-  **Issues**: https://github.com/trainburt/burt-logger-python/issues
-  **Email**: support@burt.ai

## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request

## Changelog

### 0.1.0 (Initial Release)

-  ✅ Non-blocking asynchronous logging
-  ✅ Intelligent batching (by size and time)
-  ✅ Thread-safe operations
-  ✅ Retry with exponential backoff
-  ✅ Graceful shutdown and cleanup
-  ✅ Comprehensive test suite
-  ✅ Context manager support
-  ✅ Statistics tracking

---

Built with ❤️ for the LLM training data collection community
