Metadata-Version: 2.4
Name: rl-autoscale
Version: 1.0.1
Summary: Production-ready metrics instrumentation for RL-based autoscaling systems
Author-email: Fauzan Ghaza <contact@fauzanghaza.com>
Project-URL: Homepage, https://github.com/ghazafm/rl-autoscale
Project-URL: Documentation, https://github.com/ghazafm/rl-autoscale#readme
Project-URL: Repository, https://github.com/ghazafm/rl-autoscale
Project-URL: Issues, https://github.com/ghazafm/rl-autoscale/issues
Project-URL: Changelog, https://github.com/ghazafm/rl-autoscale/blob/main/CHANGELOG.md
Keywords: reinforcement-learning,autoscaling,kubernetes,prometheus,observability,metrics,rl,dqn,q-learning
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: System :: Monitoring
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: prometheus-client>=0.19.0
Provides-Extra: flask
Requires-Dist: flask>=2.0.0; extra == "flask"
Provides-Extra: fastapi
Requires-Dist: fastapi>=0.100.0; extra == "fastapi"
Requires-Dist: starlette>=0.27.0; extra == "fastapi"
Requires-Dist: uvicorn>=0.38.0; extra == "fastapi"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: ruff>=0.8.0; extra == "dev"
Requires-Dist: build>=1.0.0; extra == "dev"
Requires-Dist: twine>=5.0.0; extra == "dev"
Dynamic: license-file

# rl-autoscale

🎯 **Production-ready metrics instrumentation for RL-based autoscaling systems**

A lightweight Python library that provides standardized Prometheus metrics for applications managed by reinforcement learning autoscalers. Works with Flask, FastAPI, and other Python web frameworks.

[![PyPI version](https://badge.fury.io/py/rl-autoscale.svg)](https://badge.fury.io/py/rl-autoscale)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Features

- ✅ **One-Line Integration** - One import and one `enable_metrics()` call give you full request metrics
- ✅ **Framework Agnostic** - Flask, FastAPI, Django support
- ✅ **Low Overhead** - Minimal performance impact (<1ms per request)
- ✅ **Production Ready** - Battle-tested with proper error handling
- ✅ **RL Optimized** - Metrics designed specifically for RL autoscaling agents
- ✅ **Standardized** - Consistent metrics across all your microservices

## Quick Start

### Flask Application

```python
from flask import Flask
from rl_autoscale import enable_metrics

app = Flask(__name__)

# 🎯 ONE LINE - that's it!
enable_metrics(app, port=8000)

@app.route("/api/hello")
def hello():
    return "Hello World"

if __name__ == "__main__":
    app.run()
```

### FastAPI Application

```python
from fastapi import FastAPI
from rl_autoscale import enable_metrics

app = FastAPI()

# 🎯 ONE LINE - that's it!
enable_metrics(app, port=8000)

@app.get("/api/hello")
async def hello():
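    return {"message": "Hello World"}

if __name__ == "__main__":
    import uvicorn

    # Serve the app on port 5000: uvicorn's default (8000) would clash with the metrics port
    uvicorn.run(app, host="0.0.0.0", port=5000)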
```

## Installation

```bash
# Using pip (quote the extras so shells like zsh don't expand the brackets)
pip install "rl-autoscale[flask]"      # For Flask apps
pip install "rl-autoscale[fastapi]"    # For FastAPI apps

# Using uv (recommended, much faster than pip)
uv pip install "rl-autoscale[flask]"
uv pip install "rl-autoscale[fastapi]"

# Core only
pip install rl-autoscale
```

> 💡 **Tip**: This project uses [uv](https://github.com/astral-sh/uv) for blazingly fast package management. See [UV_GUIDE.md](UV_GUIDE.md) for details.

## What Metrics Are Exposed?

The library exports these standard Prometheus metrics:

### 1. `http_request_duration_seconds` (Histogram)
Request latency distribution used by RL agents to calculate percentiles (p50, p90, p99).

**Labels:**
- `method`: HTTP method (GET, POST, etc.)
- `path`: Request path (e.g., `/api/users`)

**Buckets:** Optimized for web APIs (5ms to 10s)

### 2. `http_requests_total` (Counter)
Total request count used for throughput analysis.

**Labels:**
- `method`: HTTP method
- `path`: Request path
- `http_status`: HTTP status code (200, 404, etc.)
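For reference, the two metric families above can be reproduced with plain `prometheus_client`. This is a sketch of roughly equivalent definitions, not rl-autoscale's actual code: the bucket list is an assumption that simply spans the documented 5ms to 10s range, and a private `CollectorRegistry` keeps the example away from the global registry.

```python
from prometheus_client import CollectorRegistry, Counter, Histogram

# Assumed bucket boundaries (seconds) spanning the documented 5ms-10s range
BUCKETS = (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0)

registry = CollectorRegistry()

REQUEST_LATENCY = Histogram(
    "http_request_duration_seconds",
    "HTTP request latency in seconds",
    labelnames=("method", "path"),
    buckets=BUCKETS,
    registry=registry,
)
REQUEST_COUNT = Counter(
    "http_requests_total",  # exposed as http_requests_total
    "Total HTTP requests",
    labelnames=("method", "path", "http_status"),
    registry=registry,
)


def record(method: str, path: str, status: int, seconds: float) -> None:
    """Record one finished request in both metric families."""
    REQUEST_LATENCY.labels(method=method, path=path).observe(seconds)
    REQUEST_COUNT.labels(method=method, path=path, http_status=str(status)).inc()


record("GET", "/api/users", 200, 0.042)
```

Each observation lands in every cumulative bucket whose upper bound (`le`) is at least the observed latency, which is what later lets Prometheus estimate percentiles.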

## Advanced Usage

### Path Normalization

Prevent cardinality explosion by normalizing dynamic paths:

```python
from flask import Flask
from rl_autoscale import enable_metrics
from rl_autoscale.flask_middleware import normalize_api_paths

app = Flask(__name__)

enable_metrics(
    app,
    port=8000,
    path_normalizer=normalize_api_paths  # /user/123 -> /user/:id
)
```
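If the built-in normalizer doesn't match your URL scheme, you can write your own. The sketch below assumes `path_normalizer` accepts any callable that takes the raw request path and returns a string (inferred from the example above, not confirmed by the library docs); `my_normalizer` and its regex rules are hypothetical.

```python
import re

# Hypothetical rules: collapse UUID and purely numeric path segments
_UUID = re.compile(
    r"/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}(?=/|$)"
)
_NUMERIC = re.compile(r"/\d+(?=/|$)")


def my_normalizer(path: str) -> str:
    """Map concrete paths like /user/123 to templates like /user/:id."""
    path = _UUID.sub("/:uuid", path)
    return _NUMERIC.sub("/:id", path)
```

You would then pass `path_normalizer=my_normalizer` to `enable_metrics`. Keeping the rules to whole segments (the `(?=/|$)` lookahead) avoids mangling versioned routes such as `/api/v2`.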

### Custom Configuration

```python
enable_metrics(
    app,
    port=8000,
    namespace="myapp",  # Prefix metrics: myapp_http_request_duration_seconds
    histogram_buckets=[0.001, 0.01, 0.1, 1.0, 10.0],  # Custom buckets
    exclude_paths=["/health", "/metrics", "/internal/*"],  # Skip these paths
)
```
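The `/internal/*` entry suggests wildcard matching. How rl-autoscale matches these patterns internally isn't documented here; the sketch below assumes shell-style globbing via the standard library's `fnmatch`, which is one plausible reading of such a list. `is_excluded` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

EXCLUDE_PATHS = ("/health", "/metrics", "/internal/*")


def is_excluded(path: str, patterns: tuple[str, ...] = EXCLUDE_PATHS) -> bool:
    """Return True if the request path matches any exclusion pattern."""
    # fnmatchcase is case-sensitive like URL paths; its "*" also matches "/"
    return any(fnmatchcase(path, pattern) for pattern in patterns)
```

Note that exact entries such as `/health` do not match `/healthz`; add a wildcard if you want prefix behavior.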

### Manual Instrumentation

For non-web workloads or custom metrics:

```python
import time

from rl_autoscale import get_metrics_registry

metrics = get_metrics_registry()

# Time a custom operation (expensive_operation stands in for your own workload)
start = time.perf_counter()
result = expensive_operation()
duration = time.perf_counter() - start

metrics.observe_request(
    method="BATCH",
    path="/jobs/process",
    duration=duration,
    status_code=200,
)
```
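If you prefer to time work with a `with` block, a small context manager does the job. `timer` below is a hypothetical helper, not part of rl-autoscale; it measures wall-clock time with `time.perf_counter()` and exposes the result as `elapsed` once the block exits.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class Timing:
    elapsed: float = 0.0  # seconds, filled in when the block exits


@contextmanager
def timer():
    """Measure the wall-clock duration of the enclosed block."""
    timing = Timing()
    start = time.perf_counter()
    try:
        yield timing
    finally:
        # Runs even if the block raises, so failed operations are timed too
        timing.elapsed = time.perf_counter() - start
```

Usage: `with timer() as t: result = expensive_operation()`, then pass `duration=t.elapsed` to `metrics.observe_request(...)`.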

## Architecture

```
┌─────────────────┐
│   Your App      │ ← Clean business logic (no metrics code!)
│   (Flask/       │
│    FastAPI)     │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  RL Metrics     │ ← This library (automatic instrumentation)
│  Middleware     │
└────────┬────────┘
         │ HTTP :8000/metrics
         ▼
┌─────────────────┐
│  Prometheus     │ ← Scrapes metrics every 15-60s
│                 │
└────────┬────────┘
         │ PromQL queries
         ▼
┌─────────────────┐
│   RL Agent      │ ← Makes autoscaling decisions
│ (DQN/Q-Learning)│
└─────────────────┘
```
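The last hop in the diagram, the agent reading metrics back out of Prometheus, goes through Prometheus's standard HTTP API (`GET /api/v1/query`). Below is a minimal standard-library sketch of that step; the Prometheus URL and the helper names `instant_query` and `parse_vector` are assumptions for illustration, and a real agent would add retries and error handling.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://prometheus:9090"  # assumed in-cluster address


def parse_vector(payload: dict) -> list[tuple[dict, float]]:
    """Extract (labels, value) pairs from an instant-vector query response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    # Each result looks like {"metric": {...}, "value": [<unix ts>, "<value>"]}
    return [
        (item["metric"], float(item["value"][1]))
        for item in payload["data"]["result"]
    ]


def instant_query(promql: str, base_url: str = PROMETHEUS_URL) -> list[tuple[dict, float]]:
    """Run a PromQL instant query and return (labels, value) pairs."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_vector(json.load(resp))
```

Prometheus returns sample values as strings, hence the `float()` conversion before the values are fed into the agent's state vector.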

## Why This Library?

### Before (Without Library) ❌
```python
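import time

from flask import Flask, g, request
from prometheus_client import Counter, Histogram, start_http_server

app = Flask(__name__)

# Boilerplate like this is repeated in EVERY service
REQUEST_LATENCY = Histogram(
    "http_request_duration_seconds", "Request latency", ["method", "path"]
)
REQUEST_COUNT = Counter(
    "http_requests_total", "Total requests", ["method", "path", "http_status"]
)

@app.before_request
def before_request():
    g.start_time = time.perf_counter()

@app.after_request
def after_request(response):
    latency = time.perf_counter() - g.start_time
    REQUEST_LATENCY.labels(request.method, request.path).observe(latency)
    REQUEST_COUNT.labels(request.method, request.path, response.status_code).inc()
    return response

start_http_server(8000)
# ... plus path normalization, exclusions, and error handling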
```

### After (With Library) ✅
```python
from rl_autoscale import enable_metrics

enable_metrics(app, port=8000)  # Done! 🎉
```

**Benefits:**
- ✅ **Dozens of lines → one call** - Massive code reduction
- ✅ **Standardized** - Same metrics across all services
- ✅ **Maintainable** - Update once, affects all services
- ✅ **Tested** - Production-ready error handling
- ✅ **Reusable** - Use in Flask, FastAPI, Django

## Configuration via Environment Variables

```bash
# Metrics port
export RL_METRICS_PORT=8000

# Custom namespace
export RL_METRICS_NAMESPACE=myapp

# Excluded paths (comma-separated)
export RL_METRICS_EXCLUDE_PATHS=/health,/metrics,/internal/*
```
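This section doesn't spell out how these variables interact with arguments passed to `enable_metrics`. As an illustration only, here is a hypothetical reader that parses the three variables with stdlib code and falls back to defaults when they are unset; the defaults and the `MetricsConfig` / `config_from_env` names are assumptions.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricsConfig:
    port: int = 8000
    namespace: str = ""
    exclude_paths: tuple[str, ...] = ()


def config_from_env(env: Mapping[str, str] = os.environ) -> MetricsConfig:
    """Build a MetricsConfig from RL_METRICS_* variables (hypothetical)."""
    raw_excludes = env.get("RL_METRICS_EXCLUDE_PATHS", "")
    return MetricsConfig(
        port=int(env.get("RL_METRICS_PORT", "8000")),
        namespace=env.get("RL_METRICS_NAMESPACE", ""),
        # Comma-separated list; drop empty entries and stray whitespace
        exclude_paths=tuple(p.strip() for p in raw_excludes.split(",") if p.strip()),
    )
```

Passing the environment as a mapping keeps the parser easy to test without touching `os.environ`.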

## Kubernetes Deployment

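Add Prometheus scrape annotations to the pod template of your Deployment. The `prometheus.io/*` annotations are a convention rather than a built-in Prometheus feature: they only take effect if your Prometheus scrape configuration includes relabeling rules that honor them (many community Prometheus setups ship such rules).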

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app  # must match spec.selector
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8000"
        prometheus.io/path: "/metrics"
    spec:
      containers:
      - name: app
        image: myapp:latest
        ports:
        - containerPort: 5000  # App port
        - containerPort: 8000  # Metrics port
```

## Prometheus Queries

Example queries for your RL agent:

```promql
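# P90 response time over the last 5 minutes, aggregated across all paths
histogram_quantile(0.90,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
)

# Requests per second, summed across all series
sum(rate(http_requests_total[1m]))

# Error ratio: share of requests answered with HTTP 5xx
sum(rate(http_requests_total{http_status=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m]))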
```
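`histogram_quantile` never sees raw latencies, only cumulative bucket counts, so it estimates the quantile by locating the bucket that contains the target rank and interpolating linearly inside it. The pure-Python sketch below mirrors that idea to show what the agent's P90 actually represents; it is a simplification (minimal edge-case handling), not Prometheus's exact code, and the sample counts in the usage note are made up.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with (math.inf, total).
    """
    if not 0 < q < 1:
        raise ValueError("q must be strictly between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, count_below = 0.0, 0.0
    for upper_bound, cumulative in buckets:
        if cumulative >= rank:
            if math.isinf(upper_bound):
                # Rank falls in the +Inf bucket: report the highest finite bound
                return lower_bound
            in_bucket = cumulative - count_below
            # Linear interpolation inside the bucket that contains the rank
            return lower_bound + (upper_bound - lower_bound) * (rank - count_below) / in_bucket
        lower_bound, count_below = upper_bound, cumulative
    return lower_bound
```

With cumulative counts `[(0.05, 50), (0.1, 80), (0.25, 95), (0.5, 100), (math.inf, 100)]`, the P90 rank (90) falls in the 0.1 to 0.25 bucket and interpolates to about 0.2s, even if no request took exactly 200ms. This is why bucket boundaries close to your latency target matter.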

## Performance

- **Overhead:** <1ms per request
- **Memory:** ~10MB baseline
- **CPU:** Negligible (<0.1% on most workloads)

Tested at 10,000 requests/second with no measurable performance degradation.
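To sanity-check the overhead on your own hardware, you can time the two metric updates that a middleware like this performs per request. The micro-benchmark below uses plain `prometheus_client` with a private registry; it approximates only the metric-recording cost (not framework hooks or path normalization), and the metric names simply mirror the ones documented above.

```python
import timeit

from prometheus_client import CollectorRegistry, Counter, Histogram

registry = CollectorRegistry()
latency = Histogram(
    "http_request_duration_seconds", "Latency", ["method", "path"], registry=registry
)
requests_total = Counter(
    "http_requests_total", "Requests", ["method", "path", "http_status"], registry=registry
)


def record_one_request() -> None:
    # The two updates performed per request: one observe, one increment
    latency.labels("GET", "/api/users").observe(0.012)
    requests_total.labels("GET", "/api/users", "200").inc()


N = 20_000
per_request = timeit.timeit(record_one_request, number=N) / N
print(f"~{per_request * 1e6:.1f} µs per request")
```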

## Troubleshooting

### Port Already in Use

```python
# If port 8000 is taken, use another port
enable_metrics(app, port=8001)
```
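To find out up front whether a port is free, you can try binding it with the standard `socket` module before calling `enable_metrics`. `port_is_free` is a hypothetical helper; the check is inherently racy (another process could grab the port right afterwards), so treat it as a diagnostic rather than a guarantee.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # EADDRINUSE (or a permission error) means we can't use this port
            return False
        return True
```

If you switch ports, remember to update the `prometheus.io/port` annotation to match.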

### Metrics Not Showing Up

1. Check metrics endpoint: `curl http://localhost:8000/metrics`
2. Verify Prometheus can reach your app
3. Check Prometheus scrape config
4. Look for errors in application logs
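Step 1 can be scripted. The standard-library sketch below fetches the exposition text and reports which of the two documented metric families are missing; `missing_metrics` and `check_endpoint` are hypothetical helpers, and the default URL assumes the port used throughout this README (pass prefixed names if you configured a `namespace`). If the metrics are labeled `prometheus_client` metrics, as the label lists above suggest, no series appear until the first request has been recorded, so send one request first.

```python
import urllib.request

EXPECTED = ("http_request_duration_seconds", "http_requests_total")


def missing_metrics(exposition: str, expected: tuple[str, ...] = EXPECTED) -> list[str]:
    """Return expected metric names that have no sample lines in the text."""
    present = {
        line.split("{", 1)[0].split(" ", 1)[0]
        for line in exposition.splitlines()
        if line and not line.startswith("#")
    }
    # Histograms expose _bucket/_sum/_count series rather than the bare name
    return [
        name for name in expected
        if not any(p == name or p.startswith(name + "_") for p in present)
    ]


def check_endpoint(url: str = "http://localhost:8000/metrics") -> list[str]:
    """Fetch /metrics and return the names of any missing metric families."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return missing_metrics(resp.read().decode("utf-8"))
```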

### High Cardinality Warning

If you see thousands of unique `path` label values (usually because URLs embed IDs such as `/user/123`):

```python
# Add path normalization
from rl_autoscale.flask_middleware import normalize_api_paths

enable_metrics(app, path_normalizer=normalize_api_paths)
```
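To confirm that paths are the culprit, count the distinct `path` label values in the `/metrics` output. The sketch below is a deliberately simple regex scan (it assumes label values contain no escaped quotes), and `distinct_paths` is a hypothetical helper.

```python
import re
from collections import Counter

# Match path="..." only as a whole label (preceded by "{" or ","), not e.g. subpath="..."
_PATH_LABEL = re.compile(r'[{,]path="([^"]*)"')


def distinct_paths(exposition: str, metric: str = "http_requests_total") -> Counter:
    """Count series per path label value for one metric family."""
    counts: Counter = Counter()
    for line in exposition.splitlines():
        if line.startswith(metric + "{"):
            match = _PATH_LABEL.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts
```

If `len(distinct_paths(text))` keeps growing with traffic, the path label is unbounded and needs normalization.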

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup and guidelines.

```bash
# Clone repository
git clone https://github.com/ghazafm/rl-autoscale
cd rl-autoscale

# Setup with uv (recommended - one command!)
uv sync --all-extras

# Or traditional way with pip
pip install -e ".[dev,flask,fastapi]"

# Run tests
pytest

# Format and lint code (using ruff for both!)
ruff format .
ruff check .
```

> 📖 **Quick Start**: See [QUICK_START.md](QUICK_START.md) for the fastest setup.
>
> 📖 **Developer Guide**: See [UV_GUIDE.md](UV_GUIDE.md) for using uv and ruff effectively.

## License

MIT License - See [LICENSE](LICENSE) file

## Support

- 📖 **Documentation:** https://github.com/ghazafm/rl-autoscale
- 🐛 **Issues:** https://github.com/ghazafm/rl-autoscale/issues
- 💬 **Discussions:** https://github.com/ghazafm/rl-autoscale/discussions
- 📦 **PyPI:** https://pypi.org/project/rl-autoscale/

---

**Made with ❤️ for RL Autoscaling Systems**
