Metadata-Version: 2.4
Name: aiops-sdk
Version: 0.1.6
Summary: AIOps Platform SDK — exception capture, heartbeat, and Flask integration
License: MIT
Project-URL: Homepage, https://pypi.org/project/aiops-sdk/
Project-URL: Source, https://github.com/arnav1/aiops-sdk
Keywords: aiops,monitoring,exception,observability,devops
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Monitoring
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: requests>=2.28.0
Provides-Extra: flask
Requires-Dist: Flask>=2.0.0; extra == "flask"

# AIOps SDK

Python SDK for the [AIOps Platform](https://aiops.1genimpact.cloud) — automatic exception capture, heartbeat monitoring, and Flask integration.

## Installation

```bash
pip install aiops-sdk             # core only
pip install "aiops-sdk[flask]"    # with Flask integration
```

## Quick Start

```python
from flask import Flask

import aiops_sdk
from aiops_sdk.integrations.flask import init_flask

app = Flask(__name__)

# Initialise once at startup (reads AIOPS_API_KEY from env if not passed)
aiops_sdk.init(api_key="your-api-key")

# Register the Flask integration (call after your Flask app is created)
init_flask(app)
```

That's it. The SDK will automatically:
- Report unhandled exceptions before the process exits
- Capture HTTP 4xx/5xx error responses that pass the noise filtering described below, extracting the real error message from your JSON response body
- Forward ERROR and CRITICAL log records as incidents
- Send heartbeats every 30 seconds so the platform knows the service is alive
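The JSON extraction mentioned in the second bullet can be pictured roughly as follows. This is a minimal sketch, not the SDK's code: the helper name `extract_error_message` is hypothetical, and checking `message`, then `error`, then `detail` is an assumed priority order based on the keys listed in the How It Works diagram.

```python
import json

# Hypothetical helper: key names come from the README, their priority order is assumed.
_ERROR_KEYS = ("message", "error", "detail")


def extract_error_message(body: bytes, status: int) -> str:
    """Return a readable error from a JSON response body, or a generic fallback."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        payload = None  # not JSON (e.g. an HTML error page)

    if isinstance(payload, dict):
        for key in _ERROR_KEYS:
            value = payload.get(key)
            # Accept only non-empty strings; nested objects are ignored in this sketch.
            if isinstance(value, str) and value.strip():
                return value.strip()

    # Fallback when the body is not JSON or has none of the expected keys.
    return f"HTTP {status} response"


print(extract_error_message(b'{"error": "user not found"}', 400))  # -> user not found
print(extract_error_message(b"<html>oops</html>", 502))  # -> HTTP 502 response
```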

## Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| `AIOPS_API_KEY` | **Yes** | — | Platform API key |
| `AIOPS_SERVICE_NAME` | No | `unknown-service` | Service name shown in incidents |
| `AIOPS_ENV` | No | `production` | Environment (`production` / `staging` / etc.) |
| `AIOPS_PLATFORM_URL` | No | `https://aiops.1genimpact.cloud` | Override if self-hosting the platform |
| `AIOPS_SERVICE_BASE_URL` | No | auto-detected | This service's own base URL (used for automated fix callbacks) |
| `AIOPS_SKIP_PATHS` | No | — | Extra paths to exclude from monitoring (comma-separated, e.g. `/admin/health,/internal/ping`) |
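As an illustration of how the table's defaults fit together, the sketch below resolves them from an environment mapping. The names `AiopsConfig` and `load_config` are hypothetical and not part of the SDK's public API; the only behaviour taken from this README is the defaults, the comma-separated skip list, and that an explicitly passed `api_key` takes precedence over `AIOPS_API_KEY`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class AiopsConfig:
    """Hypothetical container mirroring the environment variables above."""
    api_key: str
    service_name: str = "unknown-service"
    env: str = "production"
    platform_url: str = "https://aiops.1genimpact.cloud"
    service_base_url: Optional[str] = None  # None means "auto-detect"
    skip_paths: Tuple[str, ...] = ()


def load_config(environ: Mapping[str, str] = os.environ,
                api_key: Optional[str] = None) -> AiopsConfig:
    # An explicit argument wins; otherwise fall back to AIOPS_API_KEY.
    key = api_key or environ.get("AIOPS_API_KEY")
    if not key:
        raise ValueError("AIOPS_API_KEY is required")

    # Split the comma-separated list, dropping whitespace and empty entries.
    raw_skip = environ.get("AIOPS_SKIP_PATHS", "")
    skip_paths = tuple(p.strip() for p in raw_skip.split(",") if p.strip())

    return AiopsConfig(
        api_key=key,
        service_name=environ.get("AIOPS_SERVICE_NAME", "unknown-service"),
        env=environ.get("AIOPS_ENV", "production"),
        platform_url=environ.get("AIOPS_PLATFORM_URL", "https://aiops.1genimpact.cloud"),
        service_base_url=environ.get("AIOPS_SERVICE_BASE_URL") or None,
        skip_paths=skip_paths,
    )
```

Passing the environment as a mapping (rather than reading `os.environ` directly inside the function) keeps the resolution logic easy to test.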

## What Gets Captured

| Source | Mechanism | Notes |
|---|---|---|
| Unhandled exceptions | `sys.excepthook` replacement | Blocks until delivered (sync send) |
| Flask exceptions | `got_request_exception` signal | Full stack trace preserved |
| HTTP 4xx/5xx responses | `after_request` hook | Extracts real error from `jsonify` body |
| ERROR / CRITICAL logs | Root logging handler | Stack trace from `exc_info` when available |
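The first and last rows of the table rely on standard-library hooks. The sketch below shows one way such hooks could be wired; it is not the SDK's implementation. The `report` function and the in-memory `captured` list are hypothetical stand-ins for the SDK's transport, and treating loggers named `aiops_sdk*` as the SDK's own is an assumption.

```python
import logging
import sys
import traceback
from types import TracebackType
from typing import Callable, List, Optional, Type

# Hypothetical stand-in for the SDK's transport; a real client would send these to the platform.
captured: List[dict] = []


def report(kind: str, message: str, stack: Optional[str]) -> None:
    captured.append({"kind": kind, "message": message, "stack": stack})


def install_excepthook() -> Callable:
    """Replace sys.excepthook, chaining to the previous hook; returns the previous one."""
    previous = sys.excepthook

    def hook(exc_type: Type[BaseException], exc: BaseException,
             tb: Optional[TracebackType]) -> None:
        # Report first, then defer to the previous hook so the traceback still prints.
        stack = "".join(traceback.format_exception(exc_type, exc, tb))
        report("unhandled", f"{exc_type.__name__}: {exc}", stack)
        previous(exc_type, exc, tb)

    sys.excepthook = hook
    return previous


class ForwardingHandler(logging.Handler):
    """Forwards ERROR and CRITICAL records, attaching a stack trace when exc_info is set."""

    def __init__(self) -> None:
        super().__init__(level=logging.ERROR)

    def emit(self, record: logging.LogRecord) -> None:
        # Skip the SDK's own loggers to avoid feedback loops (logger name is assumed).
        if record.name.startswith("aiops_sdk"):
            return
        stack = None
        if record.exc_info and record.exc_info[0] is not None:
            stack = "".join(traceback.format_exception(*record.exc_info))
        report("log", record.getMessage(), stack)


# Attach to the root logger so records from every named logger propagate here.
logging.getLogger().addHandler(ForwardingHandler())
```

Chaining to the previous `sys.excepthook` preserves the default traceback output (or any other hook already installed) instead of silently replacing it.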

## What's Excluded (Built-in Noise Filtering)

The SDK filters out common sources of noise so that incidents reflect genuine application errors in production:

- **`OPTIONS` and `HEAD` requests** — CORS preflights and HTTP HEAD probes are never application errors
- **Health-check paths** — `/health`, `/healthz`, `/ping`, `/ready`, `/alive`, `/readiness`, `/liveness`, `/metrics`, `/status`, `/favicon.ico`, `/robots.txt`
- **Static files** — `.js`, `.css`, `.ico`, `.png`, `.jpg`, `.svg`, `.woff`, `.woff2`, `.ttf`, `.map`, and more
- **HTTP 404** — "path not found" is expected client behaviour, not a server bug
- **HTTP 429** — rate-limiting is working as intended, not an error state
- **Werkzeug access log noise** — bot/scanner traffic logged by the HTTP server layer
- **`urllib3` / `requests` library errors** — connection pool noise when the platform itself is temporarily unreachable
- **SDK's own logs** — prevents feedback loops if the background worker logs a failure

### Adding Custom Exclusions

```bash
# Exclude additional paths without code changes:
export AIOPS_SKIP_PATHS=/admin/health,/internal/ping,/ops/ready
```

## How It Works

```
Flask request
    │
    ├─► got_request_exception  ──► capture_exception()  ──► send_async() ──► queue
    │         (real exceptions, full stack trace)
    │
    └─► after_request hook
            │
            ├─ [noise filter] OPTIONS / HEAD → skip
            ├─ [noise filter] /health → skip
            ├─ [noise filter] .js/.css → skip
            ├─ [dedup guard] already captured by signal → skip
            │
            └─ status in {400,401,403,500,502,503,504}
                   │
                   ├─ extract real error from jsonify({message/error/detail})
                   ├─ infer error type (AttributeError, KeyError, DatabaseError, …)
                   └─► send_async() ──► queue

Background worker (daemon thread)
    └─ drains queue → POST /v1/sdk/exception (retry 3×, backoff 1s/2s)
```
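The background-worker step at the bottom of the diagram is a classic queue-plus-daemon-thread pattern. The sketch below is one possible reading, not the SDK's code: the class name `BackgroundSender`, the injected `send` callable (standing in for the HTTP POST to `/v1/sdk/exception`), and the `close()` sentinel are all hypothetical. It also assumes "retry 3×, backoff 1s/2s" means three attempts in total, with a 1 s pause after the first failure and a 2 s pause after the second.

```python
import queue
import threading
import time
from typing import Callable, Optional, Sequence


class BackgroundSender:
    """Drains a queue on a daemon thread, retrying each payload with fixed backoff."""

    def __init__(self, send: Callable[[dict], None],
                 backoff: Sequence[float] = (1.0, 2.0)) -> None:
        self._send = send  # hypothetical transport, e.g. an HTTP POST in a real client
        self._backoff = tuple(backoff)
        self._queue: "queue.Queue[Optional[dict]]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, name="aiops-worker", daemon=True)
        self._thread.start()

    def send_async(self, payload: dict) -> None:
        # Non-blocking for the caller; delivery happens on the worker thread.
        self._queue.put(payload)

    def _deliver(self, payload: dict) -> bool:
        attempts = len(self._backoff) + 1  # (1.0, 2.0) -> 3 attempts in total
        for attempt in range(attempts):
            try:
                self._send(payload)
                return True
            except Exception:
                if attempt < len(self._backoff):
                    time.sleep(self._backoff[attempt])
        return False  # dropped after the final failed attempt

    def _run(self) -> None:
        while True:
            payload = self._queue.get()
            try:
                if payload is None:  # sentinel pushed by close()
                    return
                self._deliver(payload)
            finally:
                self._queue.task_done()

    def close(self, timeout: Optional[float] = None) -> None:
        # Queued payloads ahead of the sentinel are still delivered (FIFO order).
        self._queue.put(None)
        self._thread.join(timeout)
```

Because the thread is a daemon, it will not keep the process alive on exit; that is why the unhandled-exception path in the capture table sends synchronously instead of going through the queue.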

## License

MIT
