Metadata-Version: 2.4
Name: airflow-watcher
Version: 0.1.4
Summary: Airflow UI Plugin for monitoring DAG failures and SLA misses
Author-email: Ramanujam Solaimalai <ramanujam.solaimalai@gmail.com>
Maintainer-email: Ramanujam Solaimalai <ramanujam.solaimalai@gmail.com>
License: Apache-2.0
Project-URL: Homepage, https://github.com/ram07eng/airflow-watcher
Project-URL: Bug Tracker, https://github.com/ram07eng/airflow-watcher/issues
Project-URL: Documentation, https://github.com/ram07eng/airflow-watcher#readme
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Framework :: Apache Airflow
Classifier: Topic :: System :: Monitoring
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: apache-airflow<3.0,>=2.7.0
Requires-Dist: flask>=2.0.0
Requires-Dist: requests>=2.28.0
Requires-Dist: slack-sdk>=3.19.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-mock>=3.10.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Provides-Extra: pagerduty
Requires-Dist: requests>=2.28.0; extra == "pagerduty"
Provides-Extra: statsd
Provides-Extra: prometheus
Requires-Dist: prometheus-client>=0.15.0; extra == "prometheus"
Provides-Extra: alerting
Requires-Dist: requests>=2.28.0; extra == "alerting"
Requires-Dist: slack-sdk>=3.19.0; extra == "alerting"
Provides-Extra: all
Requires-Dist: requests>=2.28.0; extra == "all"
Requires-Dist: slack-sdk>=3.19.0; extra == "all"
Requires-Dist: prometheus-client>=0.15.0; extra == "all"
Dynamic: license-file

# Airflow Watcher 👁️

An Airflow UI plugin for monitoring DAG failures and SLA misses/delays.

## Demo

![Airflow Watcher Demo](docs/images/demo_new.gif)

## Features

- 🚨 **DAG Failure Monitoring**: Real-time tracking of DAG and task failures
- ⏰ **SLA Miss Detection**: Alerts when DAGs miss their SLA deadlines
- 📊 **Dashboard View**: Custom Airflow UI view for monitoring status
- 🔔 **Multi-channel Notifications**: Slack, Email, and PagerDuty alerts
- 📈 **Trend Analysis**: Historical failure and SLA miss trends
- 📡 **Metrics Export**: StatsD/Datadog and Prometheus support
- ⚙️ **Flexible Alert Rules**: Pre-defined templates or custom rules

## Installation

📖 **See [INSTALL.md](INSTALL.md) for detailed installation and configuration instructions.**

## Alerting & Monitoring

📖 **See [ALERTING.md](ALERTING.md) for complete alerting configuration:**

- **Slack** - Rich notifications with blocks
- **Email** - SMTP-based alerts
- **PagerDuty** - Incident management with deduplication
- **StatsD/Datadog** - Real-time metrics
- **Prometheus** - `/metrics` endpoint for scraping

### Quick Setup

```bash
# Slack alerts
export AIRFLOW_WATCHER_SLACK_WEBHOOK_URL="https://hooks.slack.com/..."

# PagerDuty (optional)
export AIRFLOW_WATCHER_PAGERDUTY_ROUTING_KEY="your-key"

# Choose alert template
export AIRFLOW_WATCHER_ALERT_TEMPLATE="production_balanced"
```
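
Before starting the webserver, it can help to confirm which of these variables are actually set. The following is a minimal sketch that assumes only the three variable names shown above; `configured_channels` is a hypothetical helper and not part of the plugin's API.

```python
import os
from typing import Mapping, Optional

# Variable names copied from the Quick Setup block above.
CHANNEL_VARS = {
    "slack": "AIRFLOW_WATCHER_SLACK_WEBHOOK_URL",
    "pagerduty": "AIRFLOW_WATCHER_PAGERDUTY_ROUTING_KEY",
}
TEMPLATE_VAR = "AIRFLOW_WATCHER_ALERT_TEMPLATE"


def configured_channels(env: Optional[Mapping[str, str]] = None) -> dict:
    """Report which alert channels have a non-empty value (hypothetical helper)."""
    env = os.environ if env is None else env
    # A channel counts as configured only if its variable is set and not blank.
    report = {name: bool(env.get(var, "").strip()) for name, var in CHANNEL_VARS.items()}
    # The template name is passed through as-is; None means it was not set.
    report["template"] = env.get(TEMPLATE_VAR) or None
    return report


if __name__ == "__main__":
    print(configured_channels())
```

Passing a plain dict instead of relying on `os.environ` keeps the check easy to run in a unit test.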

## Usage

Once installed, the plugin will automatically:
1. Register with Airflow's plugin system
2. Add a "Watcher" menu item to the Airflow UI
3. Start monitoring DAG failures and SLA misses

### Watcher Menu

Open **Watcher** in the Airflow UI navigation bar to access:
- **Airflow Dashboard** - Overview metrics
- **Airflow Health** - DAG health status (success/failed/delayed/stale)
- **DAG Scheduling** - Queue and pool utilization
- **DAG Failures** - Recent failures with details
- **SLA Tracker** - SLA misses and delays
- **Task Health** - Long-running and zombie tasks
- **Dependencies** - Cross-DAG dependency tracking

## Role-Based Access Control (RBAC)

Airflow Watcher integrates with Airflow's built-in FAB security manager to enforce DAG-level access control. No separate configuration is needed — it reads directly from Airflow's role and permission system.

### How It Works

- **Admin / Op roles** see all DAGs across every Watcher page and API endpoint
- **Custom roles** only see DAGs they have `can_read` permission on
- Filtering is mandatory and applied server-side — restricted users cannot bypass it
- Aggregate stats (failure counts, SLA misses, health scores) are recomputed per-user so no global data leaks
- A 🔒 badge appears in the filter bar for non-admin users
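
The filter-then-aggregate idea behind these rules can be sketched in plain Python. This is an illustration only, not the plugin's implementation: `FailureRecord`, `filter_for_user`, and `failure_stats` are hypothetical names, and the set of accessible DAG IDs is assumed to have been resolved from Airflow's permission system already.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class FailureRecord:
    """Illustrative stand-in for one failed task in a DAG run."""
    dag_id: str
    task_id: str


def filter_for_user(
    failures: Iterable[FailureRecord],
    accessible_dag_ids: Optional[Set[str]],
) -> List[FailureRecord]:
    """Keep only failures the user may read; None means unrestricted (Admin / Op)."""
    if accessible_dag_ids is None:
        return list(failures)
    return [f for f in failures if f.dag_id in accessible_dag_ids]


def failure_stats(failures: Iterable[FailureRecord]) -> dict:
    """Aggregate over already-filtered rows so no global totals leak."""
    rows = list(failures)
    return {
        "failure_count": len(rows),
        "failing_dags": sorted({f.dag_id for f in rows}),
    }


all_failures = [
    FailureRecord("weather_data_pipeline", "fetch"),
    FailureRecord("ecommerce_sales_etl", "load"),
    FailureRecord("weather_data_pipeline", "store"),
]
weather_view = filter_for_user(all_failures, {"weather_data_pipeline", "stock_market_collector"})
print(failure_stats(weather_view))
```

Computing the statistics from the filtered rows, rather than filtering a precomputed global summary, is what keeps counts for inaccessible DAGs out of a restricted user's view.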

### Setting Up DAG-Level Permissions

Add `access_control` to your DAG definitions to grant team-specific access:

```python
from airflow import DAG

dag = DAG(
    dag_id="weather_data_pipeline",
    schedule="@hourly",  # "schedule" replaces the deprecated "schedule_interval" (Airflow 2.4+)
    access_control={
        "team_weather": {"can_read", "can_edit"},
    },
)
```

Then create matching roles in Airflow (Admin → Security → List Roles) and assign users to them. The Watcher plugin will automatically pick up the permissions.

### What Gets Filtered

| Area | Filtering |
|------|-----------|
| Dashboard stats | Failure count, SLA misses, health score — all scoped to user's DAGs |
| Failures page | Only failures from accessible DAGs |
| SLA page | Only SLA misses from accessible DAGs |
| Health page | Health status, stale DAGs, scheduling lag — filtered |
| Task health | Long-running tasks, zombies, retries — filtered |
| Scheduling | Concurrent runs, delayed DAGs — filtered |
| Dependencies | Cross-DAG deps, correlations — filtered |
| All API endpoints | Same RBAC enforcement as UI pages |

### Demo Users

The demo environment includes pre-configured RBAC users:

| User | Role | Visible DAGs |
|------|------|-------------|
| `admin` | Admin | All 8 DAGs |
| `weather_user` | team_weather | weather_data_pipeline, stock_market_collector |
| `ecommerce_user` | team_ecommerce | ecommerce_sales_etl, data_quality_checks |

Passwords are configured in `demo/docker-compose.yml`. Change them before any shared deployment.

### RBAC Demo

**Admin user** — sees all DAGs and full aggregate stats:

![Admin RBAC Demo](docs/images/rbac_admin.gif)

**Weather team user** — only sees weather_data_pipeline and stock_market_collector:

![Weather User RBAC Demo](docs/images/rbac_weather.gif)

**Ecommerce team user** — only sees ecommerce_sales_etl and data_quality_checks:

![Ecommerce User RBAC Demo](docs/images/rbac_ecommerce.gif)

```bash
cd demo
docker-compose up -d
# Visit http://localhost:8080 and log in as any user above
```

## Architecture

```
+--------------------------------------------------------------+
|                   Airflow Webserver                          |
|                                                              |
|  +--------------------------------------------------------+  |
|  |              Airflow Watcher Plugin                    |  |
|  |                                                        |  |
|  |  +--------------+    +------------------------------+  |  |
|  |  | Flask Views  |    |        Monitors (6)          |  |  |
|  |  | (Dashboard)  |<---|  - DAG Failure Monitor       |  |  |
|  |  |              |    |  - SLA Monitor               |  |  |
|  |  | REST API     |    |  - Task Health Monitor       |  |  |
|  |  | /api/watcher |    |  - Scheduling Monitor        |  |  |
|  |  +--------------+    |  - Dependency Monitor        |  |  |
|  |         |            |  - DAG Health Monitor        |  |  |
|  |         |            +----------+-------------------+  |  |
|  |         |                       |                      |  |
|  |         |            +----------v-------------------+  |  |
|  |         |            |    Metrics Collector         |  |  |
|  |         |            |    (WatcherMetrics)          |  |  |
|  |         |            +----------+-------------------+  |  |
|  |         |                       |                      |  |
|  |         v                       v                      |  |
|  |  +--------------+    +------------------------------+  |  |
|  |  |  Notifiers   |    |        Emitters              |  |  |
|  |  |  - Slack     |    |  - StatsD / Datadog (UDP)    |  |  |
|  |  |  - Email     |    |  - Prometheus (/metrics)     |  |  |
|  |  |  - PagerDuty |    |                              |  |  |
|  |  +--------------+    +------------------------------+  |  |
|  +--------------------------------------------------------+  |
|                          |                                   |
|                          v                                   |
|              +-----------------------+                       |
|              |  Airflow Metadata DB  |                       |
|              |  (PostgreSQL/MySQL)   |                       |
|              +-----------------------+                       |
+--------------------------------------------------------------+
```

Everything runs inside the Airflow webserver process. No separate workers, no message queues, no external databases. The plugin reads from the same metadata DB that Airflow already maintains.
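
To make the "StatsD / Datadog (UDP)" emitter in the diagram concrete, here is a minimal sketch of sending one gauge in the StatsD line format (`<name>:<value>|g`). It is not the plugin's emitter: the function names and the metric name `airflow_watcher.dag_failures` are assumptions for illustration, and the host and port are the conventional StatsD defaults.

```python
import socket


def statsd_gauge_line(name: str, value: float) -> bytes:
    """Format one StatsD gauge datagram: b"<name>:<value>|g"."""
    return f"{name}:{value}|g".encode("ascii")


def emit_gauge(name: str, value: float, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send the gauge over UDP; fire-and-forget, so delivery is not confirmed."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(statsd_gauge_line(name, value), (host, port))


if __name__ == "__main__":
    # Hypothetical metric name, used only for this example.
    emit_gauge("airflow_watcher.dag_failures", 3)
```

Because UDP is connectionless, an emitter like this adds little overhead to the webserver process, at the cost of silently dropping metrics when no agent is listening.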

## Project Structure

```
airflow-watcher/
├── src/
│   └── airflow_watcher/
│       ├── __init__.py
│       ├── plugins/           # Airflow plugin definitions
│       ├── views/             # Flask Blueprint views
│       ├── monitors/          # DAG & SLA monitoring logic
│       ├── notifiers/         # Slack, email, PagerDuty notifications
│       └── templates/         # Jinja2 templates
├── demo/                      # Local demo Airflow environment
│   ├── dags/                  # Sample DAGs for testing
│   ├── plugins/               # Plugin copy for demo
│   └── docker-compose.yml     # Docker setup
├── tests/
└── pyproject.toml
```

## Demo Environment

To test the plugin locally with sample DAGs:

```bash
cd demo
docker-compose up -d
```

Then visit http://localhost:8080 and navigate to the **Watcher** menu.

See [demo/README.md](demo/README.md) for more details.

<details>
<summary><h2>MWAA Integration</h2></summary>

### Setup

1. Add `airflow-watcher` to your MWAA `requirements.txt`:

```
airflow-watcher==0.1.4
```

For Prometheus metrics support:
```
airflow-watcher[all]==0.1.4
```

2. Upload `requirements.txt` to your MWAA S3 bucket:

```bash
aws s3 cp requirements.txt s3://<your-mwaa-bucket>/requirements.txt
```

3. Update your MWAA environment to pick up the new requirements (via AWS Console or CLI):

```bash
aws mwaa update-environment \
  --name <your-environment-name> \
  --requirements-s3-path requirements.txt \
  --requirements-s3-object-version <version-id>
```

> **Note:** No `plugins.zip` is needed. Airflow auto-discovers airflow-watcher via the `airflow.plugins` entry point when installed via pip (Airflow 2.7+).

4. Wait for the environment to finish updating (takes a few minutes).

5. Verify at:
```
https://<your-mwaa-url>/api/watcher/health
```

### Environment Variables (optional)

Configure via MWAA Airflow configuration overrides:

| Variable | Purpose |
|---|---|
| `AIRFLOW_WATCHER__SLACK_WEBHOOK_URL` | Slack notifications |
| `AIRFLOW_WATCHER__PAGERDUTY_API_KEY` | PagerDuty alerts |
| `AIRFLOW_WATCHER__ENABLE_PROMETHEUS` | Prometheus metrics |

### Testing Locally with MWAA Local Runner

```bash
git clone https://github.com/aws/aws-mwaa-local-runner.git
cd aws-mwaa-local-runner
echo "airflow-watcher==0.1.4" >> requirements/requirements.txt
./mwaa-local-env build-image
./mwaa-local-env start
```

Visit `http://localhost:8080/api/watcher/health` to verify.
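
The same check can be scripted. The sketch below only tests for an HTTP 200 response, since the response body format is not documented here; `health_url` and `is_healthy` are hypothetical helpers. It also assumes the endpoint is reachable without an interactive login. If the webserver redirects to a login page, a 200 status alone does not prove the plugin is loaded.

```python
import urllib.request
from urllib.parse import urljoin

HEALTH_PATH = "api/watcher/health"  # path taken from the verification step above


def health_url(base_url: str) -> str:
    """Join the webserver base URL with the Watcher health path."""
    return urljoin(base_url.rstrip("/") + "/", HEALTH_PATH)


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers HTTP 200; any error counts as unhealthy."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and HTTPError are both subclasses of OSError
        return False


if __name__ == "__main__":
    print(is_healthy("http://localhost:8080"))
```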

> **Note:** If using Slack or PagerDuty notifications, ensure your MWAA VPC has a NAT gateway for outbound internet access.

</details>

## Development

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check src tests
black --check src tests  # black is not in the "dev" extra; install it separately

# Type checking
mypy src
```

## License

Apache License 2.0 - See [LICENSE](LICENSE) for details.

## Author

**Ramanujam Solaimalai** ([@ram07eng](https://github.com/ram07eng))

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
