Metadata-Version: 2.2
Name: flowa-core
Version: 0.1.0
Summary: Flowa — lightweight pipeline orchestration
Author: gustavosegre
License: MIT
Project-URL: Homepage, https://github.com/gustavosegre/flowa
Keywords: pipeline,orchestration,workflow,scheduler,airflow
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: typer
Requires-Dist: pyyaml
Requires-Dist: apscheduler
Requires-Dist: fastapi
Requires-Dist: uvicorn[standard]
Requires-Dist: httpx

# flowa

A lightweight pipeline orchestration tool inspired by Apache Airflow.
Define workflows in YAML, run them from the CLI, schedule them with cron, and monitor everything through a REST API and web UI.

---

## Features

- **DAG-based execution** — topological sort with cycle detection
- **Parallel steps** — independent steps run concurrently via thread pool
- **Retry & timeout** — configurable per step
- **Continue on error** — mark steps as non-blocking for their dependents
- **Cron scheduling** — schedule pipelines by day/time/interval
- **SQLite history** — every run and step is recorded automatically
- **REST API** — trigger pipelines and query history programmatically
- **Web UI** — built-in dashboard to manage and monitor pipelines
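
To make the DAG-based execution idea concrete, here is a minimal sketch of topological ordering with cycle detection. It uses the standard library's `graphlib` rather than flowa's own code, and the `execution_order` helper is a hypothetical name used only for illustration.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(deps: dict[str, list[str]]) -> list[list[str]]:
    """Group steps into waves; every step in a wave has all dependencies met."""
    sorter = TopologicalSorter(deps)  # maps each step to the steps it depends on
    sorter.prepare()                  # raises CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


etl = {
    "extract": [],
    "transform": ["extract"],
    "load": ["transform"],
    "notify": ["transform"],
}
print(execution_order(etl))
# → [['extract'], ['transform'], ['load', 'notify']]

try:
    execution_order({"a": ["b"], "b": ["a"]})
except CycleError as exc:
    print("cycle detected:", exc.args[1])
```

Steps that land in the same wave have no dependency on each other, which is what would allow them to run concurrently.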

---

## Installation

```bash
pip install flowa-core
```

**Requirements:** Python 3.11+

---

## Quick Start

**1. Create a pipeline**

```yaml
# pipelines/etl.yaml
name: etl

steps:
  - name: extract
    run: python scripts/extract.py
    retries: 3
    timeout_seconds: 120

  - name: transform
    run: python scripts/transform.py
    depends_on: extract

  - name: load
    run: python scripts/load.py
    depends_on: transform

  - name: notify
    run: python scripts/notify.py
    depends_on: transform
    continue_on_error: true
```

**2. Run it**

```bash
flowa run pipelines/etl.yaml
```

**3. Open the dashboard**

```bash
flowa serve
# → http://127.0.0.1:8000
```

---

## Pipeline YAML Reference

```yaml
name: my_pipeline          # required
max_parallel: 4            # max concurrent steps (default: 4)

schedule:                  # optional
  days: All Days           # All Days | Mon,Tue,Wed,Thu,Fri | ["Mon", "Fri"]
  start: "09:00"
  end:   "18:00"
  interval_minutes: 60
  timezone: UTC

steps:
  - name: step_name        # required, must be unique
    run: command to run    # required, executed in shell
    depends_on: other_step # optional — string or list
    retries: 0             # optional, default 0
    timeout_seconds: 60    # optional, no limit by default
    continue_on_error: false  # optional, default false
```

### `depends_on`

Accepts a single step name or a list:

```yaml
depends_on: extract
# or
depends_on: [extract, validate]
```

### Step status values

| Status | Description |
|---|---|
| `SUCCESS` | Step completed with exit code 0 |
| `FAILED` | Step failed after all retries |
| `FAILED (ignored)` | Step failed, but `continue_on_error: true` is set, so its dependents still run |
| `SKIPPED` | Step skipped because a dependency hard-failed |
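
One way to read the table is as a small decision procedure applied to steps in dependency order. The sketch below is hypothetical: `resolve_statuses` is not a flowa function, and it assumes that skipping propagates to the dependents of a skipped step, which the table does not state.

```python
def resolve_statuses(
    order: list[str],
    deps: dict[str, list[str]],
    exit_ok: dict[str, bool],
    continue_on_error: set[str],
) -> dict[str, str]:
    """Assign a status to each step, visiting steps in dependency order."""
    statuses: dict[str, str] = {}
    for step in order:
        # Assumption: a FAILED or SKIPPED dependency blocks the step;
        # a "FAILED (ignored)" dependency does not.
        blocked = any(statuses[d] in ("FAILED", "SKIPPED") for d in deps.get(step, []))
        if blocked:
            statuses[step] = "SKIPPED"
        elif exit_ok.get(step, False):
            statuses[step] = "SUCCESS"
        elif step in continue_on_error:
            statuses[step] = "FAILED (ignored)"
        else:
            statuses[step] = "FAILED"
    return statuses


order = ["extract", "transform", "load", "notify"]
deps = {"transform": ["extract"], "load": ["transform"], "notify": ["transform"]}
print(resolve_statuses(order, deps, {"extract": True, "transform": False}, {"notify"}))
# → {'extract': 'SUCCESS', 'transform': 'FAILED', 'load': 'SKIPPED', 'notify': 'SKIPPED'}
```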

---

## CLI Reference

```bash
flowa run <pipeline.yaml>       # run a pipeline manually
flowa start                     # start the scheduler
flowa serve                     # start the API + web UI
flowa history                   # show recent runs
flowa history <pipeline_name>   # filter by pipeline
flowa logs <run_id>             # show steps for a run
```

### `flowa serve` options

```bash
flowa serve --host 0.0.0.0 --port 8080 --reload
```

---

## REST API

| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/pipelines` | List available pipelines |
| `POST` | `/pipelines/{name}/run` | Trigger a pipeline (async) |
| `GET` | `/runs` | Execution history |
| `GET` | `/runs/{id}` | Run detail with steps |
| `GET` | `/runs/{id}/steps/{step}/logs` | Step log content |
| `GET` | `/health` | Health check |

Interactive docs are available at `http://localhost:8000/docs`.

**Trigger a pipeline:**

```bash
curl -X POST http://localhost:8000/pipelines/etl/run
# {"run_id": 42, "status": "RUNNING", ...}
```

**Check run status:**

```bash
curl http://localhost:8000/runs/42
```
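
The same trigger-then-check flow can be scripted. The following is a sketch using only the standard library; `http_json` and `trigger_and_wait` are hypothetical helpers, and it assumes the `GET /runs/{id}` response carries a top-level `status` field that stays `RUNNING` until the run finishes, which the examples above do not confirm.

```python
import json
import time
import urllib.request


def http_json(method: str, url: str) -> dict:
    """Send a request with no body and decode the JSON response."""
    request = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def trigger_and_wait(
    base_url: str,
    pipeline: str,
    fetch=http_json,
    poll_seconds: float = 2.0,
    max_polls: int = 30,
) -> dict:
    """Trigger a pipeline, then poll the run until it leaves RUNNING."""
    run = fetch("POST", f"{base_url}/pipelines/{pipeline}/run")
    run_id = run["run_id"]
    for _ in range(max_polls):
        detail = fetch("GET", f"{base_url}/runs/{run_id}")
        if detail.get("status") != "RUNNING":  # assumed field name
            return detail
        time.sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} still RUNNING after {max_polls} polls")


# Example (requires a running `flowa serve`):
# print(trigger_and_wait("http://localhost:8000", "etl"))
```

The `fetch` parameter exists so the polling logic can be exercised without a live server.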

---

## Configuration

All settings are controlled via environment variables:

| Variable | Default | Description |
|---|---|---|
| `FLOWA_PIPELINES_DIR` | `pipelines` | Directory scanned by the scheduler |
| `FLOWA_LOGS_DIR` | `logs` | Where step log files are written |
| `FLOWA_DB_PATH` | `flowa.db` | SQLite database file path |
| `FLOWA_LOG_LEVEL` | `INFO` | Log level (`DEBUG`, `INFO`, `WARNING`, `ERROR`) |

---

## Project Layout

A typical project using flowa:

```
my-project/
├── pipelines/
│   ├── etl.yaml
│   └── reporting.yaml
├── scripts/
│   ├── extract.py
│   ├── transform.py
│   └── load.py
├── logs/           ← created automatically
└── flowa.db        ← created automatically
```

---

## License

MIT
