Metadata-Version: 2.4
Name: cf-ontology
Version: 0.2.2
Summary: CogniFlow ontology definitions and utilities
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: rdflib<8.0,>=7.4
Requires-Dist: pyyaml<7.0,>=6.0
Requires-Dist: duckdb<2.0,>=0.10
Requires-Dist: pyshacl<0.30,>=0.25
Provides-Extra: test
Requires-Dist: pytest<9.0,>=8.0; extra == "test"

# CogniFlow Ontology

`cf_ontology` is the semantics core for CogniFlow. It ingests RDF documents into a DuckDB-backed quad store, validates ontology/pipeline documents, and exposes a single Python API (`OntologyManager`) consumed by other packages (for example `cf_web` and pipeline tooling).

Semantics are authored as RDF documents:
- ontology fragments
- step packages
- pipelines

`OntologyManager` loads packaged ontology resources, can ingest installed step packages, validates RDF documents with SHACL, and exposes normalized DTOs plus document exports.

## Interfaces

There are three ways to interact with `cf_ontology`.

### 1) Python functions/classes (direct API)

Use this when building apps/services:
- `OntologyManager`
- `ingest_rdf_documents`
- `rebuild_semantics_db`

This is the primary programmatic interface and the one used by `cf_web`.

### 2) Package CLI (`cf-ontology` or `python -m cf_ontology`)

Use this for semantics operations:
- init/rebuild
- ingest
- export
- pipeline revision/state operations
- signature generation

This is implemented in `src/cf_ontology/cli.py`.

### 3) Unified CLI (`cf ontology ...`)

Use this for ontology inspection in the unified CogniFlow CLI:
- list classes
- inspect class
- list steps
- inspect step

This is implemented in `src/cf_ontology/cf_cli.py` and registered into `cf_cli`.

### Interface differences

- Python API: best for integration in code; returns Python objects/DTOs.
- Package CLI (`cf-ontology`): operational/automation tasks over semantics DB files.
- Unified CLI (`cf ontology`): user-facing inspection commands across packages.

## Storage (Semantics DB)

The default storage is an RDF **Quad Store** backed by DuckDB files under `workspace/semantics` (repo-relative). The location is configurable via environment variables.

Default DB files:

- `cf-ontology.duckdb` (static ontology + shapes)
- `cf-steps.duckdb` (step packages)
- `cf-pipelines.duckdb` (pipeline revisions + audit)
- `cf-pipeline-states.duckdb` (pipeline runtime state/events; not part of default exports yet)

### Backend

`cf_ontology` is quads-only. RDF documents are ingested into RDF quads stored in DuckDB (`rdf_quads`) with a graph catalog (`graphs`).

### Paths

- `CF_SEMANTICS_DIR=/path/to/semantics` (absolute or repo-relative; default: `workspace/semantics`)
- `CF_WORKSPACE_DIR=/path/to/workspace` (used only when `CF_SEMANTICS_DIR` is unset)

If neither is set, `workspace/semantics` is used when a repo root can be detected; otherwise the fallback is `~/.cogniflow/workspace/semantics`.

### Step packages

- `CF_ENABLE_STEP_PACKAGES=0|1` (default: `0`): set to `1` to discover installed step packages via the `cogniflow.steps` entry point group.

## Graph IDs (binding conventions)

### Step packages

- Split steps (`split_steps=True`): `g = "pkg:{package_name}@{package_version}#{step_id}"`
- Unsplit: `g = "pkg:{package_name}@{package_version}"`

`package_name` is the StepPackage id (from `cf:packageId` in `steps.nq`).

Package upgrades generate new graph ids (the version is part of `g`). Existing graphs are not deleted; missing/uninstalled packages are marked inactive via `graphs.is_active=false`.

### Pipelines (versioned)

- Stable pipeline identifier: `pipeline_id` (UUID or user-provided stable id)
- Revision graph: `g = "pipe:{pipeline_id}@{rev}"` with monotonically increasing `rev` starting at `1`
- "Current" is a pointer in the `pipelines` table (no separate quad graph)

## What is included

- Packaged N-Quads fragments: `ontology/core/*.nq`, `ontology/vocab/*.nq`, and `ontology/shapes/*.nq`.
- `OntologyManager` for loading/merging ontologies and querying step/pipeline metadata.
- Dynamic step-package discovery via the `cogniflow.steps` entry point group.

## Installation

From the `sandcastle/cf_ontology` directory:

```bash
pip install .
```

Published distribution name:

```bash
pip install cf-ontology
```

Optional (for unified `cf` CLI integration):

```bash
pip install -e ../cf_cli
```

## Usage

**Policy:** all reads/writes to the semantics DuckDBs must go through the `cf_ontology` package
(Python API or CLI). Do not access `cf-*.duckdb` directly from other packages.

### Python API

Basic loading and inspection:

```python
from cf_ontology import OntologyManager

ontology = OntologyManager()
print(ontology.get_processing_steps())
print(ontology.get_graph_document())
```

Load rich step DTOs (quads-backed bulk path):

```python
from cf_ontology import OntologyManager

om = OntologyManager(load_resources=False)
steps = om.get_processing_steps_info_quads_bulk()
print(len(steps))
if steps:  # the list may be empty on a fresh semantics DB
    print(steps[0]["@id"])
```

### Importing RDF documents into the Quad Store

Python utility:

```python
from pathlib import Path
from cf_ontology import ingest_rdf_documents

ingest_rdf_documents(
    [Path("sandcastle/cf_pipeline/cf_pipeline_engine/examples/opcua_fifo_avg_to_duckdb_parquet_triggered.nq")],
    graph_type="pipeline",
    package="examples",
)
```

Package CLI:

```bash
python -m cf_ontology ingest --paths sandcastle/cf_pipeline/cf_pipeline_engine/examples/opcua_fifo_avg_to_duckdb_parquet_triggered.nq --graph-type pipeline --package examples
```

Or via console script:

```bash
cf-ontology ingest --paths sandcastle/cf_pipeline/cf_pipeline_engine/examples/opcua_fifo_avg_to_duckdb_parquet_triggered.nq --graph-type pipeline --package examples
```

Unified CLI (inspection-focused):

```bash
cf ontology classes
cf ontology class cf:ProcessingStep --instances
cf ontology steps
cf ontology step cfbs:AverageStep
```

### Ingesting installed step packages

This discovers `cogniflow.steps` entry points and ingests their RDF documents with the graph-id conventions above:

```bash
python -m cf_ontology ingest --installed-steps
```

Re-ingest even if the stored content hash matches:

```bash
python -m cf_ontology ingest --installed-steps --force
```

### Initialize semantics DB (CLI)

```bash
python -m cf_ontology init
```

This initializes all split DB files under the semantics directory.

To force a rebuild (destructive):

```bash
python -m cf_ontology init --rebuild
```

### Fresh install smoke test

Repository bootstrap now uses `scripts/fresh_install_v2.ps1 -Clean`.

For the dedicated Semantics QuadStore smoke test, run the helper script from an activated environment:

```powershell
python sandcastle/cf_ontology/scripts/semantics_smoketest.py `
  --pipeline-document sandcastle/cf_pipeline/cf_pipeline_engine/examples/opcua_fifo_avg_to_duckdb_parquet_triggered.nq `
  --semantics-dir workspace/semantics
```

It prints CI-friendly lines like:

```
[SEMANTICS] backend=quads
[SEMANTICS] db_path=...
[SEMANTICS] tables=graphs,rdf_quads,pipelines,pipeline_revisions OK
[SEMANTICS] committed pipeline_id=... revs=...,... graph_id=...
[SEMANTICS] export_flatten=current_only OK
[SEMANTICS] export_dataset=named_graphs OK
```

`db_path` above points to `cf-pipelines.duckdb`.

### Pipeline versioning (append-only revisions)

Commit pipeline revisions (audit metadata is stored in `pipeline_revisions`):

```python
from cf_ontology import OntologyManager

om = OntologyManager()
rev = om.commit_pipeline(
    "my-pipeline-id",
    {"@context": {"cf": "https://cogniflow.odea-project.org/cf#"}, "@graph": []},
    user="alice",
    message="update",
)
print(rev)
print(om.list_pipeline_revisions("my-pipeline-id"))
print(om.get_pipeline_revision("my-pipeline-id"))  # current
print(om.get_pipeline_revision("my-pipeline-id", rev=1))
```

Commit a pipeline revision from the CLI with an RDF document input:

```bash
python -m cf_ontology pipeline commit --pipeline-id my-pipeline-id --document path/to/pipeline.nq
```

Derive a pipeline id from an RDF document (uses the ProcessingPipeline `@id` local name):

```bash
python -m cf_ontology pipeline id --document path/to/pipeline.nq
```

`OntologyManager.get_graph_document()` includes only current pipeline revisions, not all historical revisions.

### Dataset export (named graphs preserved)

```python
from cf_ontology import OntologyManager

om = OntologyManager()
dataset_document = om.get_graph_dataset_document()
```

### Rebuilding the semantics DB

```python
from cf_ontology import rebuild_semantics_db

rebuild_semantics_db()
```

### Package CLI command groups

`python -m cf_ontology -h` (or `cf-ontology -h`) provides:
- `init`
- `ingest`
- `export`
- `pipeline` (commit/get/id)
- `state` (activate/set/emit-event/run-loop)
- `siggen`

### Unified CLI command groups

`cf -h` shows groups contributed by installed packages.  
`cf_ontology` contributes:
- `ontology classes`
- `ontology class`
- `ontology steps`
- `ontology step`

## Step package entry point convention

Entry point group: `cogniflow.steps`

Supported entry point values:

- Resource path: `some_pkg_module:steps.nq` (path relative to that package/module)
- Loader function: `some_pkg_module:load_steps` returning an RDF document `dict` or JSON string

Package metadata is read from the `cf:StepPackage` node in `steps.nq` (`cf:packageId` / `cf:packageVersion`), with distribution metadata as a fallback.

## Quad Store schema (DuckDB)

The quad store uses these tables (per DB file):

- `graphs` (catalog)
  - `graph_id TEXT PRIMARY KEY`
  - `graph_type TEXT`
  - `package TEXT`
  - `path TEXT`
  - `context JSON`
  - `graph_kind TEXT` (`package_step|package|pipeline_rev|system|custom`)
  - `is_active BOOLEAN`
  - `content_hash TEXT`
  - `package_name TEXT NULL`, `package_version TEXT NULL`
  - `created_at TIMESTAMP`
  - `updated_at TIMESTAMP`
- `rdf_quads` (RDF terms, normalized)
  - `g TEXT` (named graph id)
  - `s TEXT`, `s_kind TEXT` (`iri|bnode`)
  - `p TEXT`, `p_kind TEXT` (`iri`)
  - `o TEXT`, `o_kind TEXT` (`iri|bnode|literal`)
  - `o_datatype TEXT NULL`, `o_lang TEXT NULL`
  - `graph_type TEXT`, `package TEXT`, `path TEXT`
  - `updated_at TIMESTAMP`
- `pipelines` (current pointer)
  - `pipeline_id TEXT PRIMARY KEY`
  - `current_rev INTEGER`
  - `created_at TIMESTAMP`, `updated_at TIMESTAMP`
  - `created_by TEXT NULL`, `updated_by TEXT NULL`
- `pipeline_revisions` (append-only audit log)
  - `PRIMARY KEY (pipeline_id, rev)`
  - `graph_id TEXT` (equals `pipe:{pipeline_id}@{rev}`)
  - `created_at TIMESTAMP`, `created_by TEXT NULL`
  - `message TEXT NULL`
  - `content_hash TEXT NULL`
  - `parent_rev INTEGER NULL`

In the split layout, `pipelines` and `pipeline_revisions` live only in `cf-pipelines.duckdb`.

## Pipeline states schema (DuckDB)

`cf-pipeline-states.duckdb` stores runtime state as relational tables. Each column has a
corresponding property or class definition in the ontology RDF documents (core/vocab + core/properties).

Tables:

- `pipeline_state` (snapshot/control)
- `pipeline_runs` (run history)
- `run_events` (append-only event log)
- `run_queue` (execution queue + leases)

### Runtime CLI

Canonical pipeline start flow (the only supported run-start contract):

1. Activate pipeline state (creates snapshot row):

```bash
python -m cf_ontology state activate --pipeline-id opcua_fifo_avg --desired-state enabled
```

2. Persist an inbound trigger event (idempotent by `dedupe_key`):

```bash
python -m cf_ontology state emit-event \
  --pipeline-id opcua_fifo_avg \
  --event-type opcua_signal \
  --source manual \
  --dedupe-key demo-opcua-001
```

3. Run worker loop to consume pending events and execute the run:

```bash
python -m cf_ontology state run-loop --pipeline-id opcua_fifo_avg --poll-interval 1.0 --once --invocation-document path/to/invocation.nq
```

Control desired state (pause/sleep/disable):

```bash
python -m cf_ontology state set --pipeline-id opcua_fifo_avg --desired-state sleep
```

`state run-loop` consumes pending `run_events` only and never auto-generates runs from
idle state ticks. If no pending events are available, the worker waits (or exits with `--once`).
Each consumed event creates linked runtime records in `cf-pipeline-states.duckdb`
(`run_events`, `pipeline_runs`, `run_queue`) before engine execution starts.
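
The two properties described here, idempotent event emission and a worker that only consumes pending events, can be modelled in memory as follows. `InMemoryRunState`, its status values, and the choice to scope `dedupe_key` per pipeline are all assumptions made for this sketch; the real tables live in `cf-pipeline-states.duckdb`.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryRunState:
    """Hypothetical stand-in for the run_events / pipeline_runs tables."""

    events: dict[tuple[str, str], dict] = field(default_factory=dict)
    runs: list[dict] = field(default_factory=list)

    def emit_event(self, pipeline_id: str, event_type: str, dedupe_key: str) -> bool:
        # Idempotent: a repeated dedupe_key does not create a second event.
        key = (pipeline_id, dedupe_key)
        if key in self.events:
            return False
        self.events[key] = {"event_type": event_type, "status": "pending"}
        return True

    def run_once(self, pipeline_id: str) -> int:
        # Consume pending events only; an idle pipeline starts no runs.
        started = 0
        for (pid, dedupe_key), event in self.events.items():
            if pid == pipeline_id and event["status"] == "pending":
                event["status"] = "consumed"
                self.runs.append({"pipeline_id": pid, "dedupe_key": dedupe_key})
                started += 1
        return started


state = InMemoryRunState()
created = state.emit_event("opcua_fifo_avg", "opcua_signal", "demo-opcua-001")
duplicate = state.emit_event("opcua_fifo_avg", "opcua_signal", "demo-opcua-001")
first_pass = state.run_once("opcua_fifo_avg")
second_pass = state.run_once("opcua_fifo_avg")
print(created, duplicate, first_pass, second_pass)
```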

## Notes

- **DuckDB as default storage:** split DB files keep the high-churn data (pipelines, states) isolated from the mostly static ontology and steps. Parquet export/materialization can be added later for bulk scans and interchange.
- `sandcastle/src` is legacy and not part of ongoing development; new work happens under `sandcastle/*` packages.

## Pipeline steps header

Processing pipelines must declare the step catalogs they rely on via a steps header:

```json
{
  "@id": "ex:MyPipeline",
  "@type": "cf:ProcessingPipeline",
  "cf:hasStepsHeader": { "@id": "ex:stepsHeader" }
},
{
  "@id": "ex:stepsHeader",
  "@type": "cf:StepsHeader",
  "cf:stepsPath": [
    "path/to/steps.nq",
    "path/to/other_steps.nq"
  ]
}
```

The runner uses these paths when `--steps` is not provided on the CLI.

## Pipeline plugins header

Pipelines should also declare plugin directories so the runner can load implementations without `--plugins`:

```json
{
  "@id": "ex:MyPipeline",
  "@type": "cf:ProcessingPipeline",
  "cf:hasPluginsHeader": { "@id": "ex:pluginsHeader" }
},
{
  "@id": "ex:pluginsHeader",
  "@type": "cf:PluginsHeader",
  "cf:pluginPath": [
    "path/to/plugin/bin",
    "path/to/other/plugin/bin"
  ]
}
```

## Publishing

`cf_ontology` is published with the dedicated Windows workflow:

- Workflow: `.github/workflows/cf_ontology_windows_publish.yml`
- Package directory: `sandcastle/cf_ontology`
- PyPI tag: `cf-ontology-v<version>`
- TestPyPI tag: `cf-ontology-v<version>-test`

Local preflight:

```powershell
powershell -ExecutionPolicy Bypass -File scripts/mimic_windows_python_publish_workflow.ps1 `
  -WorkflowFile .github/workflows/cf_ontology_windows_publish.yml `
  -PackageDir sandcastle/cf_ontology `
  -PythonExe py `
  -PythonVersion 3.14
```

Queue a dry-run dispatch:

```powershell
powershell -ExecutionPolicy Bypass -File scripts/queue_windows_python_publish_workflow.ps1 `
  -WorkflowFile .github/workflows/cf_ontology_windows_publish.yml `
  -PackageDir sandcastle/cf_ontology `
  -PublishTarget testpypi `
  -Ref main `
  -RequireLocalPass `
  -DryRun
```

