Metadata-Version: 2.4
Name: openeo-core
Version: 0.1.2
Summary: Standalone Python core for defining and executing openEO processes locally with xarray, dask, and ML backends.
Project-URL: Homepage, https://github.com/PondiB/openeo-core
Project-URL: Repository, https://github.com/PondiB/openeo-core
Project-URL: Documentation, https://github.com/PondiB/openeo-core/tree/main/docs
Project-URL: Issues, https://github.com/PondiB/openeo-core/issues
Project-URL: Changelog, https://github.com/PondiB/openeo-core/releases
Author: Brian Pondi
License: Apache-2.0
License-File: LICENSE
Keywords: dask,earth-observation,machine-learning,openeo,remote-sensing,stac,xarray
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: GIS
Requires-Python: >=3.10
Requires-Dist: dask-geopandas>=0.5.0
Requires-Dist: dask[array,dataframe,distributed]>=2026.1.2
Requires-Dist: geopandas>=1.1.2
Requires-Dist: joblib>=1.5.3
Requires-Dist: numpy>=2.2.6
Requires-Dist: pandas>=2.3.3
Requires-Dist: planetary-computer>=1.0.0
Requires-Dist: pyproj>=3.6.0
Requires-Dist: pystac-client>=0.9.0
Requires-Dist: pystac>=1.14.3
Requires-Dist: rioxarray>=0.19.0
Requires-Dist: scipy>=1.15.3
Requires-Dist: shapely>=2.1.2
Requires-Dist: stackstac>=0.5.1
Requires-Dist: xarray>=2025.6.1
Requires-Dist: xvec>=0.5.2
Provides-Extra: all
Requires-Dist: s2cloudless>=1.7.0; extra == 'all'
Requires-Dist: scikit-learn>=1.7.2; extra == 'all'
Requires-Dist: torch-optimizer>=0.3.0; extra == 'all'
Requires-Dist: torch>=2.2.0; extra == 'all'
Requires-Dist: xgboost>=3.2.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: pytest-cov>=7.0.0; extra == 'dev'
Requires-Dist: pytest>=9.0.2; extra == 'dev'
Requires-Dist: s2cloudless>=1.7.0; extra == 'dev'
Requires-Dist: scikit-learn>=1.7.2; extra == 'dev'
Requires-Dist: torch-optimizer>=0.3.0; extra == 'dev'
Requires-Dist: torch>=2.2.0; extra == 'dev'
Requires-Dist: xgboost>=3.2.0; extra == 'dev'
Provides-Extra: ml-sklearn
Requires-Dist: joblib>=1.5.3; extra == 'ml-sklearn'
Requires-Dist: scikit-learn>=1.7.2; extra == 'ml-sklearn'
Provides-Extra: ml-torch
Requires-Dist: torch-optimizer>=0.3.0; extra == 'ml-torch'
Requires-Dist: torch>=2.2.0; extra == 'ml-torch'
Provides-Extra: ml-xgboost
Requires-Dist: xgboost>=3.2.0; extra == 'ml-xgboost'
Provides-Extra: optical
Requires-Dist: s2cloudless>=1.7.0; extra == 'optical'
Description-Content-Type: text/markdown

# openeo-core

[![PyPI version](https://img.shields.io/pypi/v/openeo-core.svg)](https://pypi.org/project/openeo-core/)
[![Python](https://img.shields.io/pypi/pyversions/openeo-core.svg)](https://pypi.org/project/openeo-core/)
[![License](https://img.shields.io/github/license/PondiB/openeo-core.svg)](https://github.com/PondiB/openeo-core/blob/main/LICENSE)
[![Tests](https://img.shields.io/github/actions/workflow/status/PondiB/openeo-core/tests.yml?branch=dev&label=tests)](https://github.com/PondiB/openeo-core/actions)
[![STAC MLM](https://img.shields.io/badge/STAC-MLM%20v1.5.1-blue)](https://stac-extensions.github.io/mlm/)
[![openEO](https://img.shields.io/badge/openEO-process--aligned-green)](https://openeo.org/)

A standalone Python library providing a fluent, Pythonic API for working with **raster data cubes** and **vector cubes**, implementing selected **openEO processes** locally using **xarray** and **dask**, with **STAC MLM-compatible** ML model objects.

## Features

- **Fluent DataCube API** — chain raster and vector operations in a readable pipeline
- **openEO process-aligned** — function signatures match the openEO process specs
- **STAC MLM-compatible models** — every model carries full STAC Machine Learning Model metadata
- **Multiple ML backends** — scikit-learn, XGBoost, and PyTorch (TempCNN, LightTAE)
- **Flexible feature dimensions** — control which cube dimensions become model features via `dimension`
- **Spatial indexing** — accelerated vector operations with R-tree spatial index
- **Process Registry** — discover and search bundled openEO process specifications

## Installation

### Install from GitHub

```bash
# With uv
uv pip install git+https://github.com/PondiB/openeo-core.git

# With pip
pip install git+https://github.com/PondiB/openeo-core.git
```

Optional extras (ML backends, optical tools, dev):

```bash
# ML backends
uv pip install "openeo-core[ml-sklearn] @ git+https://github.com/PondiB/openeo-core.git"
uv pip install "openeo-core[ml-xgboost] @ git+https://github.com/PondiB/openeo-core.git"
uv pip install "openeo-core[ml-torch] @ git+https://github.com/PondiB/openeo-core.git"

# Optical tools (s2cloudless)
uv pip install "openeo-core[optical] @ git+https://github.com/PondiB/openeo-core.git"

# Everything
uv pip install "openeo-core[all] @ git+https://github.com/PondiB/openeo-core.git"

# Dev tools
pip install "openeo-core[dev] @ git+https://github.com/PondiB/openeo-core.git"
```
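
Because the ML and optical packages are optional, it can be useful to check which of them are importable before calling a model constructor. The following is a minimal sketch using only the standard library. The helper `available_backends` and the mapping are hypothetical and not part of openeo-core; the module names are the import names of the upstream packages behind each extra.

```python
from importlib.util import find_spec

# Import names of the optional packages behind each extra.
# The mapping itself is illustrative, not an openeo-core API.
OPTIONAL_BACKENDS = {
    "ml-sklearn": "sklearn",
    "ml-xgboost": "xgboost",
    "ml-torch": "torch",
    "optical": "s2cloudless",
}


def available_backends() -> dict[str, bool]:
    """Report, per extra, whether its main package can be imported."""
    return {
        extra: find_spec(module) is not None
        for extra, module in OPTIONAL_BACKENDS.items()
    }


if __name__ == "__main__":
    for extra, installed in available_backends().items():
        print(f"{extra}: {'installed' if installed else 'missing'}")
```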

### Install from source (development)

Clone the repository and sync dependencies:

```bash
git clone https://github.com/PondiB/openeo-core.git
cd openeo-core

# Core install (xarray, dask, geopandas, pystac-client, stackstac)
uv sync

# With ML backends
uv sync --extra ml-sklearn
uv sync --extra ml-xgboost
uv sync --extra ml-torch

# Everything including dev tools
uv sync --extra dev
```

## Quick Start

### Fluent DataCube API

```python
from openeo_core import DataCube

# Load from AWS Earth Search (Sentinel-2)
cube = DataCube.load_collection(
    "sentinel-2-l2a",
    spatial_extent={"west": 10.0, "south": 50.0, "east": 11.0, "north": 51.0},
    temporal_extent=("2023-06-01", "2023-06-30"),
    bands=["red", "nir"],
)

# Fluent chaining
result = (
    cube
    .filter_bbox(west=10.2, south=50.2, east=10.8, north=50.8)
    .filter_temporal(extent=("2023-06-10", "2023-06-20"))
    .ndvi(nir="nir", red="red")
    .compute()
)
```
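
For reference, NDVI is the normalised difference of the near-infrared and red bands, `(nir - red) / (nir + red)`. The sketch below computes it for single reflectance values in plain Python. The function name is hypothetical, and returning NaN for a zero denominator is an assumption of this sketch, not a description of how `DataCube.ndvi` behaves.

```python
import math


def ndvi_value(nir: float, red: float) -> float:
    """Normalised difference of two reflectance values.

    Returns NaN when nir + red is zero (an assumption made for this sketch).
    """
    denominator = nir + red
    if denominator == 0:
        return math.nan
    return (nir - red) / denominator


# Dense vegetation reflects strongly in NIR and weakly in red.
print(ndvi_value(nir=0.5, red=0.1))
# Bare soil has similar reflectance in both bands.
print(ndvi_value(nir=0.3, red=0.25))
```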

### ML Models (openEO process-aligned, STAC MLM-compatible)

Model objects are **STAC MLM-compatible** and the API follows the openEO process specs exactly:

```python
from openeo_core.model import (
    mlm_class_random_forest,
    mlm_regr_random_forest,
    mlm_class_xgboost,
    mlm_class_tempcnn,
    mlm_class_lighttae,
    ml_fit,
    ml_predict,
    save_ml_model,
    load_stac_ml,
)

# 1. Initialize (openEO: mlm_class_random_forest)
model = mlm_class_random_forest(
    max_variables="sqrt",
    num_trees=200,
    seed=42,
)

# 2. Train (openEO: ml_fit)
trained = ml_fit(model, training_gdf, target="label")

# 3. Predict (openEO: ml_predict)
predictions = ml_predict(raster_cube, trained)

# 4. Save with STAC Item (openEO: save_ml_model)
save_ml_model(trained, name="my_rf_model")

# 5. Load from STAC Item (openEO: load_stac_ml)
restored = load_stac_ml("my_rf_model/my_rf_model.stac.json")
predictions = ml_predict(new_raster, restored)
```
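
Steps 4 and 5 above suggest that the STAC Item is written to `<name>/<name>.stac.json`. The helper below only encodes that naming pattern as inferred from the example. It is a hypothetical convenience, not an openeo-core function, and the actual layout may differ when `save_ml_model` is given other options.

```python
from pathlib import PurePosixPath


def stac_item_path(name: str) -> str:
    """Build the STAC Item path following the pattern in the example above."""
    return str(PurePosixPath(name) / f"{name}.stac.json")


print(stac_item_path("my_rf_model"))  # my_rf_model/my_rf_model.stac.json
```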

#### Feature dimensions

The `dimension` parameter controls which data cube dimensions are flattened
into the feature vector for model training and prediction. It is set once at
model initialisation and used automatically by `ml_predict`:

```python
# Default: only the "bands" dimension becomes features
model = mlm_class_random_forest(dimension=["bands"])

# Use both spectral and temporal dimensions as features
model = mlm_class_random_forest(
    max_variables="sqrt",
    num_trees=200,
    dimension=["bands", "t"],
)
trained = ml_fit(model, training_gdf, target="label")
predictions = ml_predict(raster_cube, trained)  # dimension handled automatically
```

Default values per model type:

| Model | Default `dimension` |
| ----- | ------------------- |
| Random Forest | `["bands"]` |
| XGBoost | `["bands"]` |
| TempCNN | `["bands", "t"]` |
| LightTAE | `["bands", "t"]` |
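
To see what the choice of `dimension` means for the model input, the sketch below counts samples and features for a cube of a given shape. In this simplified view, the sizes of the listed dimensions multiply into the feature count and the sizes of all remaining dimensions multiply into the sample count. The function and the example sizes are illustrative assumptions; the order in which openeo-core flattens values is not shown here.

```python
from math import prod


def feature_layout(sizes: dict[str, int], dimension: list[str]) -> tuple[int, int]:
    """Return (n_samples, n_features) for a cube with the given dimension sizes."""
    missing = [d for d in dimension if d not in sizes]
    if missing:
        raise KeyError(f"dimensions not in cube: {missing}")
    n_features = prod(sizes[d] for d in dimension)
    n_samples = prod(n for d, n in sizes.items() if d not in dimension)
    return n_samples, n_features


# Hypothetical cube: 6 time steps, 4 bands, 100 x 100 pixels.
cube_sizes = {"t": 6, "bands": 4, "y": 100, "x": 100}

print(feature_layout(cube_sizes, ["bands"]))       # (60000, 4)
print(feature_layout(cube_sizes, ["bands", "t"]))  # (10000, 24)
```

With `["bands", "t"]`, each pixel contributes one sample whose feature vector holds every band at every time step, which is the kind of input the temporal models in the table expect by default.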

#### XGBoost classification

```python
model = mlm_class_xgboost(
    learning_rate=0.15,
    max_depth=5,
    min_child_weight=1,
    subsample=0.8,
    min_split_loss=1,
    seed=42,
)
trained = ml_fit(model, training_gdf, target="label")
```

#### TempCNN classification (PyTorch)

```python
model = mlm_class_tempcnn(
    epochs=100,
    batch_size=64,
    learning_rate=0.001,
    seed=42,
)
trained = ml_fit(model, training_gdf, target="label")
predictions = ml_predict(raster_cube, trained)
```

#### LightTAE classification (PyTorch)

```python
model = mlm_class_lighttae(
    epochs=150,
    batch_size=128,
    learning_rate=0.0005,
    seed=42,
)
trained = ml_fit(model, training_gdf, target="label")
predictions = ml_predict(raster_cube, trained)
```

#### STAC MLM metadata on model objects

Every model carries full STAC MLM metadata:

```python
model = mlm_class_random_forest(max_variables="sqrt", num_trees=100)
props = model.to_stac_properties()
# {
#   "mlm:name": "Random Forest Classifier",
#   "mlm:architecture": "Random Forest",
#   "mlm:tasks": ["classification"],
#   "mlm:framework": "scikit-learn",
#   "mlm:hyperparameters": {"max_variables": "sqrt", "num_trees": 100, "seed": None},
#   "mlm:input": [...],
#   "mlm:output": [...],
#   ...
# }

stac_item = model.to_stac_item()
# Full STAC Feature with MLM extension
```
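
When a properties dictionary like this is written to a STAC Item file, it passes through JSON serialisation, where Python `None` is written as `null`. The sketch below uses a hand-written dictionary that copies a few keys from the comment above. It does not call openeo-core.

```python
import json

# Hand-written subset of the properties shown above (not produced by the library).
props = {
    "mlm:name": "Random Forest Classifier",
    "mlm:architecture": "Random Forest",
    "mlm:tasks": ["classification"],
    "mlm:framework": "scikit-learn",
    "mlm:hyperparameters": {"max_variables": "sqrt", "num_trees": 100, "seed": None},
}

encoded = json.dumps(props, indent=2)
print(encoded)

# Round-tripping restores the original Python values, including None.
assert json.loads(encoded) == props
```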

#### Convenience factory (backward-compatible)

```python
from openeo_core.model import Model, ml_fit, ml_predict

model = Model.random_forest(task="classification", max_variables="sqrt", num_trees=200)
trained = ml_fit(model, gdf, target="label")
preds = ml_predict(raster, trained)

# PyTorch models
model = Model.tempcnn(epochs=50, batch_size=32)
model = Model.lighttae(epochs=100, learning_rate=0.001)
```

### Process Registry

```python
from openeo_core.processes import ProcessRegistry

registry = ProcessRegistry()
print(registry.list_processes())
ndvi_spec = registry.get_process("ndvi")
results = registry.search("vegetation")
```
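
Conceptually, a process registry is a lookup table from process id to its JSON specification, plus a keyword search over the descriptive fields. The following minimal sketch shows that idea with two abbreviated, hand-written specs. `TinyRegistry` and the spec contents are hypothetical and do not reflect how `ProcessRegistry` is implemented.

```python
class TinyRegistry:
    """Minimal in-memory registry keyed by the spec's "id" field."""

    def __init__(self, specs: list[dict]) -> None:
        self._specs = {spec["id"]: spec for spec in specs}

    def list_processes(self) -> list[str]:
        return sorted(self._specs)

    def get_process(self, process_id: str) -> dict:
        return self._specs[process_id]

    def search(self, keyword: str) -> list[str]:
        """Return ids whose id, summary, or description mentions the keyword."""
        needle = keyword.lower()
        return sorted(
            pid
            for pid, spec in self._specs.items()
            if needle in " ".join(
                [pid, spec.get("summary", ""), spec.get("description", "")]
            ).lower()
        )


# Abbreviated, hand-written specs for illustration only.
specs = [
    {"id": "ndvi", "summary": "Normalized Difference Vegetation Index"},
    {"id": "filter_bbox", "summary": "Spatial filter using a bounding box"},
]
registry = TinyRegistry(specs)
print(registry.list_processes())
print(registry.search("vegetation"))
```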

### Load from STAC / GeoJSON

```python
cube = DataCube.load_stac(
    "https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a",
    assets=["red", "nir"],
)

vector = DataCube.load_geojson({"type": "FeatureCollection", "features": [...]})
```
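
The `features` list is elided in the snippet above. As an illustration, a minimal FeatureCollection with one point and one `label` property can be built as follows. The coordinates and the property are made up for this sketch; a dictionary of this shape is what a GeoJSON loader such as `DataCube.load_geojson` would receive.

```python
import json

# Hypothetical training point; GeoJSON coordinates are [longitude, latitude].
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [10.5, 50.5]},
            "properties": {"label": 1},
        }
    ],
}

# A quick structural check before handing the dict to a loader.
assert feature_collection["type"] == "FeatureCollection"
assert all(f["type"] == "Feature" for f in feature_collection["features"])

print(json.dumps(feature_collection, indent=2))
```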

### Vector cubes (GeoDataFrame and xvec)

Vector cubes can be GeoDataFrames or xarray DataArrays/Datasets with xvec geometry coordinates. Both `geopandas` and `xvec` are core dependencies, so no extra is required.

```python
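import xarray as xr
import xvec  # noqa: F401  (registers the .xvec accessor on xarray objects)
from shapely.geometry import Point

from openeo_core import DataCube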

# Create xvec-backed vector cube
da = xr.DataArray(
    [1.0, 2.0, 3.0],
    dims=["geom"],
    coords={"geom": [Point(10, 50), Point(10.5, 50.5), Point(11, 51)]},
).xvec.set_geom_indexes("geom", crs=4326)

cube = DataCube(da)
result = cube.filter_bbox(west=9, south=49, east=11, north=51)
```
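
For point geometries, a bounding-box filter amounts to a range check on each coordinate. The sketch below applies that check to the three points from the example using plain tuples. The helper `in_bbox` is hypothetical, and treating points on the boundary as inside is an assumption of this sketch, not a statement about how `filter_bbox` handles edges.

```python
def in_bbox(
    x: float, y: float, *, west: float, south: float, east: float, north: float
) -> bool:
    """True if (x, y) lies inside the box, boundary included (sketch assumption)."""
    return west <= x <= east and south <= y <= north


# (longitude, latitude) pairs matching the points in the example above.
points = [(10.0, 50.0), (10.5, 50.5), (11.0, 51.0)]

wide = [p for p in points if in_bbox(*p, west=9, south=49, east=11, north=51)]
narrow = [p for p in points if in_bbox(*p, west=10.2, south=50.2, east=10.8, north=50.8)]

print(wide)    # all three points
print(narrow)  # only (10.5, 50.5)
```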

## Documentation

- **[docs/index.md](docs/index.md)** — Documentation index
- **[docs/architecture.md](docs/architecture.md)** — Software structure, design, and component overview

## Architecture

```
openeo_core/
  __init__.py          # DataCube, type aliases
  datacube.py          # Fluent wrapper + dispatch
  types.py             # RasterCube/VectorCube/Cube aliases
  ops/
    raster.py          # xarray/dask raster operations
    vector.py          # geopandas, dask-geopandas, xvec vector operations
  io/
    collection.py      # load_collection (pystac-client + stackstac)
    stac.py            # load_stac (pystac + stackstac)
    geojson.py         # load_geojson (geopandas)
  model/
    __init__.py        # Public API exports
    mlm.py             # MLModel (STAC MLM-compatible object)
    base.py            # openEO process functions + Model factory
    sklearn.py         # scikit-learn estimator builder (internal)
    xgboost_backend.py # XGBoost estimator builder (internal)
    torch.py           # PyTorch wrapper (TempCNN, LightTAE)
    torch_models/      # PyTorch nn.Module implementations
      tempcnn.py       # TempCNN architecture
      lighttae.py      # LightTAE architecture
  processes/
    registry.py        # JSON spec registry
    resources/         # Packaged process JSON specs
```

### openEO ML Process Mapping

| openEO Process            | Python Function                       | Description                |
| ------------------------- | ------------------------------------- | -------------------------- |
| `mlm_class_random_forest` | `mlm_class_random_forest()`           | Init RF classifier         |
| `mlm_regr_random_forest`  | `mlm_regr_random_forest()`            | Init RF regressor          |
| `mlm_class_xgboost`       | `mlm_class_xgboost()`                 | Init XGBoost classifier    |
| `mlm_class_tempcnn`       | `mlm_class_tempcnn()`                 | Init TempCNN classifier    |
| `mlm_class_lighttae`      | `mlm_class_lighttae()`                | Init LightTAE classifier   |
| `ml_fit`                  | `ml_fit(model, training_set, target)` | Train a model              |
| `ml_predict`              | `ml_predict(data, model)`             | Predict with trained model |
| `save_ml_model`           | `save_ml_model(data, name, options)`  | Save model + STAC Item     |
| `load_stac_ml`            | `load_stac_ml(uri, ...)`              | Load model from STAC Item  |

## Examples

| Notebook | Description |
| -------- | ----------- |
| [01_ndvi.ipynb](examples/01_ndvi.ipynb) | NDVI computation with the DataCube API |
| [02_ml_random_forest.ipynb](examples/02_ml_random_forest.ipynb) | Random Forest classification pipeline |
| [03_process_registry.ipynb](examples/03_process_registry.ipynb) | Exploring the Process Registry |
| [04_ml_tempcnn.ipynb](examples/04_ml_tempcnn.ipynb) | TempCNN temporal classification with PyTorch |

## Running Tests

```bash
uv run pytest tests/ -v
```

## License

Apache-2.0
