Metadata-Version: 2.4
Name: meow-ml
Version: 0.1.0
Summary: MEOW-ML framework for Earth observation machine learning workflows.
License: NASA Open Source Agreement 1.3
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE
License-File: NOSA-GSC-19449-1.pdf
Requires-Dist: certifi>=2024.7.4
Requires-Dist: geopandas>=0.11.0
Requires-Dist: joblib>=1.1.0
Requires-Dist: keras>=2.10.0
Requires-Dist: jinja2>=3.1.6
Requires-Dist: matplotlib>=3.5.0
Requires-Dist: mlflow>=2.22.4
Requires-Dist: numpy>=1.21.0
Requires-Dist: omegaconf>=2.2.0
Requires-Dist: pandas>=1.4.0
Requires-Dist: pillow>=10.3.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: rasterio>=1.3.0
Requires-Dist: requests>=2.32.4
Requires-Dist: scikit-learn>=1.5.0
Requires-Dist: scipy>=1.8.0
Requires-Dist: seaborn>=0.11.0
Requires-Dist: tensorflow>=2.11.1
Requires-Dist: tqdm>=4.66.3
Requires-Dist: urllib3>=2.5.0
Requires-Dist: werkzeug>=3.1.6
Provides-Extra: dev
Requires-Dist: bandit>=1.7.0; extra == "dev"
Requires-Dist: build>=1.2.0; extra == "dev"
Requires-Dist: pip-audit>=2.7.0; extra == "dev"
Requires-Dist: pytest>=7.0.0; extra == "dev"
Dynamic: description
Dynamic: description-content-type
Dynamic: license
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# MEOW-ML

> Copyright ©2024 United States Government as represented by the Administrator of the National Aeronautics and Space Administration. No copyright is claimed in the United States under Title 17, U.S. Code. All Other Rights Reserved.

MEOW-ML is an Earth observation machine learning framework for workflows that combine multiple raster inputs, configuration-driven experiments, and reproducible training runs. This is the open-source release branch of the MEOW-ML package; it ships with a forest canopy height change example that demonstrates the framework on a concrete remote-sensing task.

The project is organized around a simple idea: keep the reusable orchestration generic, and push domain-specific choices into configuration files, data generators, model factories, and evaluators. That makes it possible to adapt the same workflow pattern to different Earth observation problems without rewriting the full training stack each time.

## What Is Supported

The supported public surface in this branch is the `meow_ml` package:

- `meow_ml.core`
  Core training flow, MLflow setup, model factory wiring, and framework utilities.
- `meow_ml.examples.forest_canopy`
  A worked example for canopy height change prediction.

Legacy modules under `meow_ml.src` remain in the repository because the current MEOW-ML runtime still depends on them as implementation details, but they are not the preferred public API.

## Repository Layout

```text
.
├── environment.yml                  # Canonical conda environment definition
├── meow_ml/
│   ├── core/                        # Reusable framework entrypoints and utilities
│   ├── examples/forest_canopy/      # Public example workflow
│   ├── plugins/                     # Reserved extension area
│   ├── src/                         # Internal compatibility/runtime layer
│   └── ...
├── CONTRIBUTING.md
├── SECURITY.md
└── setup.py
```

## Installation

MEOW-ML currently targets Python 3.10+.

Conda setup:

```bash
conda env create -f environment.yml
conda activate meow_ml
```

Editable install from a source checkout:

```bash
git clone https://github.com/<org-or-user>/meow-ml.git
cd meow-ml
python -m pip install --upgrade pip
python -m pip install -e .
```

If you want the local review tooling as well:

```bash
# Quote the extras spec so shells such as zsh do not treat the brackets as a glob.
python -m pip install -e ".[dev]"
```

## Quick Start

The public example config is located at `meow_ml/examples/forest_canopy/configs/chc_config.yaml`. Before training, update the placeholder values:

- `data_generator.data_path`
  Set this to the root directory that contains your site/year tile directories.
- `data_generator.file_paths`
  Replace the sample relative tile directories with the tiles available in your dataset.
- `mlflow.TRACKING_URI`
  Set this to a local directory such as `./mlruns` or to a remote MLflow URI such as `http://localhost:5000`.

Run the forest canopy example:

```bash
python -m meow_ml.examples.forest_canopy.main \
  --config meow_ml/examples/forest_canopy/configs/chc_config.yaml
```

Run the generic MEOW-ML core entrypoint with your own config:

```bash
python -m meow_ml.core.main --config /path/to/config.yaml
```

## Configuration Model

MEOW-ML uses YAML configs with three main sections:

- `data_generator`
  Describes where the raster data lives, how inputs are grouped into branches, how labels are produced, and how data is split.
- `models`
  A list of model runs to execute. Each run supplies the architecture and training parameters needed by the selected runtime.
- `mlflow`
  Controls experiment naming and the tracking backend.

Minimal shape:

```yaml
data_generator:
  data_path: /path/to/earth-observation-data-root/
  batch_size: 32

models:
  - model_name: "baseline"
    model_type: "Sequential"

mlflow:
  EXPERIMENT_NAME: "example-experiment"
  TRACKING_URI: ./mlruns
```
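
As a rough illustration of the shape above, the following sketch checks an already-parsed config (for example, the dictionary returned by `yaml.safe_load`) for the three top-level sections. The helper name `missing_sections` is hypothetical and is not part of `meow_ml`; the framework's own validation may differ.

```python
from typing import Any, Mapping

# Top-level sections named in this README.
REQUIRED_SECTIONS = ("data_generator", "models", "mlflow")


def missing_sections(config: Mapping[str, Any]) -> list[str]:
    """Return the required top-level sections that are absent or empty."""
    return [name for name in REQUIRED_SECTIONS if not config.get(name)]


# The minimal shape, written as the dictionary a YAML loader would produce.
example_config = {
    "data_generator": {
        "data_path": "/path/to/earth-observation-data-root/",
        "batch_size": 32,
    },
    "models": [{"model_name": "baseline", "model_type": "Sequential"}],
    "mlflow": {"EXPERIMENT_NAME": "example-experiment", "TRACKING_URI": "./mlruns"},
}

print(missing_sections(example_config))   # complete config
print(missing_sections({"models": []}))   # empty or absent sections are reported
```

Running a check like this before training surfaces an incomplete config early, before any raster data is read.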

## How The Framework Fits Together

### Data Generators

Data generators load, validate, scale, and batch raster-derived inputs for training, validation, and testing. In this codebase they define the relationship between on-disk raster tiles and model-ready tensors. The generator is where you encode data-specific logic such as file suffix conventions, truth extraction, label scaling, and train/validation/test splits.
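
The splitting and batching responsibilities can be sketched with the standard library alone. This is a simplified illustration, not the generator shipped in `meow_ml`: the function names, the split fractions, and the tile directory names are all assumptions, and a real generator would yield tensors read from rasters rather than directory strings.

```python
import random
from typing import Iterator, Sequence


def split_tiles(
    tiles: Sequence[str],
    val_fraction: float = 0.2,
    test_fraction: float = 0.2,
    seed: int = 0,
) -> dict[str, list[str]]:
    """Shuffle tile directories reproducibly and split them into three sets."""
    shuffled = list(tiles)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_val = int(len(shuffled) * val_fraction)
    n_test = int(len(shuffled) * test_fraction)
    return {
        "val": shuffled[:n_val],
        "test": shuffled[n_val:n_val + n_test],
        "train": shuffled[n_val + n_test:],
    }


def batches(items: Sequence[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive batches; the last batch may be smaller."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical site/year tile directories.
tiles = [f"site_a/2019/tile_{i:03d}" for i in range(10)]
splits = split_tiles(tiles)
print({name: len(members) for name, members in splits.items()})
```

Fixing the seed keeps the train/validation/test membership stable across runs, which supports the reproducibility goal stated at the top of this README.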

### Model Factories

Model factories create and compile models from configuration. This keeps architecture definition separate from training orchestration and allows multiple experiments to reuse the same training flow while swapping the model implementation.
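
One common way to realize this separation is a small registry keyed by `model_type`. The sketch below is an assumption about how such a factory could look, not the implementation in `meow_ml.core`; `register`, `create_model`, and `build_sequential` are hypothetical names, and the builder returns a plain dictionary where a real one would return a compiled model.

```python
from typing import Any, Callable, Mapping

# Registry mapping a `model_type` string to a builder function.
_BUILDERS: dict[str, Callable[[Mapping[str, Any]], Any]] = {}


def register(model_type: str):
    """Decorator that registers a builder under a `model_type` name."""
    def decorator(builder):
        _BUILDERS[model_type] = builder
        return builder
    return decorator


@register("Sequential")
def build_sequential(entry: Mapping[str, Any]) -> dict[str, Any]:
    # Stand-in for real model construction and compilation.
    return {
        "name": entry["model_name"],
        "kind": "Sequential",
        "layers": list(entry.get("layers", [])),
    }


def create_model(entry: Mapping[str, Any]) -> Any:
    """Build the model described by one entry of the `models` list."""
    model_type = entry["model_type"]
    if model_type not in _BUILDERS:
        raise ValueError(f"Unknown model_type: {model_type!r}")
    return _BUILDERS[model_type](entry)


model = create_model({"model_name": "baseline", "model_type": "Sequential"})
print(model["name"])
```

With this layout, adding an architecture means registering one more builder; the training flow that calls `create_model` stays unchanged.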

### Training Flow

The training entrypoint wires together:

1. config loading
2. MLflow setup
3. model factory creation
4. data generator construction
5. training
6. evaluation, plots, and artifact logging

That split is intentional. It makes the framework reusable across Earth observation tasks while still allowing example-specific classes to own their domain logic.
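
The six steps above can be sketched as a single function that receives its collaborators as arguments. Everything here is illustrative: `run_training` and the stub functions are hypothetical, and the choice to build the data generator once per model run is an assumption made to mirror the listed order, not a statement about the real entrypoint.

```python
from typing import Any, Callable, Mapping


def run_training(
    config: Mapping[str, Any],
    setup_tracking: Callable[[Mapping[str, Any]], None],
    make_factory: Callable[[Mapping[str, Any]], Callable[[], Any]],
    make_generator: Callable[[Mapping[str, Any]], Any],
    train: Callable[[Any, Any], Any],
    evaluate: Callable[[Any, Any], Mapping[str, float]],
) -> list[Mapping[str, float]]:
    """Run each entry of config["models"] through the same sequence of steps."""
    setup_tracking(config["mlflow"])                          # 2. MLflow setup
    results = []
    for entry in config["models"]:
        factory = make_factory(entry)                         # 3. model factory creation
        generator = make_generator(config["data_generator"])  # 4. data generator construction
        model = train(factory(), generator)                   # 5. training
        results.append(evaluate(model, generator))            # 6. evaluation and logging
    return results


# Stub collaborators that only record the order in which they are called.
calls: list[str] = []


def setup_tracking(mlflow_cfg):
    calls.append("mlflow")


def make_factory(entry):
    calls.append("factory")
    return lambda: {"name": entry["model_name"]}


def make_generator(data_cfg):
    calls.append("generator")
    return range(data_cfg["batch_size"])


def train(model, generator):
    calls.append("train")
    return model


def evaluate(model, generator):
    calls.append("evaluate")
    return {"loss": 0.0}


# 1. config loading is represented here by an in-memory dictionary.
config = {
    "data_generator": {"batch_size": 32},
    "models": [{"model_name": "baseline", "model_type": "Sequential"}],
    "mlflow": {"TRACKING_URI": "./mlruns"},
}
results = run_training(config, setup_tracking, make_factory, make_generator, train, evaluate)
print(calls)
```

Passing the collaborators in as callables is what lets an example swap its own generator, factory, or evaluator without touching the orchestration.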

## Forest Canopy Example

The forest canopy example shows how to apply the framework to canopy height change prediction using multi-branch raster inputs. It is included for two reasons:

- to provide a realistic end-to-end example of the framework pattern
- to preserve continuity with the project’s existing research workflow

The example currently demonstrates:

- multi-branch input handling for hyperspectral and lidar-derived features
- config-driven model training
- MLflow experiment logging
- example-specific extension points and diagnostics for data loading, evaluation, and model construction

Optional diagnostics for the example live under `meow_ml.examples.forest_canopy.tools`:

- `tile_audit`
  Checks that the tiles referenced by the canopy example config exist, contain the expected files, and agree on raster shape.
- `compare_rasters`
  Compares two rasters for shape, nodata counts, and value agreement.

Example usage:

```bash
python -m meow_ml.examples.forest_canopy.tools.tile_audit \
  --config meow_ml/examples/forest_canopy/configs/chc_config.yaml

python -m meow_ml.examples.forest_canopy.tools.compare_rasters \
  --raster-a /path/to/raster_a.tif \
  --raster-b /path/to/raster_b.tif \
  --nodata-value -9999
```
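
To make the three checks concrete, here is a minimal pure-Python sketch of a shape, nodata, and value comparison on small in-memory grids. It is not the implementation of `compare_rasters`: the function name `compare_grids`, the tolerance parameter, and the report keys are assumptions, and a real tool would read rasters from disk and compare arrays.

```python
from typing import Any, Sequence

Grid = Sequence[Sequence[float]]


def compare_grids(a: Grid, b: Grid, nodata: float = -9999, tol: float = 1e-6) -> dict[str, Any]:
    """Compare two 2-D grids for shape, nodata counts, and value agreement."""
    shape_a = (len(a), len(a[0]) if a else 0)
    shape_b = (len(b), len(b[0]) if b else 0)
    report: dict[str, Any] = {
        "same_shape": shape_a == shape_b,
        "nodata_a": sum(v == nodata for row in a for v in row),
        "nodata_b": sum(v == nodata for row in b for v in row),
        "max_abs_diff": None,
        "values_agree": False,
    }
    if report["same_shape"]:
        # Only compare cells that hold valid data in both grids.
        diffs = [
            abs(x - y)
            for row_a, row_b in zip(a, b)
            for x, y in zip(row_a, row_b)
            if x != nodata and y != nodata
        ]
        report["max_abs_diff"] = max(diffs, default=0.0)
        report["values_agree"] = report["max_abs_diff"] <= tol
    return report


grid_a = [[1.0, 2.0], [-9999, 4.0]]
grid_b = [[1.0, 2.5], [3.0, -9999]]
print(compare_grids(grid_a, grid_b))
```

Excluding cells that are nodata in either grid keeps the sentinel value from dominating the difference statistics.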

## Extending MEOW-ML

To adapt the framework to a new Earth observation workflow:

1. create a task-specific data generator
2. create or reuse a model factory
3. create an evaluator if the default plots/metrics are insufficient
4. add a YAML config for the workflow
5. run through the generic MEOW-ML entrypoint

The cleanest extension path is to follow the same pattern used by the forest canopy example under `meow_ml/examples`.

## Optional Class Hooks

The generic runtime still supports an optional `class` field under `data_generator` and each model entry. That hook was added during the MEOW-ML refactor so advanced users can dynamically point a config at a custom data generator or model factory without editing the trainer.

The shipped public example does not require `class`, and the release branch intentionally keeps the example YAML free of internal import strings so the config stays readable.
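
For readers unfamiliar with this kind of hook, the sketch below shows one common way to resolve a dotted import string to a class with `importlib`. The exact format MEOW-ML expects for `class` is not documented here, so the `package.module.ClassName` form and the helper name `resolve_class` are assumptions.

```python
import importlib
from typing import Any


def resolve_class(dotted_path: str) -> Any:
    """Import `package.module.ClassName` and return the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# A standard-library class stands in for a custom data generator or model factory.
cls = resolve_class("collections.OrderedDict")
print(cls.__name__)
```

Because the string is imported at run time, a hook like this should only be pointed at trusted modules.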

## Security And Privacy Notes

- This branch has been sanitized to remove hard-coded local machine paths and user-specific environment details from tracked files.
- Example configs use placeholders for dataset roots and model artifact paths.
- Do not commit credentials, tokens, or site-specific secrets into configs or notebooks.
- Use `SECURITY.md` for vulnerability reporting guidance.

## Open-Source Review Status

A release-readiness review summary is tracked in `OPEN_SOURCE_REVIEW.md`. It records what was cleaned up for open-source review, what remains intentionally reference-only, and any residual items that need human/legal confirmation before publication.

## Contributing

Contribution guidance lives in `CONTRIBUTING.md`.

## License

This repository ships under the NASA Open Source Agreement (NOSA). See `LICENSE` for the text version and `NOSA-GSC-19449-1.pdf` for the approved NASA-provided release copy.
