Metadata-Version: 2.4
Name: labcas.workflow
Version: 0.1.7
Summary: Run workflows for LabCAS
Home-page: https://github.com/NASA-PDS/peppi
Download-URL: https://github.com/NASA-PDS/peppi/releases/
Author: Labcas
Author-email: labcas@jpl.nasa.gov
License: apache-2.0
Keywords: pds,planetary data,api
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: numpy~=1.24.3
Requires-Dist: scikit-image~=0.24.0
Requires-Dist: pandas~=2.2.3
Requires-Dist: matplotlib~=3.9.4
Requires-Dist: boto3==1.35.16
Requires-Dist: dask~=2024.8.0
Requires-Dist: distributed~=2024.8.0
Provides-Extra: ml-worker
Requires-Dist: tensorflow~=2.12.1; extra == "ml-worker"
Provides-Extra: dev
Requires-Dist: black~=25.1.0; extra == "dev"
Requires-Dist: flake8~=7.2.0; extra == "dev"
Requires-Dist: flake8-bugbear~=24.12.12; extra == "dev"
Requires-Dist: flake8-docstrings~=1.7.0; extra == "dev"
Requires-Dist: pep8-naming~=0.14.1; extra == "dev"
Requires-Dist: mypy~=1.15.0; extra == "dev"
Requires-Dist: pydocstyle~=6.3.0; extra == "dev"
Requires-Dist: coverage~=7.8.0; extra == "dev"
Requires-Dist: pytest~=8.3.5; extra == "dev"
Requires-Dist: pytest-cov~=6.1.1; extra == "dev"
Requires-Dist: pytest-watch~=4.2.0; extra == "dev"
Requires-Dist: pytest-xdist~=3.6.1; extra == "dev"
Requires-Dist: pre-commit~=4.2.0; extra == "dev"
Requires-Dist: sphinx~=8.2.3; extra == "dev"
Requires-Dist: sphinx-rtd-theme~=3.0.2; extra == "dev"
Requires-Dist: tox~=4.25.0; extra == "dev"
Requires-Dist: types-setuptools~=78.1.0.20250329; extra == "dev"
Requires-Dist: Jinja2~=3.1.6; extra == "dev"
Requires-Dist: docutils~=0.21.2; extra == "dev"
Dynamic: download-url

# LabCas Workflow

Run workflows for LabCAS.

Depending on your role, there are several ways to run a LabCAS workflow:

- **Developers:** run locally, natively on your OS
- **Integrators:** run against a local AWS Managed Workflows for Apache Airflow (MWAA) environment
- **System administrators:** deploy and configure on AWS
- **End users:** use the AWS deployment


## Developers

The workflow tasks run independently of Airflow. TODO: integrate with the Airflow Python API.

### Install

Use Python 3.11, preferably in a virtual environment:


    pip install -e '.[dev]'

### Set AWS connection

    ./aws-login.darwin.amd64
    export AWS_PROFILE=saml-pub
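
Before running the client, it can help to confirm that the profile actually resolves to valid credentials. The following is a minimal sketch, not part of this package: `resolve_profile` and `describe_identity` are hypothetical helper names, and the `saml-pub` default simply mirrors the profile exported above. It relies on `boto3`, which is already a dependency.

```python
import os


def resolve_profile(default: str = "saml-pub") -> str:
    """Return the profile named by AWS_PROFILE, falling back to `default`."""
    return os.environ.get("AWS_PROFILE") or default


def describe_identity(profile: str) -> str:
    """Ask STS which identity the credentials of `profile` map to."""
    # Imported lazily so resolve_profile() stays usable without boto3.
    import boto3

    session = boto3.Session(profile_name=profile)
    return session.client("sts").get_caller_identity()["Arn"]
```

Calling `describe_identity(resolve_profile())` should return the ARN of the assumed role; an exception at this point usually means the login step needs to be repeated.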

### Run/Test the client

    python src/labcas/workflow/manager/main.py

### Deploy the package on PyPI

Update the version in `src/labcas/workflow/VERSION.txt`.
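
The version bump can also be scripted. This is a minimal sketch that assumes `VERSION.txt` contains nothing but a dotted numeric version such as `0.1.7`; `bump_patch` and `bump_version_file` are hypothetical helpers, not part of this package.

```python
from pathlib import Path


def bump_patch(version: str) -> str:
    """Increment the last numeric component of a dotted version string."""
    parts = version.strip().split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts)


def bump_version_file(path: Path) -> str:
    """Rewrite the version file in place and return the new version."""
    new_version = bump_patch(path.read_text())
    path.write_text(new_version + "\n")
    return new_version
```

For example, `bump_version_file(Path("src/labcas/workflow/VERSION.txt"))` would turn `0.1.7` into `0.1.8`. Versions with pre-release suffixes are not handled by this sketch.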

Publish the package on PyPI:

    pip install build
    pip install twine
    rm -rf dist/
    python -m build
    twine upload dist/*
   


## Integrators

### Build the Dask worker image


    docker build -f docker/Dockerfile . -t labcas/workflow

### Create a managed Airflow Docker image to run locally

Clone the https://github.com/aws/aws-mwaa-local-runner repository, then run:

    ./mwaa-local-env build-image

Then from your local labcas_workflow repository:

    cd mwaa

As needed, update the requirements in the `requirements` directory and the DAGs in the `dags` directory.

### Update the AWS credentials

    ./aws-login.darwin.amd64
    cp -r ~/.aws .

### Launch the server
 
    docker compose -f docker-compose-local.yml up

Test the server at http://localhost:8080 (login: admin/test).

### Stop the server

Press `Ctrl+C` in the terminal where the server is running.

### Stop and re-initialize local volumes

    docker compose -f ./docker/docker-compose-local.yml down -v

See the console at http://localhost:8080 (login: admin/test).

### Test the requirements.txt files
 
    ./mwaa-local-env test-requirements

### Debug the workflow import

    docker container ls

Pick the container ID of the image `amazon/mwaa-local:2_10_3`, for example `54706271b7fc`, then open a bash shell in that container:

    docker exec -it 54706271b7fc bash

And, in the bash prompt:

    cd dags
    python3 -c "import nebraska"
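
When several DAG modules need checking, a small helper that reports the full traceback for each failure can be more convenient than repeating the one-liner. This is a minimal sketch using only the standard library; `try_import` is a hypothetical name, and `nebraska` is the module from the command above.

```python
import importlib
import traceback


def try_import(module_name: str) -> bool:
    """Import a module by name, printing the full traceback on failure."""
    try:
        importlib.import_module(module_name)
    except Exception:
        # Show where the import failed instead of stopping the whole check.
        traceback.print_exc()
        return False
    return True
```

Run from the `dags` directory, `try_import("nebraska")` returns `True` when the DAG imports cleanly and prints the error otherwise.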

Start the Dask scheduler on a dedicated Docker network:

    docker network create dask
    docker run --network dask -p 8787:8787 -p 8786:8786 labcas/workflow scheduler

Start one worker:

    docker run --network dask labcas/workflow worker


Start the client as described in the Run section below.
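
To check that the scheduler and worker are reachable before starting the real client, a short smoke test can help. This is a minimal sketch, assuming the scheduler port 8786 is published on `localhost` as in the command above; `scheduler_address` and `square_on_cluster` are hypothetical helpers, not part of this package.

```python
def scheduler_address(host: str = "localhost", port: int = 8786) -> str:
    """Build the TCP address that a Dask client connects to."""
    return f"tcp://{host}:{port}"


def square_on_cluster(values, address=None):
    """Square each value on the cluster and return the results as a list."""
    # Imported lazily; provided by the `distributed` dependency.
    from dask.distributed import Client

    with Client(address or scheduler_address()) as client:
        futures = client.map(lambda x: x * x, values)
        return client.gather(futures)
```

If `square_on_cluster([1, 2, 3])` returns without hanging, at least one worker has registered with the scheduler and is accepting tasks.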


### With Dask on ECS

Deploy the image built in the previous section to ECR.

An S3 bucket named `labcas-infra` is required to hold the Terraform state.

Other prerequisites are:
 - a VPC
 - subnets
 - a security group allowing incoming requests on ports 8786 and 8787 from wherever the client runs (at JPL, on EC2, or in Airflow)
 - a task role allowed to write to CloudWatch
 - a task execution role that can pull images from ECR and has the standard ECS task execution policy `AmazonECSTaskExecutionRolePolicy`
 

Deploy the ECS cluster with the following Terraform commands:

    cd terraform
    terraform init
    terraform apply \
        -var consortium="edrn" \
        -var venue="dev" \
        -var aws_fg_image="<uri of the docker image deployed on ECR>" \
        -var aws_fg_subnets="<private subnets of the AWS account>" \
        -var aws_fg_vpc="<vpc of the AWS account>" \
        -var aws_fg_security_groups="<security group>" \
        -var ecs_task_role="<arn of a task role>" \
        -var ecs_task_execution_role="<arn of task execution role>"
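
When the variable values contain quotes or brackets, building the argument list in Python avoids shell quoting problems. This is a minimal sketch using only the standard library; `terraform_apply_args` is a hypothetical helper, and only the `consortium` and `venue` values come from the command above.

```python
import shlex


def terraform_apply_args(variables: dict[str, str]) -> list[str]:
    """Build the argument list for `terraform apply` from variable values."""
    args = ["terraform", "apply"]
    for name, value in variables.items():
        # Each variable becomes its own `-var name=value` pair.
        args += ["-var", f"{name}={value}"]
    return args


example = terraform_apply_args({"consortium": "edrn", "venue": "dev"})
# Print the equivalent shell command for review before running it.
print(shlex.join(example))
```

The resulting list can be passed to `subprocess.run(example, cwd="terraform", check=True)` once the remaining variables have been added.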

## Run

Set your local AWS credentials to access the data:


    ./aws-login.darwin.amd64
    export AWS_PROFILE=saml-pub


Start the Dask cluster, then run the processing:


    python ./src/labcas/workflow/manager/main.py



# Apache Airflow










