Metadata-Version: 2.4
Name: wfmeta_dask
Version: 1.0.0
Summary: wfmeta Tool to parse and restructure the scheduler and worker event logs captured by the dask-mofka plugin.
Author-email: Polina Shpilker <infinite.loopholes@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/RECUP-DOE/wfmeta-dask
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>1.24.4
Requires-Dist: pandas>=2.0.3
Dynamic: license-file

# DASK Metadata Capture

![tests](https://github.com/RECUP-DOE/dask_capture/actions/workflows/run_tests.yaml/badge.svg)
![lint](https://github.com/RECUP-DOE/dask_capture/actions/workflows/lint.yaml/badge.svg)

A tool that takes the `.csv` files produced by Amal Gueroudji's [Mofka-Dask coupler](https://github.com/GueroudjiAmal/MofkaDask/) and consolidates them into a single, object-focused output.

Developed and tested by Polina Shpilker for the RECUP project.

Command-line arguments:
- `-f`, `--fileformat`: the format of the output (default: `df_csv`). Options are `txt` (plaintext pretty-print of the objects), `pickle` (compressed pickle of the objects), and `df_csv` (CSV files of the DataFrames generated from the objects).
- `-o`, `--output`: the directory to write output files to.
- `directory` (positional): the input directory containing `scheduler_transition.csv`, `worker_transfer.csv`, and `worker_transition.csv`.

Usage:
```bash
wfmeta_dask -f df_csv -o output/ data/
```
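
Because the tool reads three specific files from the input directory, it can help to confirm they are present before starting a run. The following is a minimal sketch of such a check; `check_inputs` is a hypothetical helper, not part of `wfmeta_dask`, and it assumes only the three filenames listed above.

```bash
# Hypothetical pre-flight helper (not part of wfmeta_dask): verify that an input
# directory contains the three CSV files the tool reads before invoking it.
check_inputs() {
    local dir="$1"
    local missing=0
    local f
    for f in scheduler_transition.csv worker_transfer.csv worker_transition.csv; do
        if [ ! -f "$dir/$f" ]; then
            # Report every missing file on stderr, not just the first one.
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    # Exit status 0 if all files exist, 1 otherwise.
    return "$missing"
}
```

It can then gate the real invocation, e.g. `check_inputs data/ && wfmeta_dask -f df_csv -o output/ data/`.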

## Installation

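This tool requires Python 3.12 or newer. It can be installed from PyPI with `pip install wfmeta_dask`, or from a local clone of the [source repository](https://github.com/RECUP-DOE/wfmeta-dask) by running `pip install .` in its root directory.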

## Background
The Mofka-Dask coupler generates a series of `.csv` files organized by the process that emitted each event and the type of event. For example, `scheduler_transition.csv` records `Task` state transitions observed by the scheduler, while `worker_transfer.csv` records data transfers observed by a worker.
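
The naming scheme can be made explicit with a small shell sketch. `describe_logs` is a hypothetical helper written for illustration only; it assumes every file follows the `<process>_<event>.csv` pattern of the three expected files, with the process name before the first underscore.

```bash
# Hypothetical helper (not part of wfmeta_dask): split each CSV name of the form
# <process>_<event>.csv into the emitting process and the event type.
describe_logs() {
    local dir="$1" f base
    for f in "$dir"/*.csv; do
        [ -e "$f" ] || continue   # no CSV files: the glob stays unexpanded
        base="$(basename "$f" .csv)"
        # ${base%%_*} keeps text before the first "_"; ${base#*_} keeps text after it.
        printf '%s: process=%s event=%s\n' "$base" "${base%%_*}" "${base#*_}"
    done
}
```

For a directory holding `worker_transfer.csv`, this prints `worker_transfer: process=worker event=transfer`.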

To be most useful, this data should be collected in one central place, so that metadata from other sources (such as Darshan) can be combined with all of the metadata available from Dask.

This tool is being developed against the outputs of Amal Gueroudji's example runs, which are available in the [XPDaMoDa repository](https://github.com/GueroudjiAmal/XPDaMoDa).

## Documentation
Documentation is available on [readthedocs](https://infispiel-dask-capture.readthedocs.io/en/latest/).
