Metadata-Version: 2.1
Name: perun
Version: 0.1.0b9
Summary: 
Home-page: https://github.com/Helmholtz-AI-Energy/perun
License: BSD-3-Clause
Author: Gutiérrez Hermosillo Muriedas, Juan Pedro
Author-email: juanpedroghm@gmail.com
Requires-Python: >=3.8,<4.0
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Requires-Dist: PyYAML (>=6.0,<7.0)
Requires-Dist: click (>=8.1.3,<9.0.0)
Requires-Dist: h5py (>=3.7.0,<4.0.0)
Requires-Dist: influxdb-client (>=1.31.0,<2.0.0)
Requires-Dist: mpi4py (>=3.1.3,<4.0.0)
Requires-Dist: numpy (>=1.23.2,<2.0.0)
Requires-Dist: pandas (>=1.4.3,<2.0.0)
Requires-Dist: py-cpuinfo (>=8.0.0,<9.0.0)
Requires-Dist: pynvml (>=11.4.1,<12.0.0)
Requires-Dist: python-dotenv (>=0.20.0,<0.21.0)
Description-Content-Type: text/markdown

<div align="center">
  <img src="https://raw.githubusercontent.com/Helmholtz-AI-Energy/perun/main/doc/images/perun.svg">
</div>

Have you ever wondered how much energy is used when training your neural network on the MNIST dataset? Want to get scared because of the impact you are having on the environment while doing "valuable" research? Are you interested in knowing how much carbon you are burning playing with DALL-E just to get attention on Twitter? If the thing that was missing from your machine learning workflow was existential dread, this is the correct package for you!

## Installation

From PyPI:

```$ pip install perun```

From GitHub:

```$ pip install git+https://github.com/Helmholtz-AI-Energy/perun```

### Parallel h5py

To build h5py with MPI support:

```CC=mpicc HDF5_MPI="ON" pip install --no-binary h5py h5py```

## Usage

### Command line

To get a quick report of the power usage of a Python script, simply run:

```$ perun monitor --format yaml path/to/your/script.py [args]```

Or

```$ python -m perun monitor --format json -o results/ path/to/your/script.py [args]```


#### Subcommands

**monitor**

Monitor the energy usage of a Python script.

```
Usage: perun monitor [OPTIONS] SCRIPT [SCRIPT_ARGS]...

  Gather power consumption from hardware devices while SCRIPT [SCRIPT_ARGS] is
  running.

  SCRIPT is a path to the python script to monitor, run with arguments
  SCRIPT_ARGS.

Options:
  -f, --frequency FLOAT         sampling frequency (in Hz)
  --format [txt|yaml|yml|json]  report print format
  -o, --outdir DIRECTORY        experiment data output directory
  --help                        Show this message and exit.
```
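When several scripts need to be monitored in a batch, the command line above can also be assembled from Python. The following is a minimal sketch that uses only the options listed in the help text; the helper names `monitor_command` and `run_monitor` and the argument `input.csv` are hypothetical and not part of perun.

```python
import subprocess
import sys


def monitor_command(script, script_args=(), frequency=None, fmt=None, outdir=None):
    """Build the argument list for `python -m perun monitor` (hypothetical helper)."""
    cmd = [sys.executable, "-m", "perun", "monitor"]
    if frequency is not None:
        cmd += ["--frequency", str(frequency)]  # sampling frequency in Hz
    if fmt is not None:
        cmd += ["--format", fmt]  # one of txt, yaml, yml, json
    if outdir is not None:
        cmd += ["--outdir", outdir]  # experiment data output directory
    return cmd + [script, *script_args]


def run_monitor(script, script_args=(), **options):
    """Launch the monitored run; requires perun to be installed."""
    return subprocess.run(monitor_command(script, script_args, **options), check=True)


cmd = monitor_command(
    "path/to/your/script.py",
    ["input.csv"],  # placeholder script argument
    frequency=2.0,
    fmt="json",
    outdir="results/",
)
print(" ".join(cmd))
```

Passing the command as a list avoids shell quoting issues when script paths or arguments contain spaces.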

**report**

Print a report from previous monitoring results.

```
Usage: perun report [OPTIONS] EXP_HDF5

  Print consumption report from EXP_HDF5 on the command line on the desired
  format.

  EXP_HDF5 is an hdf5 file generated by perun after monitoring a script,
  containing data gathered from hardware devices.

Options:
  -f, --format [txt|yaml|yml|json]
                                  report print format
  --help                          Show this message and exit.
```

**postprocess**

Apply postprocessing to existing perun experiment data.

```
Usage: perun postprocess [OPTIONS] EXP_HDF5

  Apply post-processing to EXP_HDF5 experiment file.

  EXP_HDF5 is an hdf5 file generated by perun after monitoring a script,
  containing data gathered from hardware devices.

Options:
  --help  Show this message and exit.
```

### Decorator

Alternatively, decorate the function that you want analysed:

```python
import perun

@perun.monitor(outDir="results/", format="txt")
def training_loop(args, model, device, train_loader, test_loader, optimizer, scheduler):
    for epoch in range(1, args.epochs + 1):
        train(args, model, device, train_loader, optimizer, epoch)
        test(model, device, test_loader)
        scheduler.step()
```


Optional Arguments:

| Argument | Description |
|---|---|
| frequency: FLOAT | sampling frequency (in Hz) |
| format: [txt\|yaml\|yml\|json] | report print format |
| outdir: DIRECTORY | experiment data output directory |

## Experiment data

Raw data is saved in an HDF5 file, where results over multiple runs are accumulated.

At the top level, the root group contains groups for all individual runs, as well as information about the creation date and the total energy consumption over multiple runs.

Each experiment group holds the data for a single run of the Python script, including information about nodes, devices per node, total runtime, and average power draw.

The same applies at the node level.
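To make the relationship between sampled power, runtime, and energy concrete, here is a minimal sketch. It assumes evenly spaced samples and trapezoidal integration, which may differ from what perun's own post-processing does; the function names and sample values are made up for illustration.

```python
def energy_joules(power_watts, frequency_hz):
    """Integrate evenly spaced power samples (in W) with the trapezoidal rule."""
    if len(power_watts) < 2:
        return 0.0
    dt = 1.0 / frequency_hz  # seconds between consecutive samples
    return sum((a + b) / 2.0 * dt for a, b in zip(power_watts, power_watts[1:]))


def joules_to_kwh(joules):
    """Convert joules to kilowatt-hours (1 kWh = 3.6e6 J)."""
    return joules / 3.6e6


samples = [100.0, 120.0, 110.0, 90.0]  # made-up power readings in watts
energy = energy_joules(samples, frequency_hz=2.0)
runtime = (len(samples) - 1) / 2.0  # seconds covered by the samples
print(f"runtime: {runtime:.1f} s")
print(f"average power: {energy / runtime:.1f} W")
print(f"energy: {energy:.1f} J = {joules_to_kwh(energy):.2e} kWh")
```

A higher sampling frequency gives a finer approximation of the power curve, at the cost of more stored samples per device.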

At the lowest group level, datasets hold the samples collected from individual devices, with the dataset attributes providing the device metadata, such as id, measurement unit and magnitude, and possible value range.

- exp_name (Root group)
  - exp_0 (Group)
    - node_name_0 (Group)
      - device_0 (Dataset)
        - units (Attribute)
        - mag (Attribute)
        - long_name (Attribute)
        - ...
      - device_1 (Dataset)
      - ...
    - node_name_1 (Group)
    - ...
  - exp_1 (Group)
  - ...
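
A file with this layout can be inspected with h5py. The sketch below walks the hierarchy recursively and prints every group, dataset, and attribute. Since no real experiment file is at hand, it builds a small in-memory stand-in first; the group and dataset names mirror the outline above, while the sample values and the attribute contents are placeholders rather than actual perun output.

```python
import h5py


def describe(node, name, depth=0):
    """Return indented lines describing an HDF5 group or dataset."""
    indent = "  " * depth
    if isinstance(node, h5py.Dataset):
        lines = [f"{indent}{name} (Dataset, shape={node.shape})"]
        for key, value in node.attrs.items():
            lines.append(f"{indent}  {key} = {value!r} (Attribute)")
        return lines
    lines = [f"{indent}{name} (Group)"]
    for child_name, child in node.items():
        lines.extend(describe(child, child_name, depth + 1))
    return lines


def build_example_file():
    """Create an in-memory file mimicking the layout; all contents are placeholders."""
    f = h5py.File("example.hdf5", "w", driver="core", backing_store=False)
    # Intermediate groups (exp_0, node_name_0) are created automatically.
    device = f.create_dataset("exp_0/node_name_0/device_0", data=[10.0, 12.5, 11.0])
    device.attrs["units"] = "W"
    device.attrs["long_name"] = "hypothetical power sensor"
    return f


with build_example_file() as example:
    print("\n".join(describe(example, "exp_name")))
```

To inspect a real experiment, open it with `h5py.File(path, "r")` instead of calling `build_example_file()`.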

