Metadata-Version: 2.1
Name: l2metrics
Version: 3.0.0
Summary: Metrics for Lifelong Learning
Home-page: https://github.com/darpa-l2m/l2metrics
Author: Eric Nguyen, Megan Baker
Author-email: Eric.Nguyen@jhuapl.edu, Megan.Baker@jhuapl.edu
License: MIT
Download-URL: https://github.com/darpa-l2m/l2metrics/archive/v3.0.0.tar.gz
Description: # Lifelong Learning Metrics (L2Metrics)
        
        ![logo](https://github.com/darpa-l2m/l2metrics/blob/main/docs/apl_small_vertical_blue.png)
        
        ## Table of Contents
        
        - [Introduction](#introduction)
        - [Metrics](#metrics)
        - [Data Preprocessing](#data-preprocessing)
        - [Requirements](#requirements)
          - [Installation](#installation)
        - [Usage](#usage)
          - [Command-Line Execution](#command-line-execution)
          - [Storing Single-Task Expert Data](#storing-single-task-expert-data)
          - [Clearing Single-Task Expert Data](#clearing-single-task-expert-data)
          - [Generating Metrics Report](#generating-metrics-report)
          - [Custom Metrics](#custom-metrics)
        - [Changelog](#changelog)
        - [License](#license)
        - [Acknowledgements](#acknowledgements)
        
        ## Introduction
        
        Lifelong Learning Metrics (L2Metrics) is a Python library containing foundational code for the L2M Metrics Framework. This framework includes the following:
        
        - Python libraries for processing performance logs generated by lifelong learning algorithms
        - Support for extending the framework with custom metrics
        
        ## Metrics
        
        The L2Metrics library supports the following lifelong learning metrics as defined in the [Lifelong Learning Metrics for L2M specification](https://arxiv.org/abs/2201.08278):
        
        - Performance Recovery (PR)
        - Performance Maintenance (PM)
        - Forward Transfer (FT)
        - Backward Transfer (BT)
        - Performance Relative to a Single-Task Expert (RP)
        - Sample Efficiency (SE)
        
        ## Data Preprocessing
        
        Refer to the [Data Preprocessing README](https://github.com/darpa-l2m/l2metrics/blob/main/docs/data_preprocessing.md) for details on the data preprocessing methods in this library.
        
        ## Requirements
        
        L2Metrics is written in Python 3; Python 3.6 or newer is highly recommended. The Metrics Framework has been tested on Windows 10 and Ubuntu 18.04/20.04. It should work on other platforms, but this has not been verified.
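        
        As a quick sanity check before installing, the recommended minimum can be compared against the running interpreter. This is a minimal sketch; the helper name `python_is_supported` is hypothetical and not part of L2Metrics.
        
        ```python
        import sys
        
        # Recommended minimum from this README; treated as a floor for this check.
        MIN_PYTHON = (3, 6)
        
        
        def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
            """Return True when the (major, minor) version meets the minimum."""
            return tuple(version_info[:2]) >= minimum
        
        
        # Report whether the current interpreter meets the recommendation.
        print("Python version OK" if python_is_supported() else "Python 3.6+ is recommended")
        ```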
        
        ### Installation
        
        #### 1. (Optional) Create a Python virtual environment
        
        ```bash
        python -m venv <path_to_new_venv>
        ```
        
        Activate the virtual environment as follows:
        
        Linux:
        
        ```bash
        source <path_to_new_venv>/bin/activate
        ```
        
        Windows:
        
        ```powershell
        <path_to_new_venv>/Scripts/Activate.ps1
        ```
        
        #### 2. Update pip and wheel in your environment
        
        ```bash
        pip install -U pip wheel
        ```
        
        #### 3. Clone the L2Logger and L2Metrics repositories
        
        ```bash
        git clone https://github.com/darpa-l2m/l2logger.git
        git clone https://github.com/darpa-l2m/l2metrics.git
        ```
        
        #### 4. Install the L2Logger and L2Metrics packages
        
        ```bash
        pip install -e <path_to_l2logger>
        pip install -e <path_to_l2metrics>
        ```
        
        ## Usage
        
        To calculate metrics on the performance of your system, you must first generate log files in accordance with the L2Logger format version 1.1. Please refer to the L2Logger documentation for more details on how to generate compatible logs.
        
        Once these logs are generated, you'll need to store Single-Task Expert (STE) data and pass the log directories as command-line arguments to compute STE-related metrics. Several example files are included to get you started:
        
        - Example STE and LL log directories:
          - `./examples/ste_task1_1_run1/` (STE)
          - `./examples/ste_task2_1_run1/` (STE)
          - `./examples/ste_task3_1_run1/` (STE)
          - `./examples/ste_task3_1_run2/` (STE)
          - `./examples/multi_task/` (LL)
        - Example `settings.json` file for configuring command-line arguments
        - Example `data_range.json` file to show how the user can specify task normalization ranges
        
        ### Command-Line Execution
        
        Refer to the [Command-Line README](https://github.com/darpa-l2m/l2metrics/blob/main/docs/command_line.md) for more information on how to run L2Metrics from the command line.
        
        ### Storing Single-Task Expert Data
        
        The following commands are examples of how to store STE data from the provided logs, run from the root L2Metrics directory:
        
        ```bash
        python -m l2metrics -l examples/ste_task1_1_run1 -s w
        python -m l2metrics -l examples/ste_task2_1_run1 -s w
        python -m l2metrics -l examples/ste_task3_1_run1 -s w
        python -m l2metrics -l examples/ste_task3_1_run2 -s a
        ```
        
        The specified log data is pickled and saved in the `taskinfo` subdirectory of the `$L2DATA` directory, which holds all single-task expert data. The first three example commands use the STE store mode `w` ("write" or "overwrite"). This mode creates a new pickle file for the STE if one does not already exist and overwrites any existing file for the same task in `taskinfo`. The last example command uses the append mode, `a`, which lets users store multiple runs of STE data in the same pickle file. The STE averaging method selected in the `l2metrics` module then determines how multiple STE runs are handled. Storing STE data assumes the provided log contains data for only a single task/variant.
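        
        The difference between the two store modes can be illustrated with a small standalone sketch. It is not the L2Metrics implementation: the function `store_ste`, the `<task_name>.pickle` file name, and the list-of-runs layout are all assumptions made for illustration.
        
        ```python
        import pickle
        import tempfile
        from pathlib import Path
        
        
        def store_ste(taskinfo_dir, task_name, run_data, mode="w"):
            """Store one STE run; return how many runs the task's file now holds."""
            if mode not in ("w", "a"):
                raise ValueError(f"unknown store mode: {mode!r}")
            # Hypothetical file layout: one pickle file per task.
            ste_file = Path(taskinfo_dir) / f"{task_name}.pickle"
            runs = []
            if mode == "a" and ste_file.exists():
                # Append mode keeps the runs that were stored earlier.
                with ste_file.open("rb") as f:
                    runs = pickle.load(f)
            runs.append(run_data)
            # Write mode reaches this point with an empty list, so old runs are dropped.
            with ste_file.open("wb") as f:
                pickle.dump(runs, f)
            return len(runs)
        
        
        # Demo in a throwaway directory standing in for the taskinfo location.
        with tempfile.TemporaryDirectory() as tmp:
            store_ste(tmp, "task3_1", [0.1, 0.5], mode="w")
            print(store_ste(tmp, "task3_1", [0.2, 0.6], mode="a"))
        ```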
        
        Replace the log directory argument with logs for other STE tasks and repeat until all STE data is stored.
        
        ### Clearing Single-Task Expert Data
        
        To clear the `taskinfo` subdirectory of all previously stored STE data, run the following command:
        
        ```bash
        python -m l2metrics.clear_ste
        ```
        
        ### Generating Metrics Report
        
        To generate a metrics plot and report with default settings, run the following command from the `l2metrics/examples` directory:
        
        ```bash
        python -m l2metrics -l ./multi_task -p performance
        ```
        
        The default output files are saved in the current working directory and defined below:
        
        - `multi_task_data.feather`: The log data DataFrame containing raw and preprocessed data.
        - `multi_task_metrics.json`: The lifetime and task-level metrics of the run.
        - `multi_task_settings.json`: The settings used to generate the metrics report.
        - `multi_task_block.png`: The block plot with separate subplots for evaluation blocks.
        - `multi_task_perf.png`: The performance plot.
        - `multi_task_ste.png`: The performance relative to STE plot.
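        
        For scripting around these outputs, the file names can be derived from the log directory name. The sketch below assumes the output prefix is the final component of the log directory path, as in the `multi_task` example; the helper `expected_outputs` is hypothetical.
        
        ```python
        from pathlib import Path
        
        # Suffixes taken from the list of default output files above.
        DEFAULT_OUTPUT_SUFFIXES = (
            "_data.feather",
            "_metrics.json",
            "_settings.json",
            "_block.png",
            "_perf.png",
            "_ste.png",
        )
        
        
        def expected_outputs(log_dir, output_dir="."):
            """Return the default output paths expected for a given log directory."""
            base = Path(log_dir).name  # assumption: prefix is the directory name
            return [Path(output_dir) / f"{base}{suffix}" for suffix in DEFAULT_OUTPUT_SUFFIXES]
        
        
        # List the files expected after running the example command.
        for path in expected_outputs("./multi_task"):
            print(path.name)
        ```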
        
        If you wish to generate a metrics report with modified settings (e.g., disabling normalization or aggregating lifetime metrics with the mean operator), you can either modify the arguments on the command line or specify a JSON file containing the desired settings. The settings loaded from the JSON file will take precedence over any arguments specified on the command line.
        
        ```bash
        python -m l2metrics -c settings.json
        ```
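        
        The precedence rule described above can be sketched as a dictionary merge in which values from the JSON file overwrite command-line values. This is an illustration only; `resolve_settings` is a hypothetical helper and `"reward"` is a placeholder value, not a documented option.
        
        ```python
        import json
        
        
        def resolve_settings(cli_args, json_text):
            """Merge CLI arguments with JSON settings; JSON values win on conflict."""
            file_settings = json.loads(json_text)
            merged = dict(cli_args)       # start from the command-line values
            merged.update(file_settings)  # settings from the file take precedence
            return merged
        
        
        # The CLI value for perf_measure is replaced by the value from the file.
        cli = {"log_dir": "multi_task", "perf_measure": "reward"}
        print(resolve_settings(cli, '{"perf_measure": "performance"}'))
        ```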
        
        Lastly, to compute metrics on multiple lifetimes at once, set the recursive flag (`-R`) on the command line. When the flag is set, L2Metrics scans the subdirectories for valid LL logs, calculates metrics, and then saves a TSV and JSON file containing lifetime/task-level metrics for each discovered lifetime.
        
        ```bash
        python -m l2metrics -l <path/to/directory/containing/multiple/runs> -R
        ```
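        
        To preview which subdirectories a recursive run might pick up, a directory scan along these lines can help. It is a rough sketch, not how L2Metrics decides validity: the marker file names below are an assumption about what a top-level L2Logger log directory contains, and `find_log_dirs` is a hypothetical helper.
        
        ```python
        import tempfile
        from pathlib import Path
        
        # Assumption: a lifetime's top-level log directory holds these files.
        MARKER_FILES = ("logger_info.json", "scenario_info.json")
        
        
        def find_log_dirs(root, markers=MARKER_FILES):
            """Return subdirectories of root that contain every marker file."""
            found = []
            for candidate in sorted(Path(root).rglob("*")):
                if candidate.is_dir() and all((candidate / m).is_file() for m in markers):
                    found.append(candidate)
            return found
        
        
        # Demo on a throwaway tree with one complete and one partial log directory.
        with tempfile.TemporaryDirectory() as tmp:
            for name, files in {"run1": MARKER_FILES, "run2": MARKER_FILES[:1]}.items():
                run_dir = Path(tmp) / name
                run_dir.mkdir()
                for file_name in files:
                    (run_dir / file_name).write_text("{}")
            print([p.name for p in find_log_dirs(tmp)])
        ```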
        
        **Note**: If you do not wish to provide a fully qualified path to your log directory, you may copy it to your `$L2DATA/logs` directory. This is the default location for logs generated using the TEF.
        
        ### Log Data
        
        Refer to the [Log Data README](https://github.com/darpa-l2m/l2metrics/blob/main/docs/log_data.md) for more information on how to interface with the raw and preprocessed log data from the scenario.
        
        ### Output Settings File
        
        If saving of L2Metrics settings is enabled, the framework will generate a JSON file containing the primary parameters used to calculate L2Metrics:
        
        ```json
        {
          "log_dir": "multi_task",
          "perf_measure": "performance",
          "variant_mode": "aware",
          "ste_averaging_method": "metrics",
          "aggregation_method": "mean",
          "maintenance_method": "mrlep",
          "transfer_method": "ratio",
          "normalization_method": "task",
          "smoothing_method": "flat",
          "window_length": null,
          "clamp_outliers": false
        }
        ```
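        
        A saved settings file like the one above can be read back with the standard `json` module. The sketch below checks that the keys shown in the example are present; whether L2Metrics itself validates settings this way is not stated, and `load_settings` is a hypothetical helper.
        
        ```python
        import json
        
        # Keys copied from the example settings file above.
        EXPECTED_KEYS = {
            "log_dir", "perf_measure", "variant_mode", "ste_averaging_method",
            "aggregation_method", "maintenance_method", "transfer_method",
            "normalization_method", "smoothing_method", "window_length",
            "clamp_outliers",
        }
        
        
        def load_settings(text):
            """Parse settings JSON and fail if an expected key is missing."""
            settings = json.loads(text)
            missing = EXPECTED_KEYS - set(settings)
            if missing:
                raise ValueError(f"missing settings: {sorted(missing)}")
            return settings
        
        
        EXAMPLE = """
        {
          "log_dir": "multi_task", "perf_measure": "performance",
          "variant_mode": "aware", "ste_averaging_method": "metrics",
          "aggregation_method": "mean", "maintenance_method": "mrlep",
          "transfer_method": "ratio", "normalization_method": "task",
          "smoothing_method": "flat", "window_length": null,
          "clamp_outliers": false
        }
        """
        
        settings = load_settings(EXAMPLE)
        # JSON null and false become Python None and False.
        print(settings["window_length"], settings["clamp_outliers"])
        ```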
        
        ### Metrics and Metrics File
        
        The metrics module will print the lifetime metrics to the console when it has successfully completed execution. The following table shows an example of a metrics report output:
        
        | perf_recovery | perf_maintenance_mrlep | forward_transfer_ratio | backward_transfer_ratio | ste_rel_perf | sample_efficiency |
        | ------------- | ---------------------- | ---------------------- | ----------------------- | ------------ | ----------------- |
        | -2.0          | 3.86                   | 12.63                  | 1.08                    | 1.11         | 0.91              |
        
        If saving is enabled, the framework will also generate a JSON file containing lifetime and task-level metrics for the scenario. Please refer to the [File Description README](https://github.com/darpa-l2m/l2metrics/blob/main/docs/file_descriptions.md#metrics-json-file) for more information on the format of this file.
        
        ### Block Plot
        
        The resulting block plot from the example run should look like this:
        
        ![diagram](examples/multi_task_block.png)
        
        The plot separates learning/training experiences from evaluation experiences. The top subplot shows the raw training data with a smoothed black curve overlaid. The subsequent subplots show the evaluation data for each individual task with 25% and 75% quantile ranges.
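        
        The 25% and 75% quantile range for a set of evaluation values can be computed as in the sketch below. It uses `statistics.quantiles` from the standard library (Python 3.8 or newer) with the inclusive method; the library's own quantile computation may differ, and `quantile_band` is a hypothetical helper.
        
        ```python
        import statistics
        
        
        def quantile_band(values):
            """Return the (25%, 75%) quantiles of a sequence of performance values."""
            # n=4 yields the three quartile cut points; the median is not needed here.
            q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
            return q1, q3
        
        
        # Made-up evaluation values for one task.
        low, high = quantile_band([0.0, 1.0, 2.0, 3.0, 4.0])
        print(low, high)
        ```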
        
        ### Performance Plot
        
        The output figure of performance over experiences should look like this:
        
        ![diagram](examples/multi_task_perf.png)
        
        The white areas represent blocks in which learning is occurring while the gray areas represent evaluation blocks. The dashed lines in the plot show the slopes between each task's evaluation blocks.
        
        **Note**: The performance values shown in the evaluation blocks are an average over the whole block, resulting in a flat line for each task.
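        
        The averaging described in the note amounts to replacing every value in an evaluation block with the block mean, which is why each task appears as a flat line. A minimal sketch, with a hypothetical helper name and made-up values:
        
        ```python
        def flatten_eval_block(values):
            """Replace every value in an evaluation block with the block mean."""
            if not values:
                return []
            mean = sum(values) / len(values)
            return [mean] * len(values)
        
        
        # Every point in the block is plotted at the same height.
        print(flatten_eval_block([1.0, 2.0, 3.0]))
        ```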
        
        ### Performance Relative to STE Plot
        
        The framework should also produce the performance relative to STE plot shown below, where the task performance curves are generated by concatenating all the training data from the scenario:
        
        ![diagram](examples/multi_task_ste.png)
        
        The black dashed lines indicate the block boundaries where task performance was stitched together.
        
        ### Custom Metrics
        
        See documentation in the examples folder at [examples/README.md](./examples/README.md) for more details on how to implement custom metrics.
        
        ## Changelog
        
        See [CHANGELOG.md](./CHANGELOG.md) for a list of notable changes to the project.
        
        ## License
        
        See [LICENSE](LICENSE) for license information.
        
        ## Acknowledgements
        
        Primary development of Lifelong Learning Metrics (L2Metrics) was funded by the DARPA Lifelong Learning Machines (L2M) Program.
        
        © 2021-2022 The Johns Hopkins University Applied Physics Laboratory LLC
        
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Description-Content-Type: text/markdown
