Metadata-Version: 2.2
Name: spark-rapids-user-tools
Version: 24.12.2
Summary: A simple wrapper process around cloud service providers to run tools for the RAPIDS Accelerator for Apache Spark.
Author-email: NVIDIA Corporation <spark-rapids-support@nvidia.com>
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Requires-Python: <3.13,>=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=2.0.0
Requires-Dist: chevron==0.14.0
Requires-Dist: fastprogress==1.0.3
Requires-Dist: fastcore==1.7.28
Requires-Dist: fire==0.7.0
Requires-Dist: pandas==2.2.3
Requires-Dist: pyYAML>=6.0.2
Requires-Dist: pyaml-env==1.2.1
Requires-Dist: tabulate==0.9.0
Requires-Dist: importlib-resources==6.5.0
Requires-Dist: requests==2.32.3
Requires-Dist: packaging==24.2
Requires-Dist: certifi==2024.12.14
Requires-Dist: urllib3==2.3.0
Requires-Dist: pygments==2.18.0
Requires-Dist: pydantic==2.10.4
Requires-Dist: pylint-pydantic==0.3.4
Requires-Dist: pyarrow==18.1.0
Requires-Dist: azure-storage-blob==12.24.0
Requires-Dist: adlfs==2024.12.0
Requires-Dist: progress==1.6
Requires-Dist: xgboost==2.1.3
Requires-Dist: shap==0.46.0
Requires-Dist: scikit-learn==1.5.2
Requires-Dist: psutil==6.1.1
Provides-Extra: test
Requires-Dist: tox; extra == "test"
Requires-Dist: pytest; extra == "test"
Requires-Dist: cli_test_helpers; extra == "test"
Requires-Dist: behave; extra == "test"
Requires-Dist: flake8-pydantic; extra == "test"
Requires-Dist: pylint==3.3.3; extra == "test"
Provides-Extra: qualx
Requires-Dist: holoviews; extra == "qualx"
Requires-Dist: matplotlib; extra == "qualx"
Requires-Dist: optuna; extra == "qualx"
Requires-Dist: optuna-integration; extra == "qualx"
Requires-Dist: seaborn; extra == "qualx"

# spark-rapids-user-tools

User tools to help with the adoption, installation, execution, and tuning of RAPIDS Accelerator for Apache Spark.

The wrapper improves the end-user experience along the following dimensions:
1. **Qualification**: Educate CPU customers on the cost savings and acceleration potential of RAPIDS Accelerator for
   Apache Spark. The output lists the apps recommended for RAPIDS Accelerator for Apache Spark, with estimated savings
   and speed-up.
2. **Tuning**: Tune RAPIDS Accelerator for Apache Spark configs based on an initial job run, leveraging Spark event logs. The output
   shows the recommended per-app RAPIDS Accelerator for Apache Spark config settings.
3. **Diagnostics**: Run diagnostic functions to validate a Dataproc environment with RAPIDS Accelerator for Apache Spark and
   make sure the cluster is healthy and ready for Spark jobs.
4. **Prediction**: Predict the speed-up of running a Spark application with RAPIDS Accelerator for Apache Spark on GPUs.
5. **Train**: Train a model to predict the performance of a Spark job on RAPIDS Accelerator for Apache Spark. The output is
   a model file that can be used to predict the performance of a Spark job.


## Getting started

Set up a Python environment with a version between 3.9 and 3.12 (inclusive).

1. Run the project in a virtual environment. The commands below create the environment in `.venv`;
   substitute another path if you want a different location.
    ```sh
    $ python -m venv .venv
    $ source .venv/bin/activate
    ```
2. Install spark-rapids-user-tools
    - Using the released package.

      ```sh
      $ pip install spark-rapids-user-tools
      ```
    - Install from source.

      ```sh
      $ pip install -e .
      ```

      Note:
      - To install the dependencies required for running unit tests, use the optional `test` extra: `pip install -e '.[test]'`
      - To install the dependencies required for QualX training, use the optional `qualx` extra: `pip install -e '.[qualx]'`

    - Using a wheel package built from the repo (see the build steps below).

      ```sh
      $ pip install <wheel-file>
      ```

3. Make sure to install the SDK of your cloud service provider (CSP) if you plan to run the tool wrapper.
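
As a quick sanity check after the steps above, the following sketch confirms that the active interpreter falls within the supported range and reports the installed package version, if any. It uses only the Python standard library. The helper names `python_is_supported` and `installed_version` are hypothetical and are not part of spark-rapids-user-tools.

```python
import sys
from importlib import metadata

# Supported range, taken from the package metadata (Requires-Python: >=3.9,<3.13).
MIN_PYTHON = (3, 9)
MAX_PYTHON_EXCLUSIVE = (3, 13)


def python_is_supported(version=None):
    """Return True if the (major, minor) version is within the supported range.

    Hypothetical helper; defaults to the running interpreter's version.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor < MAX_PYTHON_EXCLUSIVE


def installed_version(dist_name="spark-rapids-user-tools"):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("spark-rapids-user-tools version:", installed_version())
```

Run it inside the activated virtual environment; a version of `None` suggests that the package was installed into a different environment.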

## Building from source

Set up a Python environment as described in the steps above.

1. Create a virtual environment. The commands below create the environment in `.venv`;
   substitute another path if you want a different location.
    ```sh
    $ python -m venv .venv
    $ source .venv/bin/activate
    ```

2. Run the provided build script to compile the project.

   ```sh
   $ ./build.sh
   ```

3. **Fat Mode:** Similar to a `fat jar` in Java, this mode addresses the case where web access is not
   available to download resources referenced by URL paths (http/https).
   The command builds the tools jar file, downloads the necessary dependencies, and packages them
   with the source code into a single wheel file.

   ```sh
   $ ./build.sh fat
   ```

## Logging Configuration

The core tools project uses Log4j for logging. The default log level is `INFO`.
When you clone the project and build it from source, you can configure the logging
settings in the `log4j.properties` file located in the
`src/spark_rapids_pytools/resources/dev/` directory.
To change the logging level, modify the `log4j.rootLogger` property.
Possible levels include `DEBUG`, `INFO`, `WARN`, and `ERROR`.
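
The change to `log4j.rootLogger` can also be scripted. The sketch below replaces the level in the text of a properties file and leaves any appender names after it untouched. It assumes the common Log4j 1.x form `log4j.rootLogger=LEVEL, appender`; the helper `set_root_log_level` and the sample appender name `console` are hypothetical and may not match the project's actual file.

```python
import re

# Levels listed in this section of the README.
VALID_LEVELS = {"DEBUG", "INFO", "WARN", "ERROR"}

# Matches "log4j.rootLogger = LEVEL" at the start of a line.
# Anything after the level (for example appender names) is not part of the match.
_ROOT_LOGGER = re.compile(r"^(\s*log4j\.rootLogger\s*=\s*)[A-Za-z]+", re.MULTILINE)


def set_root_log_level(properties_text, level):
    """Return properties_text with the log4j.rootLogger level replaced by `level`."""
    level = level.upper()
    if level not in VALID_LEVELS:
        raise ValueError(f"unsupported log level: {level}")
    new_text, count = _ROOT_LOGGER.subn(lambda m: m.group(1) + level, properties_text)
    if count == 0:
        raise ValueError("no log4j.rootLogger property found")
    return new_text


if __name__ == "__main__":
    sample = "log4j.rootLogger=INFO, console\n"  # hypothetical file content
    print(set_root_log_level(sample, "debug"), end="")
```

To apply it to the real file, read `log4j.properties` as text, pass the contents through the helper, and write the result back before building.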

## Usage and supported platforms

Please refer to the [spark-rapids-user-tools guide](https://github.com/NVIDIA/spark-rapids-tools/blob/main/user_tools/docs/index.md) for details on how to use the tools
and on the supported platforms.

Please refer to the [qualx guide](https://github.com/NVIDIA/spark-rapids-tools/blob/main/user_tools/docs/qualx.md) for details on how to use the QualX tool for prediction and training.

## What's new

Please refer to [CHANGELOG.md](https://github.com/NVIDIA/spark-rapids-tools/blob/main/CHANGELOG.md) for our latest changes.
