Metadata-Version: 2.1
Name: spark-rapids-user-tools
Version: 23.10.1
Summary: A simple wrapper process around cloud service providers to run tools for the RAPIDS Accelerator for Apache Spark.
Author-email: NVIDIA Corporation <spark-rapids-support@nvidia.com>
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: chevron ==0.14.0
Requires-Dist: fastprogress ==1.0.3
Requires-Dist: fastcore ==1.5.29
Requires-Dist: fire >=0.5.0
Requires-Dist: pandas ==1.4.3
Requires-Dist: pyYAML ==6.0
Requires-Dist: tabulate ==0.8.10
Requires-Dist: importlib-resources ==5.10.2
Requires-Dist: requests ==2.31.0
Requires-Dist: packaging ==23.0
Requires-Dist: certifi ==2023.7.22
Requires-Dist: idna ==3.4
Requires-Dist: urllib3 ==1.26.14
Requires-Dist: beautifulsoup4 ==4.11.2
Requires-Dist: pygments ==2.15.0
Requires-Dist: pydantic ==2.1.1
Requires-Dist: pylint-pydantic ==0.3.0
Requires-Dist: pyarrow ==12.0.1
Requires-Dist: azure-storage-blob ==12.17.0
Requires-Dist: adlfs ==2023.4.0
Requires-Dist: progress ==1.6
Provides-Extra: test
Requires-Dist: tox ; extra == 'test'
Requires-Dist: pytest ; extra == 'test'
Requires-Dist: cli-test-helpers ; extra == 'test'

# spark-rapids-user-tools

User tools to help with the adoption, installation, execution, and tuning of RAPIDS Accelerator for Apache Spark.

The wrapper improves the end-user experience in the following areas:
1. **Qualification**: Educate CPU customers on the cost savings and acceleration potential of RAPIDS Accelerator for
   Apache Spark. The output lists the apps recommended for RAPIDS Accelerator for Apache Spark with estimated savings
   and speed-up.
2. **Bootstrap**: Provide optimized RAPIDS Accelerator for Apache Spark configs based on the GPU cluster shape. The output
   shows the updated Spark config settings on the driver node.
3. **Tuning**: Tune RAPIDS Accelerator for Apache Spark configs based on an initial job run, leveraging Spark event logs. The output
   shows recommended per-app RAPIDS Accelerator for Apache Spark config settings.
4. **Diagnostics**: Run diagnostic functions to validate a Dataproc environment with RAPIDS Accelerator for Apache Spark and
   make sure the cluster is healthy and ready for Spark jobs.


## Getting started

Set up a Python environment with a version between 3.8 and 3.10.
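
As a quick sanity check before creating the virtual environment, the interpreter version can be compared against the range stated above. This is a minimal sketch and not part of the package; the helper name `is_supported_python` is hypothetical.

```python
import sys

# Range stated in this guide: Python 3.8 through 3.10 (inclusive).
MIN_SUPPORTED = (3, 8)
MAX_SUPPORTED = (3, 10)


def is_supported_python(version=None):
    """Return True if (major, minor) lies within the documented range.

    `version` is any tuple starting with (major, minor); it defaults to
    the running interpreter. Hypothetical helper, for illustration only.
    """
    major, minor = (version or sys.version_info)[:2]
    return MIN_SUPPORTED <= (major, minor) <= MAX_SUPPORTED


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    verdict = "within" if is_supported_python() else "outside"
    print(f"Python {current} is {verdict} the documented 3.8-3.10 range")
```

Run it with the same `python` executable that will create the virtual environment, so the check reflects the interpreter actually in use.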

1. Run the project in a virtual environment.
    ```sh
    $ python -m venv .venv
    $ source .venv/bin/activate
    ```
2. Install spark-rapids-user-tools.
    - Using the released package.
      
      ```sh
      $ pip install spark-rapids-user-tools
      ```
    - Install from source.

      ```sh
      $ pip install -e .
      ```
      
      To also install the dependencies required to run the unit tests, use the optional `test` extra:
      `pip install -e '.[test]'`

    - Using a wheel package built from the repo (see the build steps below).

      ```sh
      $ pip install <wheel-file>
      ```

3. Make sure to install the SDK of your cloud service provider (CSP) if you plan to run the tool wrapper.
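
After the installation steps above, one way to confirm that the package is visible to the active environment is to query its distribution metadata with the standard-library `importlib.metadata` module. The helper name `installed_version` is an assumption made for this sketch; the distribution name is the one used in the `pip install` command above.

```python
from importlib import metadata


def installed_version(dist_name="spark-rapids-user-tools"):
    """Return the installed version string of `dist_name`, or None if absent.

    Hypothetical helper, for illustration only.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the active environment.
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("spark-rapids-user-tools is not installed in this environment")
    else:
        print(f"spark-rapids-user-tools {version} is installed")
```

A `None` result usually means the virtual environment is not activated or the install step was run with a different interpreter.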

## Building from source

Set up a Python environment similar to the steps above.

1. Run the provided build script to compile the project.

   ```sh
   $ ./build.sh
   ```
 
2. **Fat Mode:** Similar to a `fat jar` in Java, this mode covers environments where web access is not
   available to download resources referenced by URL (http/https).  
   The command builds the tools jar file, downloads the necessary dependencies, and packages them
   with the source code into a single 'wheel' file.

   ```sh
   $ ./build.sh fat
   ```
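
Because a wheel is a zip archive, its contents can be listed to see what a fat build bundled. The sketch below assumes the tools jar ends up as a `.jar` entry inside the wheel; the README only states that the jar and dependencies are packaged into the wheel, so the exact layout is an assumption, and `bundled_jars` is a hypothetical helper.

```python
import zipfile


def bundled_jars(wheel_path):
    """List the .jar entries packaged inside a wheel.

    A wheel is a zip archive, so the standard zipfile module can read it.
    Assumes bundled jars keep the .jar extension (not confirmed by the
    project documentation).
    """
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(name for name in wheel.namelist() if name.endswith(".jar"))
```

Call it as `bundled_jars("<wheel-file>")`, passing the path of the wheel produced by the build. An empty list suggests either a regular (non-fat) build or a layout that differs from the assumption above.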
 
## Usage and supported platforms

Please refer to the [spark-rapids-user-tools guide](https://github.com/NVIDIA/spark-rapids-tools/blob/main/user_tools/docs/index.md) for details on how to use the tools
and on the supported platforms.

## What's new

Please refer to [CHANGELOG.md](https://github.com/NVIDIA/spark-rapids-tools/blob/main/CHANGELOG.md) for our latest changes.
