Metadata-Version: 2.1
Name: pysparkcli
Version: 0.0.7
Summary: PySpark Project Building Tool
Home-page: https://github.com/qburst/PySparkCLI
Author: Jino Jossy
Author-email: jinojossy93@gmail.com
License: MIT
Platform: UNKNOWN
Description-Content-Type: text/markdown
Requires-Dist: click
Requires-Dist: jinja2
Requires-Dist: pathlib (>1.0.0)
Requires-Dist: pyspark

# PySpark CLI

This tool generates boilerplate code for a PySpark project based on user input.

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

PySpark is the Python API for Spark.

## Installation Steps:

    git clone https://github.com/qburst/PySparkCLI.git

    cd PySparkCLI

    pip3 install -e . --user
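
To confirm that the installation worked, a quick check like the following Python sketch can help. It uses only the standard library. `importlib.metadata.version` reports the installed distribution version, and `shutil.which` checks whether the `pysparkcli` command is on `PATH`. The helper names are illustrative and are not part of pysparkcli.

```python
import shutil
from importlib import metadata
from typing import Optional


def installed_version(dist: str = "pysparkcli") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def cli_on_path(command: str = "pysparkcli") -> bool:
    """Return True if the console script can be found on PATH."""
    return shutil.which(command) is not None


if __name__ == "__main__":
    print("pysparkcli version:", installed_version())
    print("pysparkcli on PATH:", cli_on_path())
```

With `--user`, pip installs the `pysparkcli` script into the per-user scripts directory (for example `~/.local/bin` on Linux). If `installed_version()` returns a version but `cli_on_path()` returns `False`, that directory is probably missing from `PATH`.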

## Create a PySpark Project

    pysparkcli create [PROJECT_NAME] --master [MASTER_URL] --cores [NUMBER]

* `--master` (`-m`): The URL of the Spark cluster master to connect to.
* `--cores` (`-c`): The number of cores to use.
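
Assuming the `--master` value is passed to Spark unchanged, it takes the form of a standard Spark master URL. The Python sketch below classifies the common forms (`local`, `local[N]`, `local[*]`, `spark://HOST:PORT`, `yarn`, `k8s://https://HOST:PORT`) as a rough guide to choosing a value. The helper `master_kind` and its category names are hypothetical and not part of pysparkcli.

```python
import re

# Common Spark master URL forms. Hypothetical helper, not part of pysparkcli.
_MASTER_PATTERNS = {
    # local, local[N], local[*], optionally with a max-failures count: local[N,F]
    "local": r"local(\[(\d+|\*)(,\d+)?\])?",
    # Standalone cluster: spark://HOST:PORT (or a comma-separated list of masters)
    "standalone": r"spark://[^/\s]+",
    # Hadoop YARN; the cluster location comes from the Hadoop configuration
    "yarn": r"yarn",
    # Kubernetes: k8s://https://HOST:PORT
    "kubernetes": r"k8s://\S+",
}


def master_kind(url: str) -> str:
    """Return the deployment kind for a Spark master URL, or 'unknown'."""
    for kind, pattern in _MASTER_PATTERNS.items():
        if re.fullmatch(pattern, url.strip()):
            return kind
    return "unknown"


if __name__ == "__main__":
    for url in ["local[*]", "spark://host:7077", "yarn"]:
        print(url, "->", master_kind(url))
```

When passing a local master on the command line, quote it, for example `pysparkcli create sample -m "local[*]" -c 2` (where `sample` is a hypothetical project name), because shells such as zsh treat the square brackets as a glob pattern.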

## Run a PySpark Project

    pysparkcli run [PROJECT_NAME]

## PySpark Project Test Cases

* Run the test cases of a project by **project name**:

      pysparkcli test [PROJECT_NAME]

* Run an individual test case by file name. For example, to run **test_etl_job.py**:

      pysparkcli test [PROJECT_NAME] -t [etl_job]
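
To script these commands (for example in a CI job), they can be invoked from Python with `subprocess`. The sketch below builds argument lists only from the options documented above. The helper names are hypothetical, and `flag_from_test_file` assumes that the `-t` value is always the test file name without its `test_` prefix and `.py` suffix, as in the `test_etl_job.py` example.

```python
import shutil
import subprocess
from typing import List, Optional


def flag_from_test_file(filename: str) -> str:
    """Map a test file name such as 'test_etl_job.py' to the -t value 'etl_job'.

    Assumption: pysparkcli expects the name without the 'test_' prefix and '.py' suffix.
    """
    name = filename[: -len(".py")] if filename.endswith(".py") else filename
    return name[len("test_"):] if name.startswith("test_") else name


def pysparkcli_argv(
    action: str,
    project: str,
    master: Optional[str] = None,
    cores: Optional[int] = None,
    test: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `pysparkcli create|run|test PROJECT [options]`."""
    argv = ["pysparkcli", action, project]
    if master is not None:
        argv += ["--master", master]
    if cores is not None:
        argv += ["--cores", str(cores)]
    if test is not None:
        argv += ["-t", test]
    return argv


def run_pysparkcli(argv: List[str]) -> subprocess.CompletedProcess:
    """Run the command, raising if pysparkcli is missing or exits non-zero."""
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError(f"{argv[0]!r} was not found on PATH")
    # No shell is involved, so values such as "local[*]" need no quoting.
    return subprocess.run(argv, check=True)
```

For example, `run_pysparkcli(pysparkcli_argv("test", "sample", test=flag_from_test_file("test_etl_job.py")))` would run a single test in a hypothetical project named `sample`. Passing a list instead of a shell string avoids quoting problems with bracketed master URLs.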

## Contribution Guidelines

See [CONTRIBUTING.md](https://github.com/qburst/PySparkCLI/blob/master/CONTRIBUTING.md) for our contribution guidelines.

## Sponsors

[![QBurst](https://www.qburst.com/images/responsive/QBlogo.svg)](https://www.qburst.com)
