Metadata-Version: 2.1
Name: faker-pyspark
Version: 0.8.0
Summary: faker-pyspark is a PySpark DataFrame and Schema provider for the Faker python package
Home-page: https://github.com/spsoni/faker-pyspark
License: MIT
Keywords: Faker, PySpark
Author: Sury Soni
Author-email: github@suryasoni.info
Requires-Python: >=3.8.1,<4.0.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Project-URL: Repository, https://github.com/spsoni/faker-pyspark
Description-Content-Type: text/markdown


# PySpark provider for Faker

[![Python package](https://github.com/spsoni/faker_pyspark/actions/workflows/python-package.yml/badge.svg)](https://github.com/spsoni/faker_pyspark/actions/workflows/python-package.yml)
[![CodeQL](https://github.com/spsoni/faker-pyspark/actions/workflows/codeql.yml/badge.svg)](https://github.com/spsoni/faker-pyspark/actions/workflows/codeql.yml)

`faker-pyspark` is a PySpark DataFrame and Schema (StructType) provider for the `Faker` Python package.


## Description

`faker-pyspark` provides PySpark-based fake data for testing purposes. "Fake" in this context means "random": the data may look real, but I make no claims about its accuracy, so do not use it as real data!


## Installation

Install with pip:

``` bash
pip install faker-pyspark
```

Add as a provider to your Faker instance:

``` python
from faker import Faker
from faker_pyspark import PySparkProvider

fake = Faker()
fake.add_provider(PySparkProvider)
```
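Since the package is aimed at testing, a test suite may want to skip cleanly on machines where the dependencies are not installed. The following is a minimal sketch of such a guard, not part of `faker-pyspark` itself: the helper names `missing_modules` and `build_fake` are hypothetical, and the assumption that `pyspark` must be importable alongside `faker` and `faker_pyspark` is mine. Seeding uses Faker's own `seed_instance`; whether the provider's output is fully reproducible under a seed is also an assumption.

``` python
import importlib.util

# Top-level modules the setup above relies on (pyspark is an assumption).
REQUIRED = ("faker", "faker_pyspark", "pyspark")


def missing_modules(names=REQUIRED):
    """Return the names in `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def build_fake(seed=0):
    """Return a seeded Faker with PySparkProvider attached, or None if a dependency is missing."""
    if missing_modules():
        return None
    # Imported lazily so this module still loads without the dependencies.
    from faker import Faker
    from faker_pyspark import PySparkProvider

    fake = Faker()
    fake.seed_instance(seed)
    fake.add_provider(PySparkProvider)
    return fake


fake = build_fake()
if fake is None:
    print("Skipping: missing modules:", missing_modules())
else:
    print("PySparkProvider attached to Faker instance")
```

A test can call `build_fake()` once in a fixture and skip when it returns `None`, which keeps the rest of the suite runnable without Spark.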

### PySpark DataFrame, Schema and more

``` python
>>> df           = fake.pyspark_dataframe()
>>> schema       = fake.pyspark_schema()
>>> df_updated   = fake.pyspark_update_dataframe(df)
>>> column_names = fake.pyspark_column_names()
>>> data         = fake.pyspark_data_dict_using_schema(schema)
>>> data         = fake.pyspark_data_dict()

```

### CLI `faker`

```bash
$ faker pyspark_schema       -i faker_pyspark
$ faker pyspark_dataframe    -i faker_pyspark
$ faker pyspark_column_names -i faker_pyspark
$ faker pyspark_data_dict    -i faker_pyspark
```

