Metadata-Version: 2.4
Name: h3spark
Version: 0.1.1
Summary: Lightweight pyspark wrapper for h3-py
Author-email: Joseph Chotard <h3spark@chotard.com>
License: MIT License
License-File: LICENSE
Keywords: h3,h3-py,pyspark,spark
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.9
Requires-Dist: h3<4.3,>=4.1
Requires-Dist: pyspark<4.0,>=3.0
Description-Content-Type: text/markdown

# h3spark

![Tile the world in hexes](images/big_geo.jpeg "Tile the world in hexes")

`h3spark` is a Python library that provides a set of user-defined functions (UDFs) for working with H3 geospatial indexing in PySpark. The functions in this library follow the same assumptions and rules as the native H3 functions, allowing for seamless integration and usage in PySpark data pipelines.

## Installation

You can install `h3spark` using either pip or conda.

### Using pip
```bash
pip install h3spark
```

### Using conda
```bash
conda install -c conda-forge h3spark
```

## Usage

Below is a brief overview of the available functions in `h3spark`. These functions are designed to work with PySpark DataFrames and provide H3 functionality within a distributed data processing environment.

### Functions

`H3CellInput` is a type alias that represents an H3 cell, which can be either a hexadecimal string or a long integer (`H3CellInput = Union[str, int]`). h3spark will handle conversion from between types if required by h3. Prefer long integers if possible for more efficient processing.

- **`str_to_int(h3_str: string) -> long`**: Converts an H3 string to an integer.
- **`int_to_str(h3_int: Union[str, int]) -> string`**: Converts an H3 integer to a string. Allows strings due to Spark's limitation with unsigned 64-bit integers.
- **`get_num_cells(res: int) -> int`**: Returns the number of H3 cells at a given resolution.
- **`average_hexagon_area(res: int, unit: Union[AreaUnit, str] = AreaUnit.KM2) -> float`**: Calculates the average area of an H3 hexagon at a given resolution and unit.
- **`average_hexagon_edge_length(res: int, unit: Union[LengthUnit, str] = LengthUnit.KM) -> float`**: Computes the average edge length of an H3 hexagon at a specified resolution and unit.
- **`latlng_to_cell(lat: float, lng: float, res: int) -> long`**: Converts latitude and longitude to an H3 cell at a specified resolution.
- **`cell_to_latlng(cell: H3CellInput) -> COORDINATE_TYPE`**: Converts an H3 cell to its central latitude and longitude.
- **`get_resolution(cell: H3CellInput) -> short`**: Retrieves the resolution of a given H3 cell.
- **`cell_to_parent(cell: H3CellInput, res: int) -> long`**: Converts an H3 cell to its parent cell at a specified resolution.
- **`grid_distance(cell1: H3CellInput, cell2: H3CellInput) -> int`**: Calculates the distance in grid cells between two H3 cells.
- **`cell_to_boundary(cell: H3CellInput) -> BOUNDARY_TYPE`**: Returns the boundary of an H3 cell as a list of coordinates.
- **`grid_disk(cell: H3CellInput, k: int) -> List[long]`**: Returns all cells within k rings around the given H3 cell.
- **`grid_ring(cell: H3CellInput, k: int) -> List[long]`**: Returns cells in a ring of k distance from the given H3 cell.
- **`cell_to_children_size(cell: H3CellInput, res: int) -> int`**: Returns the number of children cells for a given cell at a specified resolution.
- **`cell_to_children(cell: H3CellInput, res: int) -> List[long]`**: Returns the children of an H3 cell at a specified resolution.
- **`cell_to_child_pos(child: H3CellInput, res_parent: int) -> int`**: Finds the position of a child cell relative to its parent cell at a specified resolution.
- **`child_pos_to_cell(parent: H3CellInput, res_child: int, child_pos: int) -> long`**: Converts a child position back to an H3 cell.
- **`compact_cells(cells: List[H3CellInput]) -> List[long]`**: Compacts a list of H3 cells.
- **`uncompact_cells(cells: List[H3CellInput], res: int) -> List[long]`**: Uncompacts a list of H3 cells to a specified resolution.
- **`h3shape_to_cells(shape: H3Shape, res: int) -> List[long]`**: Converts a shape to H3 cells at a specified resolution.
- **`cells_to_h3shape(cells: List[H3CellInput]) -> string`**: Converts a list of H3 cells to a GeoJSON shape.
- **`is_pentagon(cell: H3CellInput) -> bool`**: Checks if an H3 cell is a pentagon.
- **`get_base_cell_number(cell: H3CellInput) -> int`**: Retrieves the base cell number of an H3 cell.
- **`are_neighbor_cells(cell1: H3CellInput, cell2: H3CellInput) -> bool`**: Checks if two H3 cells are neighbors.
- **`grid_path_cells(start: H3CellInput, end: H3CellInput) -> List[long]`**: Finds the grid path between two H3 cells.
- **`is_res_class_III(cell: H3CellInput) -> bool`**: Checks if an H3 cell is of class III resolution.
- **`get_pentagons(res: int) -> List[long]`**: Returns all pentagon cells at a given resolution.
- **`get_res0_cells() -> List[long]`**: Returns all resolution 0 base cells.
- **`cell_to_center_child(cell: H3CellInput, res: int) -> long`**: Finds the center child cell of a given cell at a specified resolution.
- **`get_icosahedron_faces(cell: H3CellInput) -> List[int]`**: Retrieves icosahedron face indexes for a given H3 cell.
- **`cell_to_local_ij(cell: H3CellInput) -> List[int]`**: Converts an H3 cell to local IJ coordinates.
- **`local_ij_to_cell(origin: H3CellInput, i: int, j: int) -> long`**: Converts local IJ coordinates back to an H3 cell.
- **`cell_area(cell: H3CellInput, unit: Union[AreaUnit, str] = AreaUnit.KM2) -> float`**: Computes the area of an H3 cell in a specified unit.


### Spark native Functions

Some H3 functions can ~mostly be reimplemented purely within pyspark. Doing so avoids the serialization/deserialization overhead of a UDF. These functions should be mostly equivalent to their C native counterparts while being more performant in pyspark. You can import them from `h3spark.native`

- **`get_resolution(cell: long) -> long`**: Retrieves the resolution of a given H3 cell.
- **`cell_to_parent_fixed(cell: long, current_resolution: int, parent_resolution: int) -> long`**: Given a column where every row has the same resolution (current_resolution), call `cell_to_parent` on every row to the same constant resolution (parent_resolution). Does not perform any validation on the input cells


### Convenience functions

We provide some functions that wrap other h3 functions for streamlining commonly used operations. You can import them from `h3spark.convenience`

- **`min_child(cell: H3CellInput, resolution: int) -> long`**: Finds the child of minimum value of the input H3 cell at the specified resolution
- **`max_child(cell: H3CellInput, resolution: int) -> long`**: Finds the child of maximum value of the input H3 cell at the specified resolution

## License

This library is released under the MIT License. See the [LICENSE](LICENSE) file for more details.

## Contributing

Contributions are welcome! Please submit a pull request or open an issue to contribute to the project.

## Acknowledgments

This library is built on top of the H3 geospatial indexing library and PySpark. Special thanks to the developers of these libraries for their contributions to the open-source community.

For more information, check the [official H3 documentation](https://h3geo.org/docs/) and [PySpark documentation](https://spark.apache.org/docs/latest/api/python/index.html).

## Building + Deploying

```sh
python -m build
python -m twine upload --verbose --repository pypi dist/*
```