Metadata-Version: 2.1
Name: metrohash
Version: 0.2.0
Summary: Python bindings for MetroHash, a fast non-cryptographic hash algorithm
Home-page: https://github.com/escherba/python-metrohash
Author: Eugene Scherba
Author-email: escherba+metrohash@gmail.com
License: Apache License 2.0
Download-URL: https://github.com/escherba/python-metrohash/tarball/master/0.2.0
Keywords: hash,hashing,metrohash
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: C++
Classifier: Programming Language :: Cython
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: System :: Distributed Computing
Description-Content-Type: text/markdown
License-File: LICENSE

# MetroHash

Python wrapper for [MetroHash](https://github.com/jandrewrogers/MetroHash), a
fast non-cryptographic hash function.

[![Latest
Version](https://img.shields.io/pypi/v/metrohash.svg)](https://pypi.python.org/pypi/metrohash)
[![Downloads](https://img.shields.io/pypi/dm/metrohash.svg)](https://pypi.python.org/pypi/metrohash)
[![Tests
Status](https://circleci.com/gh/escherba/python-metrohash.png?style=shield)](https://circleci.com/gh/escherba/python-metrohash)
[![Supported Python
versions](https://img.shields.io/pypi/pyversions/metrohash.svg)](https://pypi.python.org/pypi/metrohash)
[![License](https://img.shields.io/pypi/l/metrohash.svg)](https://pypi.python.org/pypi/metrohash)

## Getting Started

To install this package from PyPI, run:

``` bash
pip install metrohash
```

After that, you should be able to import the `metrohash` module (see the
usage examples below).

## Usage Examples

### Stateless hashing

This package provides Python interfaces to the 64- and 128-bit
implementations of the MetroHash algorithm. For stateless hashing, it
exports the `hash64_int` and `hash128_int` functions. Both take a value
to be hashed and an optional `seed` parameter:

``` python
>>> import metrohash
...
>>> metrohash.hash64_int("abc", seed=0)
17099979927131455419
>>> metrohash.hash128_int("abc")
182995299641628952910564950850867298725

```

### Incremental hashing

Unlike its cousins CityHash and FarmHash, MetroHash allows incremental
(stateful) hashing. For incremental hashing, use the `MetroHash64` and
`MetroHash128` classes. Incremental hashing is associative: the final
hash value depends only on the concatenated input, not on how that input
is split into slices. This is useful for processing large inputs and
streaming data. Example with two slices:

``` python
>>> mh = metrohash.MetroHash64()
>>> mh.update("Nobody inspects")
>>> mh.update(" the spammish repetition")
>>> mh.intdigest()
7851180100622203313

```

The hash value above is the same as the one obtained by hashing the
whole input in a single call:

``` python
>>> mh = metrohash.MetroHash64()
>>> mh.update("Nobody inspects the spammish repetition")
>>> mh.intdigest()
7851180100622203313

```
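
As a minimal sketch of how this property could be used on large inputs, the
helper below feeds a binary stream to a hasher in fixed-size chunks. The names
`update_from_stream` and `hash_file64` and the default chunk size are
illustrative assumptions and not part of this package; only `MetroHash64`,
`update`, and `intdigest` come from the examples above.

```python
def update_from_stream(hasher, stream, chunk_size=1 << 20):
    """Feed a binary stream to `hasher` in chunks of `chunk_size` bytes.

    `hasher` can be any object with an `update(data)` method.
    This helper is hypothetical and not part of metrohash.
    """
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        hasher.update(chunk)
    return hasher


def hash_file64(path, chunk_size=1 << 20):
    """Hash a file without loading it into memory all at once.

    Hypothetical convenience wrapper; metrohash is imported lazily so
    that the helper above can be used without it.
    """
    import metrohash

    with open(path, "rb") as f:
        hasher = update_from_stream(metrohash.MetroHash64(), f, chunk_size)
    return hasher.intdigest()
```

Because the result does not depend on where the input is split, the chunk size
here should affect only memory use and speed, not the hash value.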

### Fast hashing of NumPy arrays

The Python [Buffer
Protocol](https://docs.python.org/3/c-api/buffer.html) allows Python
objects to expose their data as raw byte arrays to other objects, for
fast access without copying to a separate location in memory. Among
others, NumPy is a major framework that supports this protocol.
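
To make the idea concrete, here is a small sketch that uses only the standard
library. It wraps an `array.array` in a `memoryview` to show that the buffer is
shared rather than copied, and shows what a non-contiguous view looks like. The
variable names are illustrative only.

```python
import array

# array.array implements the buffer protocol, so a memoryview can expose
# its underlying bytes without copying them.
values = array.array("d", [0.0, 1.0, 2.0])
view = memoryview(values)

print(view.nbytes)      # total size of the exposed buffer, in bytes
print(view.contiguous)  # True: the bytes form one unbroken block

# The view shares memory with the original object, so a write made
# through the array is visible through the view.
values[0] = 42.0
print(view[0])

# Slicing with a step produces a view that skips bytes and is therefore
# not contiguous.
strided = memoryview(b"abcdef")[::2]
print(strided.contiguous)  # False
print(strided.tobytes())   # copying gathers the selected bytes together
```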

All hashing functions in this package will read byte arrays from objects
that expose them via the buffer protocol. Here is an example showing
hashing of a 3D NumPy array:

``` python
>>> import numpy as np
>>> arr = np.zeros((256, 256, 4))
>>> metrohash.hash64_int(arr)
12125832280816116063

```

The arrays need to be contiguous for this to work. To convert a
non-contiguous array, use NumPy's `ascontiguousarray()` function.
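
The following sketch shows one way the conversion might look. A strided slice
of an array is not contiguous, and `np.ascontiguousarray()` returns a
contiguous array with the same values. The wrapper `hash_array64` is a
hypothetical helper, not part of this package; it only combines
`ascontiguousarray()` with the `hash64_int` function shown earlier.

```python
import numpy as np

arr = np.arange(24, dtype=np.float64).reshape(4, 6)

# Taking every other column yields a strided view that is not contiguous.
view = arr[:, ::2]
print(view.flags["C_CONTIGUOUS"])   # False

# ascontiguousarray() makes a C-contiguous copy when one is needed.
fixed = np.ascontiguousarray(view)
print(fixed.flags["C_CONTIGUOUS"])  # True


def hash_array64(a):
    """Hash an array of any memory layout (hypothetical helper).

    metrohash is imported lazily so the NumPy part of this example
    runs without it.
    """
    import metrohash

    return metrohash.hash64_int(np.ascontiguousarray(a))
```

Since the hash is computed over the raw bytes of the buffer, arrays holding the
same numbers with a different `dtype` would be expected to hash differently.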

## Development

### Local workflow

For those who want to contribute, here is a quick start using some
makefile commands:

``` bash
git clone https://github.com/escherba/python-metrohash.git
cd python-metrohash
make env           # create a Python virtualenv
make test          # run Python tests
make cpp-test      # run C++ tests
make shell         # enter IPython shell
```

The Makefiles provided have self-documenting targets. To find out which
targets are available, type:

``` bash
make help
```

### Distribution

The wheels are built using
[cibuildwheel](https://cibuildwheel.readthedocs.io/) and are published
to PyPI by GitHub Actions via [this
workflow](.github/workflows/publish.yml). The wheels contain compiled
binaries and are available for the following platforms: windows-amd64,
ubuntu-x86, linux-x86\_64, linux-aarch64, and macosx-x86\_64.

## See Also

For other fast non-cryptographic hash functions available as Python
extensions, see [FarmHash](https://github.com/escherba/python-cityhash)
and [MurmurHash](https://github.com/hajimes/mmh3).

## Authors

The MetroHash algorithm and its C++ implementation are due to J. Andrew
Rogers. The Python bindings for it were written by Eugene Scherba.

## License

This software is licensed under the [Apache License,
Version 2.0](https://opensource.org/licenses/Apache-2.0). See the
included LICENSE file for details.


