Metadata-Version: 2.1
Name: porter2
Version: 0.1.1
Summary: A python wrapper around surgebase's Go implimentation of the Porter2 stemmer
Home-page: https://github.com/kajuberdut/porter2
Author: Patrick Shechet
Author-email: patrick.shechet@gmail.com
License: Apache License 2.0
Platform: UNKNOWN
Classifier: Natural Language :: English
Classifier: Topic :: Text Processing :: Linguistic
Description-Content-Type: text/markdown

# porter2
A python wrapper around surgebase's porter2 implimentation. 

The surgebase library impliments a finate state machine and as such has very good performance.
For more information see [Credits](#credits-and-info)

## Usage
```python
from porter2 import stem

print(stem("running"))
# "run"
```

## What does the python library do?
In order to make this wrapper library I took the porter2.go file from Surgebase, made a few small modifications including
making the stem function accept and return C strings and renaming the package to main.
I then compiled that file to C, and provided a [small wrapper](https://github.com/kajuberdut/porter2/blob/main/porter2/__init__.py) to call the c version.

Becuase the C version of porter2.go is 2.4 MB, I went ahead and compressed it using Python's built in bz2 and added the [zipper module](https://github.com/kajuberdut/porter2/blob/main/porter2/surgebase/zipper.py) to unzip it
on first use. The bz2 version is about 0.8 MB. As a result of the decompression step, the very first time porter2 is imported it will take a fraction
of a second longer than future imports.

## Alternatives for Porter2 in Python
* [pystemmer](https://github.com/snowballstem/pystemmer)
* [A Python version](https://github.com/evandempsey/porter2-stemmer)
* [Another Python version from Whoosh](https://github.com/mchaput/whoosh/blob/main/src/whoosh/lang/porter2.py)

## Credits and Info
I'm not clear on the relationship between surgebase and Jian Zhen but zhenjl appears to deserve authorship credit and surgebase is where I found the go implimentation.
* [Martin Porter](https://en.wikipedia.org/wiki/Martin_Porter)
* [SurgeBase for go source](https://github.com/surgebase/porter2)
* [Jian Zhen](https://github.com/zhenjl)

## Additional Info
* [Jian Zhen's blog post](https://zhen.org/blog/generating-porter2-fsm-for-fun-and-performance/)
* [Porter 2 Wikipedia Entry](https://en.wikipedia.org/wiki/Stemming)
* [History of the Porter Stemmer](https://tartarus.org/martin/PorterStemmer/)


