Metadata-Version: 2.1
Name: hasami
Version: 0.0.1
Summary: Sentence segmentation for Japanese text
Home-page: https://github.com/mkartawijaya/hasami
Author: Martin Kartawijaya
Author-email: pypi@m.kartawijaya.dev
License: BSD-3-Clause
Keywords: japanese,sentence,segmentation,nlp,sbd
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: License :: OSI Approved :: BSD License
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Natural Language :: Japanese
Requires-Python: >=3.6
Description-Content-Type: text/markdown

# Hasami

Hasami is a tool for sentence segmentation of Japanese text.

* In addition to splitting on sentence-ending markers like `！？。`,
  it treats a run of sentence-ending characters as a single sentence ending.
* It will not split enclosed sentences, i.e. those in quotes or parentheses.
* It can be configured with custom sentence-ending markers and enclosures 
  in case the defaults don't cover your needs.
* You can define exceptions for when not to split sentences.
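
The first three rules above can be pictured with a small, self-contained sketch. It is not Hasami's implementation and does not use its API: the function name `split_sentences`, its parameters, and the default markers and enclosures below are assumptions made for illustration only, and exceptions are left out entirely.

```python
from typing import Dict, List

# Assumed defaults for this sketch only; Hasami's real defaults may differ.
ENDERS = "！？。"
ENCLOSURES: Dict[str, str] = {"「": "」", "『": "』", "（": "）"}


def split_sentences(text: str, enders: str = ENDERS,
                    enclosures: Dict[str, str] = ENCLOSURES) -> List[str]:
    """Split after each run of enders that is not inside an enclosure."""
    sentences: List[str] = []
    closers: List[str] = []  # stack of closing characters still expected
    start = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if closers and ch == closers[-1]:
            closers.pop()  # the innermost enclosure ends here
        elif ch in enclosures:
            closers.append(enclosures[ch])  # a new enclosure begins
        elif ch in enders and not closers:
            # Consume the whole run of enders so "！？" ends one sentence.
            while i + 1 < len(text) and text[i + 1] in enders:
                i += 1
            sentences.append(text[start:i + 1])
            start = i + 1
        i += 1
    if start < len(text):
        sentences.append(text[start:])  # trailing text without an ender
    return sentences


print(split_sentences("本当！？彼は「行こう。急いで！」と言った。"))
# ['本当！？', '彼は「行こう。急いで！」と言った。']
```

Passing different `enders` or `enclosures` arguments to this sketch mimics the idea of custom markers and enclosures; how Hasami itself exposes that configuration is not shown here.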

## Installation

```bash
pip install hasami
```

## Usage

A simple command-line interface lets you use Hasami without writing
your own script. Input is read from `stdin` or from a file.

```bash
$ echo "これが最初の文。これは二番目の文。これが最後の文。" | tee input.txt | hasami
これが最初の文。
これは二番目の文。
これが最後の文。

$ hasami input.txt
これが最初の文。
これは二番目の文。
これが最後の文。
```

To use it in your own code:

```python
import hasami

hasami.segment_sentences('これが最初の文。これは二番目の文。これが最後の文。')
# => ['これが最初の文。', 'これは二番目の文。', 'これが最後の文。']
```

*More complex examples will follow soon; in the meantime, please refer to the test cases.*

## License

Licensed under the BSD-3-Clause License.

