Metadata-Version: 2.1
Name: jptranstokenizer
Version: 0.0.4
Summary: Japanese tokenizer with transformers library
License: MIT
Author: Masahiro Suzuki
Author-email: msuzuki9609@gmail.com
Requires-Python: >=3.8,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Requires-Dist: SudachiTra (>=0.1.7,<0.2.0)
Requires-Dist: pyknp (>=0.6.1,<0.7.0)
Requires-Dist: sentencepiece (>=0.1.96,<0.2.0)
Requires-Dist: spacy (>=3.2.0,<4.0.0)
Requires-Dist: transformers (>=4.7.0,<5.0.0)
Description-Content-Type: text/markdown

<div id="top"></div>

<h1 align="center">jptranstokenizer: Japanese Tokenizer for transformers</h1>

<p align="center">
  <img alt="Python" src="https://img.shields.io/badge/python-3.7%20%7C%203.8%20%7C%203.9%20%7C%203.10-blue">
  <a href="https://pypi.python.org/pypi/jptranstokenizer">
    <img alt="pypi" src="https://img.shields.io/pypi/v/jptranstokenizer.svg">
  </a>
  <a href="https://github.com/retarfi/jptranstokenizer#licenses">
    <img alt="License" src="https://img.shields.io/badge/license-Apache--2.0-brightgreen">
  </a>
  <a href="https://github.com/retarfi/jptranstokenizer/releases">
    <img alt="GitHub release" src="https://img.shields.io/github/v/release/retarfi/jptranstokenizer.svg">
  </a>
</p>

This repository provides a Japanese tokenizer for use with the Hugging Face transformers library.

**Issues written in Japanese are also welcome.**


<!-- TABLE OF CONTENTS -->
<details>
  <summary>Table of Contents</summary>
  <ol>
    <li>
      <a href="#usage">Usage</a>
    </li>
    <li><a href="#roadmap">Roadmap</a></li>
    <li>
      <a href="#citation">Citation</a>
      <ul>
        <li><a href="#this-implementation">This Implementation</a></li>
      </ul>
    </li>
    <li><a href="#licenses">Licenses</a></li>
    <li><a href="#related-work">Related Work</a></li>
  </ol>
</details>


## Usage

To be added
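
Until this section is filled in, the following is a minimal sketch for confirming that the package is installed. It assumes installation from PyPI under the name shown in the badge above (`pip install jptranstokenizer`). The helper `installed_version` is hypothetical and not part of this library; it relies only on the standard-library `importlib.metadata` module.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "jptranstokenizer") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    Hypothetical helper for illustration; not provided by jptranstokenizer.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("jptranstokenizer is not installed; try: pip install jptranstokenizer")
    else:
        print(f"jptranstokenizer {found} is installed")
```

The check reads package metadata only, so it does not import the tokenizer or load any of its dependencies.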


<!-- ROADMAP -->

## Roadmap


See the [open issues](https://github.com/retarfi/language-pretraining/issues) for a full list of proposed features (and known issues).

## Citation


**A separate paper on this pretrained model is forthcoming.
Check back here before you cite this work.**

### This Implementation

```
@misc{suzuki-2022-github,
  author = {Masahiro Suzuki},
  title = {jptranstokenizer: Japanese Tokenizer for transformers},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/retarfi/jptranstokenizer}}}
```

## Licenses

The code in this repository is distributed under the [Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0).


## Related Work
- Pretrained Japanese BERT models (including a Japanese tokenizer)
  - Author: NLP Lab. at Tohoku University
  - https://github.com/cl-tohoku/bert-japanese
- SudachiTra
  - Author: Works Applications
  - https://github.com/WorksApplications/SudachiTra
- UD_Japanese-GSD
  - Author: megagonlabs
  - https://github.com/megagonlabs/UD_Japanese-GSD
- Juman++
  - Author: Kurohashi Lab. at Kyoto University
  - https://github.com/ku-nlp/jumanpp

