Metadata-Version: 2.1
Name: lexikanon
Version: 0.1.3
Summary: A Python Library for Tokenizers
Home-page: https://lexikanon.entelecheia.ai
License: MIT
Author: Young Joon Lee
Author-email: entelecheia@hotmail.com
Requires-Python: >=3.8.1,<3.12
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: hyfi (>=0.2.20,<0.3.0)
Project-URL: Repository, https://github.com/entelecheia/lexikanon
Description-Content-Type: text/markdown

# Lexikanon: A Python Library for Tokenizers

[![pypi-image]][pypi-url]
[![license-image]][license-url]
[![version-image]][release-url]
[![release-date-image]][release-url]
[![jupyter-book-image]][docs-url]

<!-- Links: -->

[pypi-image]: https://img.shields.io/pypi/v/lexikanon
[license-image]: https://img.shields.io/github/license/entelecheia/lexikanon
[license-url]: https://github.com/entelecheia/lexikanon/blob/main/LICENSE
[version-image]: https://img.shields.io/github/v/release/entelecheia/lexikanon?sort=semver
[release-date-image]: https://img.shields.io/github/release-date/entelecheia/lexikanon
[release-url]: https://github.com/entelecheia/lexikanon/releases
[jupyter-book-image]: https://jupyterbook.org/en/stable/_images/badge.svg
[repo-url]: https://github.com/entelecheia/lexikanon
[pypi-url]: https://pypi.org/project/lexikanon
[docs-url]: https://lexikanon.entelecheia.ai
[changelog]: https://github.com/entelecheia/lexikanon/blob/main/CHANGELOG.md
[contributing guidelines]: https://github.com/entelecheia/lexikanon/blob/main/CONTRIBUTING.md

- Documentation: [https://lexikanon.entelecheia.ai][docs-url]
- GitHub: [https://github.com/entelecheia/lexikanon][repo-url]
- PyPI: [https://pypi.org/project/lexikanon][pypi-url]

Lexikanon is a Python library for creating, training, and deploying tokenizers, an essential component of natural language processing (NLP) and artificial intelligence (AI) applications. The name Lexikanon comes from the Greek words λέξη (word) and κάνων (maker), reflecting the library's purpose: enabling users to build tokenizers for various languages and tasks.

## Features

Lexikanon offers a set of features aimed at both newcomers and experienced NLP practitioners:

- **Intuitive API**: Users can create, train, and use tokenizers with just a few lines of code.

- **Wide range of tokenization techniques**: The library supports various tokenization methods, including rule-based, statistical, and subword tokenization, catering to diverse requirements and use cases.

- **Multilingual support**: Lexikanon is designed with a focus on multilingualism, providing support for a broad range of languages and seamless integration with other language resources and tools.

- **Customizability**: Users can build custom tokenizers from the ground up or modify existing ones, offering complete control over tokenization rules, training data, and output formats.

- **Efficient processing**: Lexikanon utilizes advanced algorithms and data structures to ensure high-performance tokenization, even on large-scale text corpora.

- **Pre-trained tokenizers**: The library includes a collection of pre-trained tokenizers for various languages and domains, enabling users to take advantage of transfer learning and quickly adapt these tokenizers to their specific needs.
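
To make the difference between rule-based and subword tokenization concrete, here is a minimal sketch that uses only the Python standard library. It does not use Lexikanon's API: the function names `rule_based_tokenize` and `greedy_subword_tokenize`, the `##` continuation prefix, the `[UNK]` token, and the toy vocabulary are all assumptions made for illustration.

```python
import re
from typing import List, Set


def rule_based_tokenize(text: str) -> List[str]:
    """Split text into word tokens and single punctuation tokens with one regex rule."""
    return re.findall(r"\w+|[^\w\s]", text)


def greedy_subword_tokenize(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Segment a word by greedy longest-match-first lookup in a fixed vocabulary.

    Pieces that do not start the word carry a "##" prefix (a WordPiece-style
    convention, assumed here for illustration only).
    """
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No vocabulary entry covers this position: give up on the whole word.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary; a real subword vocabulary is learned from a corpus.
toy_vocab = {"token", "##izer", "##s", "un", "##related"}

words = rule_based_tokenize("Tokenizers, unrelated!")
subwords = [greedy_subword_tokenize(w.lower(), toy_vocab) for w in words]
print(words)
print(subwords)
```

The rule-based step depends only on a hand-written pattern, while the subword step depends on a vocabulary, which statistical and subword methods typically learn from training data.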

## Installation

You can install Lexikanon using pip:

```bash
pip install lexikanon
```
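
One way to confirm that the installation succeeded is to query the installed distribution's metadata from Python. The sketch below relies only on the standard-library `importlib.metadata` module (available on Python 3.8 and later); the helper name `installed_version` is an assumption for illustration and is not part of Lexikanon.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("lexikanon")
if version is None:
    print("lexikanon is not installed in this environment")
else:
    print(f"lexikanon {version} is installed")
```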

## Getting Started

To begin working with Lexikanon, visit the [official documentation](https://lexikanon.entelecheia.ai/) and the [GitHub repository](https://github.com/entelecheia/lexikanon) for examples, tutorials, and additional information.

## Changelog

See the [CHANGELOG] for more information.

## Contributing

Contributions are welcome! Please see the [contributing guidelines] for more information.

## License

This project is released under the [MIT License][license-url].

