Metadata-Version: 2.4
Name: ailaysa
Version: 0.1.0
Summary: Ailaysa: An Indic NLP Toolkit featuring high-performance tokenization and future AI language tools.
Author-email: Mukesh Anand G <ai.mukeshanandg@gmail.com>, Ailaysa Technologies <community@ailaysa.net>
License: MIT
Project-URL: Homepage, https://github.com/Ailaysa-Technologies
Project-URL: Repository, https://github.com/Ailaysa-Technologies/Asai-Tokenizer
Keywords: tamil,nlp,tokenizer,ai,language,machine-learning
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Requires-Dist: tokenizers
Requires-Dist: sentencepiece

<p align="center">
  <img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAOEAAADhCAMAAAAJbSJIAAAAkFBMVEUGc/f///8Aa/cAbfcAcPcAbve60vzb6/6vy/y5zvwAafenyfyz0fyAr/oAZ/YAdPcqgfifwvsGd/dFjfjl8P6VvPt5qfrN4P3v9v7Y5/2Rufv5/P/4+v+evvtTl/k4ivhNkvljn/nq8/6HsvoogvjP4v15qvofffjD2v1ak/g4hvikw/ttpPlfnPnX6v681fxpA96KAAAJwElEQVR4nO2daWOqPBOGIQv2RNlRcF8rHtvz9v//uxfcKiQBFAgmj/dna3OZZJJMZiaaXlGz8WA+WixjT+tWnrNc7OeD3qxqw7UqHwoM08cWgBihjvlSIYQhsIBvGptmCAPX1wh+BbSsECb2yu3VJjRWELwe3VUIwJVRh3A2scnr4p2FiD2MniSMhh7ouv2VBLRtASOf0JWELxXw3IcJw1gevlRkFz5EOB1Zrz7/8kLWaFqdMHTk6sCzgMPsRhahC2XrwLMQZM1GBuHa6rqpT8taVyCMfBlH6FVgRa0becLAwV23spawExQTbjy5ARNEb1NEGHhy2ph7IS/gE0aSD9GzsBNxCVcqACaIPo9wLbMVvRdYswldedfBvCyXRRjCrtvVoGBIE04d+c3or5AzpQhHqkzCs8AoTxiqMwnPssIc4U6lMZoKxVlCl3TdosYF3HvCTdee7DbkRXeEW7XMzFlg8ksYdd2YduTNboRDFbvw2oknQrvrtrQk70poqGdIzyLGhXCl2lp4FVqdCXsqbbmzgsGJ0FXTzqRKV31N4UGaDFM/JdyoaklPChJCZS1pqsSaarqphvuJLWwmhL660/A0EbWZupY0FZ5pY9UO91mRsTZQuw/BQJuru6NJBefaSGVTmszDkbZQ2ZQmxnShLRUnXGpO121oWbGmopftXqrzvfXWW2+99dZb9QTIQwIA4roJKfj6P4UcaeOB8ZD67sT8/nuMIQFPZqcA79u9fJkr4rRgV8rIoRQFB/f7SMjDh1KE77IqhNxDkM+nCC+cP3uPPBRqjZ379B8hhHBSgzDVpxlXz8dBTuZvhRCeLq7qaTpYVmVE2QSuTyGmBpRnjZUrPFZiJF/ZP/sRQgjNBgh1feCVjzh0zOWKTMT4B0lhxlhlTfelN0Mg14W6IN9SQ52o64eyjQDKrUw9Uc4zWGfBuFfgFI46tMt9fiLKTY+cytnGJZouitqM99lPC7zShYuGCHV9UdCL+Wm4FngPARpDnC75+zjwkfnoh9DLMkaa0ZPaxFzzkSUUsxb+CmuDhhD5gfQZwg5CY6zj4LljRl4T3rp4RxiuugirQMBbmV9/imUcPkt/B16yAN4bp+/4+t6Brq4CEQTlImC3L+zuMa9/8OUbXrCwQ04IE+K7+fzAX30rcVmLAPoe8zpRCUItHXQmO69c/6cKogY89pZWzMlWiBDsMxFVymshQxahUqGuhLUZipQK0IIsk6pUGCF1sE2lVuYHayp+qhXOi+mjV1QhOAS9RMWxSgIMv3lJWD0CxNsdj4IaWFt3ybo3FcZkI3IcjjdT/X/S7H3IgSLsF5gasPs5f0ieHJi8Fy1RyCVE8GaZ5CFEO2qYcr29yP7tcHkIGTcDPGOK7LsNgkyEP9Qw5fjc4H2VK4kIwZwiPDIJrcxZRCJCPKIImQsizt7HSkTIuElmnvNJT1pCevfNckfB3KoiEaEWVyLEPXkJvSqE1FiWm5CxMQV5p47khHTr84O0K0KEMIZn9371gD5UhZD6GUQTIgQBsUjs/xttJ+5Xvz94QBUI6SVFJGECh+LF5M9nUzelDEJMBX0II0QA+eYP/6alIUIqnEYUISCLfsN0HMKP/GdEECKiFVfRlpwQgSN97FGJEMRNhSe8KKFlNhUL9ZqE2OaVBleEEPrtGJiXIYQrzgW1KoRw2TJf14Q4H46sHKHWRPT6KxMS6l8pRoh8Vo
tUIrTK9tnTKOqFh8Gg/4BeiRCvC+CCcPi9cjTrkolXWdYrEVrc0PzxdmmnSXdPXD7jFyJEvKWwH4PnX/xgEFK+NlGEgBnEpH94tV6Eoglpf6koQswK8IlWNV+8on1tnREi2v/exGsYVXzegghZlnRT+zUMxuSm754EETLuMht4LILxu9H3h4IIGffRg/qBhIxcW/oOWBQhbWjY99EPiW484x5fEKFFHe2DBmJBLeqwwojFEEVI/dh/GgiUjKvE03RG2EAyLsPQhHT4paj1kGrLvn74HMN8MeLaJCZEMe23Y3yrKFtKtWVb+9+wcqUZ4TQSWxpAH6lZYW3yrhZgS3chq15sdyt+zQQXVoQwM1ZfFCEdzltz1wZZ9x+sqDZZd96ElfnEzJkRdXr6R7enzluCFtNl8MWyXl2egMf2k72IWENC52zmO/VibPynclwA5w5yzFyAOvZEufbDT69jsOdc8KyZQ4ImHAv1Js6Gx0eKWiGAv3n3O5w8YJqQn7JQS4CXoDwNJ0tSpQwbwoAc5w/nctOETRzcGGJZ05uiw3x9xIX+fAKWo4+ifHxeFjBN2FYVJYvXib+cwefhx3i2pgLPK0ITtvWqCq5fwKxIwwp1Mc6K4nYAOam7TemTu7RShG0mtbcQqXdRVLU+jc4Msm1KeNlapEL1GkP6rM0i5LCtqVhUCitP+NFqrnBzlaEqA1KELdeRBy1EfW2K67XlCFtPaMdx05F7B7t4ecsSBu0nciO4bdLeTM2yrXuWUMjr6MAxGgM04tJBlyHciym6gMiymSjo0LfKB9094VZYzQVEjm7dImazQbUogDtCU2TtEwTsxUcNuxpWriN8I9wULiptCAOwmxyeCPp+rBb0lbCPuyilgCCxduvhz3hTrTujIHRHS+uhet4nwsB1uit7kh7dieYs/67322G/z67J/uFOzPXfZYwfr8mOza+9jx92BzUudMrO46bmAQDhs3X1sQTV9t5666233nrrrY6l+vuAntaah/xF5Cj/DunxP/CWrOrvAe//A286q/8ut+pvq/e0mdrzEMw0tYoT54V8XSuuiSq7sJkQGmoVJ86KGAlh0HUr2pS9SQhVnohpES1NsWL2WQH3RBiou6uBvRNha0FUnetU6S0lVNaaEuNCqIt7qk2sbP1KKOy5PbE6hzWfCFuNo+pQ0Y1QsWclLrpkHZ0Jqzy6IJ28zR2hiqs+cfV7Ql4pdHl1KzN9JWTk38otK8wR6iO1xim4VXu/EU55zw9Kqbt3bm+EBc9kSqi7t4p/CXVXHa+b5eosQn2tylQEa51NqPtqOKWyqS8ZwshRARE7EZewVt7rqwh52byXLGEDlZ+6Fs4B5gn1QPKBip185lKeUI9WMltURjoIRZgsGvKui9aaxmEQ6u7zVQI7FYIug4ZFqIeOjCMVOMxsHiahPhtVyIN4LSFrxE7lYRMm3biT68AIuOlYPMJkNnryDFXgsWZgGaEebTU5GIFX9BJFAWHCOLRrVu9sX4jYk8KklkLCRMYKdp/pwBUCcFWWR1dGqOs9d2WTF0x4QJhovlueXV5OmGhjmD4g4OmUkGaFEIbAwr5pVEqer0SYajYezEeLpdO1d9yLl4vRfDCunFD2f54zlBriYuJoAAAAAElFTkSuQmCC" alt="Ailaysa" width="200"/>
</p>

<h1 align="center">Ailaysa</h1>

<p align="center">
  <b>State-of-the-Art Natural Language Processing for Indic Languages</b><br>
  <i>Building the foundation for Tamil and Indic AI systems</i>
</p>

<p align="center">
  <a href="https://pypi.org/project/ailaysa/">
    <img alt="PyPI" src="https://img.shields.io/pypi/v/ailaysa?style=flat-square">
  </a>
  <a href="https://pypi.org/project/ailaysa/">
    <img alt="Python" src="https://img.shields.io/pypi/pyversions/ailaysa?style=flat-square">
  </a>
  <a href="https://github.com/Ailaysa-Technologies/Asai-Tokenizer/blob/main/LICENSE">
    <img alt="License" src="https://img.shields.io/github/license/Ailaysa-Technologies/Asai-Tokenizer?style=flat-square">
  </a>
  <a href="https://github.com/Ailaysa-Technologies/Asai-Tokenizer/stargazers">
    <img alt="GitHub stars" src="https://img.shields.io/github/stars/Ailaysa-Technologies/Asai-Tokenizer?style=flat-square">
  </a>
  <a href="https://github.com/Ailaysa-Technologies/Asai-Tokenizer/network/members">
    <img alt="GitHub forks" src="https://img.shields.io/github/forks/Ailaysa-Technologies/Asai-Tokenizer?style=flat-square">
  </a>
</p>

<p align="center">
  <a href="#installation">Installation</a> •
  <a href="#quick-start">Quick Start</a> •
  <a href="#architecture">Architecture</a> •
  <a href="#model-catalog">Models</a> •
  <a href="#research--development">Research</a> •
  <a href="#community--governance">Community</a>
</p>



## Overview

**Ailaysa** is an open-source research and engineering initiative focused on advancing **Natural Language Processing for Indic languages**, starting with Tamil.

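It provides:

* Production-ready tools, starting with the `asai-v1` Tamil tokenizer
* A research-oriented, modular architecture
* Infrastructure designed to scale to more Indic languages and tasks, such as embeddings, translation, and OCR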

### Current Focus

Tamil NLP, with future expansion into broader Indic ecosystems.

---

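## The Story of Asai 🌿

**Asai (அசை)** is the fundamental unit of rhythm in Tamil prosody (*யாப்பிலக்கணம்*).
In classical Tamil verse, an asai is a metrical unit built from letters, and every asai is one of two kinds:

* **Ner (நேர்)**: a single short (குறில்) or long (நெடில்) letter standing alone, together with any pure consonants (ஒற்று) that follow it, as in நாள்
* **Nirai (நிரை)**: a short letter followed by a short or long letter, together with any pure consonants that follow, as in மலர்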

It is the pulse that gives poetry its movement, structure, and emotion.
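The Ner/Nirai rules can be made concrete with a short program. The following is a simplified, hypothetical sketch, not part of Ailaysa (the names `letters`, `kind`, and `asai` are illustrative): it scans one Tamil word left to right, opening a Nirai when a short letter is followed by another short or long letter and a Ner otherwise, then absorbs any trailing pure consonants. It deliberately ignores refinements such as aytham (ஃ), kutriyalukaram, and alapedai.

```python
import unicodedata

# Simplified, hypothetical Ner/Nirai splitter for a single Tamil word.
# Ignores aytham (ஃ), kutriyalukaram, alapedai and other prosodic exceptions.

PULLI = "\u0bcd"  # virama (pulli): marks a pure consonant (ஒற்று)
LONG_VOWELS = set("ஆஈஊஏஐஓஔ")  # independent long vowels (நெடில்)
LONG_SIGNS = set("\u0bbe\u0bc0\u0bc2\u0bc7\u0bc8\u0bcb\u0bcc")  # ா ீ ூ ே ை ோ ௌ


def letters(word):
    """NFC-normalize, then attach each combining mark to the base before it."""
    out = []
    for ch in unicodedata.normalize("NFC", word):
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def kind(letter):
    """Classify a letter as 'otru' (pure consonant), 'nedil' (long) or 'kuril' (short)."""
    if letter.endswith(PULLI):
        return "otru"
    if letter[0] in LONG_VOWELS or letter[-1] in LONG_SIGNS:
        return "nedil"
    return "kuril"


def asai(word):
    """Return the word's asai units as (label, text) pairs, left to right."""
    ls = letters(word)
    ks = [kind(ch) for ch in ls]
    units, i = [], 0
    while i < len(ls):
        if ks[i] == "kuril" and i + 1 < len(ls) and ks[i + 1] != "otru":
            label, end = "nirai", i + 2  # short letter + short/long letter
        else:
            label, end = "ner", i + 1    # one short or long letter on its own
        while end < len(ls) and ks[end] == "otru":
            end += 1                     # trailing pure consonants join this asai
        units.append((label, "".join(ls[i:end])))
        i = end
    return units


for word in ["தேமா", "புளிமா", "கூவிளம்", "கருவிளம்"]:
    print(word, asai(word))
```

The four sample words are the classical two-asai patterns (vāypāḍu) தேமா, புளிமா, கூவிளம், and கருவிளம், which this sketch splits as Ner-Ner, Nirai-Ner, Ner-Nirai, and Nirai-Nirai respectively.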

Just as Asai forms the building blocks of Tamil verse, **Ailaysa** provides the foundational building blocks for Indic language AI.

> *“To build AI that understands Indic languages, one must first understand their soul.”*

---

## Key Capabilities

* **High-Performance Tokenization**: optimized subword tokenization tailored for Tamil script
* **Research-Ready Design**: built for experimentation, extensibility, and academic workflows
* **Production-Ready APIs**: clean interfaces designed for real-world deployment
* **Modular Architecture**: plug-and-play components for future expansion
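Part of what "tailored for Tamil script" has to handle is that one visible Tamil letter is often several Unicode code points, and some letters have more than one valid encoding. The sketch below is a hypothetical pre-tokenization step, not Ailaysa's actual implementation (the function name `tamil_letters` is illustrative): it applies Unicode NFC normalization and then keeps each consonant together with its vowel sign or pulli.

```python
import unicodedata


def tamil_letters(text):
    """Hypothetical pre-tokenization step: NFC-normalize, then keep each base
    character together with the combining marks (vowel signs, pulli) after it."""
    text = unicodedata.normalize("NFC", text)
    letters = []
    for ch in text:
        if letters and unicodedata.category(ch).startswith("M"):
            letters[-1] += ch  # attach vowel sign / pulli to its consonant
        else:
            letters.append(ch)
    return letters


word = "தமிழ்"
print(len(word))            # 5  (code points, not letters)
print(tamil_letters(word))  # ['த', 'மி', 'ழ்']

# The letter "கொ" can arrive in two encodings; NFC makes them identical.
composed = "\u0b95\u0bca"          # க + ொ
decomposed = "\u0b95\u0bc6\u0bbe"  # க + ெ + ா
print(composed == decomposed)                                    # False
print(tamil_letters(composed) == tamil_letters(decomposed))      # True
```

A subword vocabulary learned from un-normalized text would treat the two encodings of கொ as different strings, which is one reason normalization is usually applied before tokenization.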

---

## Installation

### Prerequisites

* Python 3.7+
* pip 20+

### Install via PyPI

```bash
pip install ailaysa
```
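To check which release pip installed, the standard-library `importlib.metadata` module can read the version of an installed distribution (this module needs Python 3.8 or newer). The helper name `installed_version` is illustrative, not part of Ailaysa.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="ailaysa"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None if ailaysa is not installed.
print(installed_version())
```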

---

## Quick Start

### Tamil Tokenization

```python
from ailaysa import tokenizer

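# Load the asai-v1 tokenizer
tok = tokenizer.load("asai-v1")

# Input text ("Let us take Tamil to the whole world.")
text = "தமிழை உலகமெங்கும் கொண்டு சேர்ப்போம்."

# Encode the text into subword tokens
encoded = tok.encode(text)

print(encoded.ids)     # token IDs
print(encoded.tokens)  # subword token strings
print(encoded.length)  # length of the encoding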
```

---

## Architecture

Ailaysa is built with a modular and extensible design:

```
ailaysa/
│
├── tokenizer/      # Tokenization engine
├── embeddings/     # (Upcoming)
├── translation/    # (Upcoming)
├── ocr/            # (Upcoming)
└── models/         # Model storage
```

---

## Model Catalog

| Model     | Description                     |
| --------- | ------------------------------- |
| `asai-v1` | General-purpose Tamil tokenizer |

---

## Research & Development

Ailaysa bridges **academic research** and **industrial applications**.

### Current Research Areas

* **Computational Linguistics**: morphological and syntactic analysis for Tamil
* **Low-Resource NLP**: training techniques that work with limited annotated data
* **Multilingual Transfer Learning**: cross-lingual learning across Indic languages
* **Cultural NLP**: preserving linguistic and cultural nuances in AI systems

---

## Citation

If you use Ailaysa in your research, please cite it as:

```bibtex
@software{ailaysa2026,
  title   = {Ailaysa: Indic Language NLP Toolkit},
  author  = {Mukesh Anand G and {Ailaysa Technologies}},
  year    = {2026},
  version = {0.1.0},
  url     = {https://github.com/Ailaysa-Technologies/Asai-Tokenizer}
}
```

---

## Community & Governance

Ailaysa is built by a growing community of:

* AI engineers
* Researchers
* Linguists
* Open-source contributors

### Ways to Contribute

* Code (features, optimizations)
* Data (datasets, corpora)
* Research (papers, experiments)
* Documentation (guides, tutorials)

---

## Author

**Mukesh Anand G**  
AI Research Engineer

---

## Organization

Developed and maintained by **Ailaysa Technologies**.

---

## License

This project is licensed under the **MIT License**.

---

<p align="center">
  <b>Built with precision. Inspired by heritage. Open for the future.</b>
</p>

<p align="center">
  <a href="https://github.com/Ailaysa-Technologies/Asai-Tokenizer">GitHub</a> •
  <a href="https://pypi.org/project/ailaysa/">PyPI</a> •
  <a href="https://ailaysa.com/">Website</a> 
</p>
