Metadata-Version: 2.4
Name: naganlp
Version: 0.1.0
Summary: A Natural Language Processing toolkit for the Nagamese language.
Home-page: https://github.com/AgnivaMaiti/naga-nlp
Author: Agniva Maiti
Author-email: Agniva Maiti <agnivamaiti.official@gmail.com>
License: MIT License
        
        Copyright (c) 2025 Your Name
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/AgnivaMaiti/naga-nlp
Project-URL: Bug Tracker, https://github.com/AgnivaMaiti/naga-nlp/issues
Project-URL: Repository, https://github.com/AgnivaMaiti/naga-nlp
Project-URL: Documentation, https://github.com/AgnivaMaiti/naga-nlp#readme
Keywords: nlp,nagamese,natural-language-processing,machine-translation,pos-tagging
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: nltk>=3.8.1
Requires-Dist: sentencepiece>=0.1.99
Requires-Dist: transformers>=4.30.0
Requires-Dist: torch>=2.0.0
Requires-Dist: datasets>=2.0.0
Requires-Dist: scikit-learn>=1.0.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: importlib-resources>=5.12.0; python_version < "3.9"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: isort>=5.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# NagaNLP: Natural Language Processing for Nagamese

[![PyPI](https://img.shields.io/pypi/v/naganlp)](https://pypi.org/project/naganlp/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python Version](https://img.shields.io/pypi/pyversions/naganlp)](https://pypi.org/project/naganlp/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

A comprehensive Natural Language Processing toolkit for the Nagamese language, developed by Agniva Maiti (4th Year BTech, KIIT-DU, Bhubaneswar, Odisha).

[![Python Version](https://img.shields.io/badge/python-3.8%2B-blue.svg)](https://www.python.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

A comprehensive NLP toolkit for the Nagamese language, featuring state-of-the-art models for part-of-speech tagging and machine translation.

## Features

- **Part-of-Speech Tagging**: Fine-tuned BERT model for accurate POS tagging
- **Neural Machine Translation**: Seq2Seq model for Nagamese to English translation
- **Subword Tokenization**: Support for handling out-of-vocabulary words
- **Word Alignment**: Tools for parallel corpus alignment
- **Easy Integration**: Simple Python API for all functionalities

## Installation

```bash
pip install naganlp
```

## Quick Start

### Part-of-Speech Tagging

#### Transformer-based Tagger (Recommended for production)
This uses a fine-tuned transformer model for high accuracy.

```python
from naganlp import PosTagger

# Initialize the tagger (automatically downloads the model on first use)
tagger = PosTagger("agnivamaiti/naganlp-pos-tagger")  # Default model

# Tag a Nagamese sentence
result = tagger.tag("moi school te jai")
print(result)
# Output: [{'entity_group': 'PRON', 'word': 'moi', ...}]
```

#### NLTK-based Tagger (Lightweight, faster but less accurate)
This is a good option for development or when resources are limited.

```python
from naganlp import NltkPosTagger

# First train and save the model (only needed once)
from naganlp.nltk_tagger import train_and_save_nltk_tagger
train_and_save_nltk_tagger("path/to/your/conll/file.conll", "naga_pos_model.pkl")

# Then load and use the trained model
tagger = NltkPosTagger("naga_pos_model.pkl")

# Tag a list of pre-tokenized words
result = tagger.predict(["moi", "school", "te", "jai"])
print(result)
# Output: [('moi', 'PRON'), ('school', 'NOUN'), ('te', 'ADP'), ('jai', 'VERB')]
```

### Translation

```python
from naganlp import Translator

# Initialize the translator
translator = Translator()

# Translate from Nagamese to English
translation = translator.translate("moi school te jai")
print(translation)
# Output: "I go to school"
```

## Documentation

### Data Requirements

- For POS Tagging: CONLL-formatted file with token and POS tag columns
- For Translation: Parallel corpus in CSV format with 'nagamese' and 'english' columns

### Model Training

#### POS Tagger Training

```bash
python main.py train-tagger --conll-file path/to/train.conll --hub-id your-username/naganlp-pos-tagger
```

#### NMT Model Training

```bash
python main.py train-translator --data-file path/to/parallel_corpus.csv --hub-id your-username/naganlp-nmt
```

### Advanced Usage

#### Custom Model Paths

```python
# Load custom models
custom_tagger = PosTagger(model_name_or_path="path/to/custom/model")
custom_translator = Translator(model_path="path/to/translator.pt", vocabs_path="path/to/vocabs.pkl")
```

## Contributing

Contributions are welcome! Please read our [Contributing Guidelines](CONTRIBUTING.md) for details.

## Contributing

Contributions are welcome! Please read our [Contributing Guidelines](CONTRIBUTING.md) and [Code of Conduct](CODE_OF_CONDUCT.md) for details.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Contact

- Agniva Maiti
- Email: agnivamaiti.official@gmail.com
- GitHub: [@AgnivaMaiti](https://github.com/AgnivaMaiti)
- LinkedIn: [Agniva Maiti](https://linkedin.com/in/agniva-maiti)

## Acknowledgments
- KIIT University for the support and resources
- All contributors and users of this library

## Citation

If you use NagaNLP in your research, please cite:

```bibtex
@software{naganlp2023,
  title={NagaNLP: Natural Language Processing Toolkit for Nagamese},
  author={Your Name},
  year={2023},
  publisher={GitHub},
  journal={GitHub repository},
  howpublished={\url{https://github.com/your-username/naga-nlp}}
}
```

## Support

For questions and support, please open an issue on our [GitHub repository](https://github.com/your-username/naga-nlp/issues).
