Metadata-Version: 2.4
Name: bangla-text-normalizer
Version: 0.1.1
Summary: A robust text normalizer tool for Bangla language
Home-page: https://gitlab.com/nahidahsan74/bangla-text-normalizer
Author: Nahid Ahsan
Author-email: nahidahsan74@gmail.com
Project-URL: Documentation, https://gitlab.com/nahidahsan74/bangla-text-normalizer/-/blob/main/README.md
Project-URL: Source, https://gitlab.com/nahidahsan74/bangla-text-normalizer
Project-URL: Tracker, https://gitlab.com/nahidahsan74/bangla-text-normalizer/-/issues
Keywords: bangla,bengali,text-normalizer,normalization,nlp,natural-language-processing,tts,text-to-speech,num2words,bangla-nlp
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license-file
Dynamic: project-url
Dynamic: requires-python
Dynamic: summary

# Bangla Text Normalizer

[![PyPI version](https://badge.fury.io/py/bangla-text-normalizer.svg)](https://badge.fury.io/py/bangla-text-normalizer)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.6+](https://img.shields.io/badge/python-3.6+-blue.svg)](https://www.python.org/downloads/)

A robust, dependency-free **Bangla Text Normalizer** (Bengali Text Normalizer) Python package designed for **NLP** (Natural Language Processing) and **TTS** (Text-to-Speech) pipelines. It converts numbers, dates, times, currency amounts, phone numbers, and abbreviations into their spoken Bangla form.

The package reads large numbers using the Indian numbering system (lakh/crore) and applies context-aware rules, such as distinct readings for dates, years, and phone numbers, to improve audio data quality.

This tool is optimized for:
- **Text-to-Speech (TTS) Preprocessing**: Converting raw text into pronounceable Bangla words.
- **ASR & NLP Data Preparation**: Standardizing text for training machine learning models.
- **Chatbots & Voice Assistants**: Ensuring textual responses are ready for voice synthesis.

## Installation

```bash
pip install bangla-text-normalizer
```

## Usage

```python
from bangla_text_normalizer import BanglaTextNormalizer

normalizer = BanglaTextNormalizer()

# Example 1: Addresses and Numbers
text = "৫ম তলা, ৩/বি, গুলশান-২, ঢাকা ১২১২"
print(normalizer.normalize(text))
# Expected Output:
# পঞ্চম তলা, ৩/বি, গুলশান-২, ঢাকা এক দুই এক দুই

# Example 2: Dates (Smart Pronunciation)
print(normalizer.normalize("21/02/1952"))
# Output: একুশে ফেব্রুয়ারি ঊনিশশো বাহান্ন

# Example 3: Phone Numbers
print(normalizer.normalize("01711000000"))
# Output: শূন্য এক সাত এক এক শূন্য শূন্য শূন্য শূন্য শূন্য শূন্য

# Example 4: Decimals and Fractions
print(normalizer.normalize("3.5"))
# Output: তিন দশমিক পাঁচ
```
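
The digit-by-digit reading in Example 3 can be illustrated with a short standalone sketch. This is not the package's internal code: `DIGIT_WORDS` and `spell_digits` are hypothetical names used only for illustration, and the sketch assumes that spaces and hyphens are the only separators to skip.

```python
# Illustrative sketch only; not the package's internal implementation.
DIGIT_WORDS = ["শূন্য", "এক", "দুই", "তিন", "চার", "পাঁচ", "ছয়", "সাত", "আট", "নয়"]


def spell_digits(text):
    """Read a digit string one digit at a time, skipping spaces and hyphens."""
    words = []
    for ch in text:
        if ch in " -":
            continue  # separators carry no spoken content
        if not ch.isdecimal():
            raise ValueError(f"unexpected character: {ch!r}")
        # int() accepts both ASCII digits (0-9) and Bangla digits (০-৯)
        words.append(DIGIT_WORDS[int(ch)])
    return " ".join(words)


print(spell_digits("017 11-000000"))
```

With the separators skipped, the input above yields the same eleven words as Example 3. Handling of a country-code prefix is deliberately left out, since it depends on rules specific to the package.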

## Features

- **Number Conversion**: Converts integers to words, including large values expressed in lakhs and crores.
- **Date Normalization**:
    - Handles `DD/MM/YYYY` formats.
    - Context-aware dates: `21` -> `একুশে` (Ekushe), `01` -> `পহেলা` (Pohela).
    - Smart Year handling: `1952` -> `ঊনিশশো বাহান্ন` (Nineteen Hundred Fifty-Two), `2024` -> `দুই হাজার চব্বিশ`.
- **Phone Numbers**: 
    - Normalizes BD phone numbers (e.g., `017...`, `+88017...`) into individual digits.
    - Robust handling of spaces and hyphens (e.g., `017 11000000`).
- **Ordinals**: 
    - Supports numeric ordinals: `1st`, `2nd`, `10th`.
    - Supports Bangla short-forms: `১ম`, `২য়`, `১০ম`.
    - Auto-generates ordinals for large numbers: `1023rd` -> `এক হাজার তেইশতম`.
- **Decimals & Fractions**: Handles floating point numbers (`3.1416`) and simple fractions (`1/2`).
- **Zero Dependency**: Pure Python implementation with no external requirements.
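
To make the lakh/crore grouping mentioned under Number Conversion concrete, here is a minimal sketch of how an integer breaks down under the Indian numbering system. It is an illustration only and not the package's implementation: `UNITS` and `split_indian_groups` are hypothetical names, and the sketch stops at splitting, leaving the conversion of each count into Bangla words aside.

```python
# Illustrative sketch only; not the package's internal implementation.
UNITS = [
    (10_000_000, "কোটি"),  # crore
    (100_000, "লক্ষ"),     # lakh
    (1_000, "হাজার"),      # thousand
    (100, "শত"),           # hundred
]


def split_indian_groups(n):
    """Split a non-negative integer into (count, unit) pairs, largest unit first."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    groups = []
    for value, name in UNITS:
        count, n = divmod(n, value)
        if count:
            groups.append((count, name))
    # Keep the remainder below one hundred; also covers the input 0.
    if n or not groups:
        groups.append((n, ""))
    return groups


print(split_indian_groups(12345678))
# [(1, 'কোটি'), (23, 'লক্ষ'), (45, 'হাজার'), (6, 'শত'), (78, '')]
```

Note that the grouping is 2-2-3 from the right after the crore position, unlike the 3-3-3 grouping of the Western system. For values of 100 crore and above, the crore count itself exceeds 99 and would need to be split again before being read out.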

## License

MIT License
