Metadata-Version: 2.2
Name: faster-translate
Version: 1.0.2
Summary: A high-performance translation library using CTTranslate2 and vLLM.
Author-email: Sawradip Saha <sawradip0@gmail.com>
License: MIT License
        
        Copyright (c) 2024 Sawradip Saha
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
Project-URL: Homepage, https://github.com/sawradip/faster-translate
Project-URL: Bug Tracker, https://github.com/sawradip/faster-translate/issues
Project-URL: Documentation, https://github.com/sawradip/faster-translate/blob/main/README.md
Keywords: translation,huggingface,nlp,ctranslate2,vllm
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Natural Language :: English
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: transformers>=4.36.1
Requires-Dist: ctranslate2>=4.1.0
Requires-Dist: tokenizers>=0.15.2
Requires-Dist: datasets>=2.10.0
Requires-Dist: huggingface_hub>=0.19.0
Requires-Dist: tqdm>=4.64.0
Provides-Extra: normalizer
Requires-Dist: csebuetnlp-normalizer>=1.0.0; extra == "normalizer"
Provides-Extra: vllm
Requires-Dist: vllm>=0.3.0; extra == "vllm"
Provides-Extra: all
Requires-Dist: csebuetnlp-normalizer>=1.0.0; extra == "all"
Requires-Dist: vllm>=0.3.0; extra == "all"

# Faster Translate

<div align="center">

[![PyPI Downloads](https://static.pepy.tech/badge/faster-translate)](https://pepy.tech/projects/faster-translate)
[![Monthly Downloads](https://static.pepy.tech/badge/faster-translate/month)](https://pepy.tech/projects/faster-translate)
[![GitHub License](https://img.shields.io/github/license/sawradip/faster-translate)](https://github.com/sawradip/faster-translate/blob/main/LICENSE)
[![PyPI Version](https://img.shields.io/pypi/v/faster-translate)](https://pypi.org/project/faster-translate/)

</div>

A high-performance translation library powered by state-of-the-art models. Faster Translate offers optimized inference using CTranslate2 and vLLM backends, providing an easy-to-use interface for applications requiring efficient and accurate translations.

## 🚀 Features

- **High-performance inference** using CTranslate2 and vLLM backends
- **Seamless integration** with Hugging Face models
- **Flexible API** for single sentence, batch, and large-scale translation
- **Dataset translation** with direct Hugging Face integration
- **Multi-backend support** for both traditional (CTranslate2) and LLM-based (vLLM) models
- **Text normalization** for improved translation quality

## 📦 Installation

```bash
pip install faster-translate
```

### Optional Dependencies

For specific normalizers or model backends:

```bash
# For Bengali text normalization
pip install git+https://github.com/csebuetnlp/normalizer

# For vLLM backend support (required for LLM-based models)
pip install vllm
```

## 🔍 Usage

### Basic Translation

```python
from faster_translate import TranslatorModel

# Initialize with a pre-configured model
translator = TranslatorModel.from_pretrained("banglanmt_bn2en")

# Translate a single sentence
english_text = translator.translate_single("দেশে বিদেশি ঋণ নিয়ে এখন বেশ আলোচনা হচ্ছে।")
print(english_text)

# Translate a batch of sentences
bengali_sentences = [
    "দেশে বিদেশি ঋণ নিয়ে এখন বেশ আলোচনা হচ্ছে।",
    "রাত তিনটার দিকে কাঁচামাল নিয়ে গুলিস্তান থেকে পুরান ঢাকার শ্যামবাজারের আড়তে যাচ্ছিলেন লিটন ব্যাপারী।"
]
translations = translator.translate_batch(bengali_sentences)
```

### Using Different Model Backends

```python
# Using a CTTranslate2-based model
ct2_translator = TranslatorModel.from_pretrained("banglanmt_bn2en")

# Using a vLLM-based model
vllm_translator = TranslatorModel.from_pretrained("bangla_qwen_en2bn")
```

### Loading Models from Hugging Face

```python
# Load a specific model from Hugging Face
translator = TranslatorModel.from_pretrained(
    "sawradip/faster-translate-banglanmt-bn2en-t5",
    normalizer_func="buetnlpnormalizer"
)
```

### Translating Hugging Face Datasets

Translate an entire dataset with a single function call:

```python
translator = TranslatorModel.from_pretrained("banglanmt_en2bn")

# Translate the entire dataset
translator.translate_hf_dataset(
    "sawradip/bn-translation-mega-raw-noisy", 
    batch_size=16
)

# Translate specific subsets
translator.translate_hf_dataset(
    "sawradip/bn-translation-mega-raw-noisy",
    subset_name=["google"], 
    batch_size=16
)

# Translate a portion of the dataset
translator.translate_hf_dataset(
    "sawradip/bn-translation-mega-raw-noisy",
    subset_name="alt",
    batch_size=16, 
    translation_size=0.5  # Translate 50% of the dataset
)
```

### Publishing Translated Datasets

Push translated datasets directly to Hugging Face:

```python
translator.translate_hf_dataset(
    "sawradip/bn-translation-mega-raw-noisy",
    subset_name="alt",
    batch_size=16, 
    push_to_hub=True,
    token="your_huggingface_token",
    save_repo_name="your-username/translated-dataset"
)
```

## 🌐 Supported Models

| Model ID | Source Language | Target Language | Backend | Description |
|----------|----------------|----------------|---------|-------------|
| `banglanmt_bn2en` | Bengali | English | CTranslate2 | BanglaNMT model from BUET |
| `banglanmt_en2bn` | English | Bengali | CTranslate2 | BanglaNMT model from BUET |
| `bangla_mbartv1_en2bn` | English | Bengali | CTranslate2 | MBart-based translation model |
| `bangla_qwen_en2bn` | English | Bengali | vLLM | Qwen-based translation model |

## 🛠️ Advanced Configuration

### Custom Sampling Parameters for vLLM Models

```python
from vllm import SamplingParams

# Create custom sampling parameters
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=512
)

# Initialize translator with custom parameters
translator = TranslatorModel.from_pretrained(
    "bangla_qwen_en2bn", 
    sampling_params=sampling_params
)
```

## 💪 Contributors

<a href="https://github.com/sawradip/faster-translate/graphs/contributors">
  <img src="https://contributors-img.web.app/image?repo=sawradip/faster-translate" alt="List of Contributors"/>
</a>

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 📚 Citation

If you use Faster Translate in your research, please cite:

```bibtex
@software{faster_translate,
  author = {Sawradip Saha and Contributors},
  title = {Faster Translate: High-Performance Machine Translation Library},
  url = {https://github.com/sawradip/faster-translate},
  year = {2024},
}
```
