Metadata-Version: 2.4
Name: bisindo-trans
Version: 1.1.0
Summary: Translator from Indonesian Sign Language (BISINDO) to Indonesian using N-gram and neural language models
Home-page: https://github.com/AldiAlfatih/bisindo-trans
Author: Muhammad Aldi Alfatih
Author-email: Muhammad Aldi Alfatih <aldialfatih016@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/AldiAlfatih/bisindo-trans
Project-URL: Repository, https://github.com/AldiAlfatih/bisindo-trans
Keywords: sign-language,translation,nlp,bisindo,indonesian,n-gram,gpt2,beam-search,nucleus-sampling
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: pandas>=1.5.0
Requires-Dist: nltk>=3.8.0
Requires-Dist: openpyxl>=3.0.0
Provides-Extra: neural
Requires-Dist: torch>=2.0.0; extra == "neural"
Requires-Dist: transformers>=4.30.0; extra == "neural"
Dynamic: author
Dynamic: home-page
Dynamic: requires-python

# BisindoTrans

**Hybrid Sign Language Translation System for Indonesian**

![Python](https://img.shields.io/badge/Python-3.9%2B-blue?logo=python&logoColor=white)
![License](https://img.shields.io/badge/License-MIT-green)
![Status](https://img.shields.io/badge/Status-Active-brightgreen)
![NLP](https://img.shields.io/badge/NLP-NLTK%20%7C%20HuggingFace-orange)

---

## About the Project

BisindoTrans translates Indonesian Sign Language (BISINDO) glosses into natural Indonesian text. It was developed as part of an undergraduate (S1) thesis in Informatics Engineering.

The project explores a **hybrid** approach that combines:
- **Statistical Language Model** (bigram N-gram) for efficiency and precision
- **Neural Language Model** (GPT-2 Indonesian) for more flexible generation
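
To illustrate the statistical side, here is a minimal sketch of a bigram model with add-one smoothing. It is not the package's `ngram.py`; the function names (`train_bigram`, `bigram_prob`), the toy corpus, and the choice of Laplace smoothing are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenised sentences."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for sentence in sentences:
        # Sentence boundary markers let the model score first and last words.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        for prev, curr in zip(tokens, tokens[1:]):
            bigrams[prev][curr] += 1
    return unigrams, bigrams

def bigram_prob(prev, curr, unigrams, bigrams):
    """Add-one (Laplace) smoothed estimate of P(curr | prev)."""
    vocab_size = len(unigrams)
    return (bigrams[prev][curr] + 1) / (unigrams[prev] + vocab_size)

# Hypothetical toy corpus, far smaller than a real training set.
corpus = ["saya makan nasi", "saya minum air", "kamu makan roti"]
unigrams, bigrams = train_bigram(corpus)
p_seen = bigram_prob("saya", "makan", unigrams, bigrams)
p_unseen = bigram_prob("saya", "roti", unigrams, bigrams)
```

Smoothing keeps unseen pairs such as `saya roti` at a small non-zero probability, so a decoder can still rank them below pairs that were observed.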

The system also compares two decoding strategies:
- **Beam Search**: deterministic, consistent
- **Nucleus Sampling**: stochastic, varied
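
The difference between the two strategies shows up at a single decoding step. The sketch below implements top-p (nucleus) filtering over a toy next-token distribution and contrasts it with a deterministic pick. The helper names and the probabilities are hypothetical and are not taken from the package's `strategies.py`.

```python
import random

def nucleus_filter(probs, top_p):
    """Keep the smallest set of top tokens whose cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalise so the kept probabilities sum to 1.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

def nucleus_sample(probs, top_p, rng):
    """Draw one token at random from the renormalised nucleus."""
    nucleus = nucleus_filter(probs, top_p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

def greedy_pick(probs):
    """Deterministic counterpart: always take the most probable token."""
    return max(probs, key=probs.get)

# Hypothetical distribution over the word following "saya makan".
next_token_probs = {"nasi": 0.5, "roti": 0.3, "ikan": 0.15, "batu": 0.05}
nucleus = nucleus_filter(next_token_probs, top_p=0.7)
sampled = nucleus_sample(next_token_probs, top_p=0.7, rng=random.Random(0))
```

With `top_p=0.7` only `nasi` and `roti` survive the filter, so the unlikely tail can never be sampled, while repeated calls may still return different tokens.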

Evaluation uses standard NLP metrics:
- **BLEU Score**: n-gram precision
- **chrF Score**: character-level F-score
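
As a rough illustration of what a character-level F-score measures, the following is a simplified, single-reference chrF sketch. It assumes character n-grams up to length 6, whitespace removed before counting, and beta = 2; it is not the project's evaluation code, and published scores should come from an established implementation.

```python
from collections import Counter

def char_ngrams(text, n):
    """Count character n-grams, ignoring spaces (a simplifying assumption)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF: F-beta over averaged character n-gram precision and recall."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is too short to contain n-grams of this order
        overlap = sum((hyp & ref).values())  # clipped matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

exact = chrf("saya makan nasi", "saya makan nasi")
partial = chrf("saya makan", "saya makan nasi")
```

Because beta is greater than 1, recall is weighted more heavily than precision, so a hypothesis that drops a word (as in `partial`) is penalised even though everything it does contain is correct.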

---

## Installation

```bash
# Clone the repository
git clone https://github.com/AldiAlfatih/bisindo-trans.git
cd bisindo-trans

# Install the package (development mode)
pip install -e .

# With Neural LM (GPT-2) support
pip install -e ".[neural]"
```

**Requirements:**
- Python 3.9+
- pandas, nltk, openpyxl
- torch, transformers (optional, for neural mode)

---

## Quick Start

```python
from bisindotrans import Translator

# Initialise with the N-gram model
translator = Translator(model_type="ngram")

# Translate a gloss sequence into natural language
hasil = translator.translate("SAYA MAKAN NASI", method="beam")
print(hasil)  # Output: "Saya makan nasi."

# With a fingerspelled name
hasil = translator.translate("NAMA SAYA M U H A M M A D A L D I", method="beam")
print(hasil)  # Output: "Nama saya Muhammad Aldi."

# Compare decoding methods
beam_result = translator.translate("SELAMAT PAGI", method="beam")
nucleus_result = translator.translate("SELAMAT PAGI", method="nucleus")
```

---

## Key Features

| Feature | Description |
|---------|-------------|
| **Hybrid Model** | Choose between the statistical N-gram model and the neural GPT-2 model as needed |
| **Dual Decoding** | Beam Search (high precision) and Nucleus Sampling (varied output) |
| **Smart NER** | Automatically detects personal names from fingerspelling |
| **Auto-Capitalization** | Automatically capitalizes names and sentence starts |
| **Model Persistence** | Save and load models for fast startup (~10 seconds to load vs ~5 minutes to train) |
| **Anti-Hallucination** | Built-in filter that suppresses irrelevant output from the Neural LM |
| **Preprocessing Pipeline** | Gloss normalization, fingerspelling merging, phrase correction |
| **Batch Translation** | Translate many sentences in one call |
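
Fingerspelling merging can be pictured as joining runs of single-letter gloss tokens into one word. The function below is a hypothetical sketch of that idea, not the logic in `naturalisasi.py`. It assumes a run of two or more letters is a name and capitalises it.

```python
def _flush(letters, out):
    """Emit a pending run: join it if it has 2+ letters, else pass it through."""
    if len(letters) >= 2:
        out.append("".join(letters).capitalize())
    else:
        out.extend(letters)

def merge_fingerspelling(gloss):
    """Join each run of single-letter tokens into one capitalised word."""
    merged, letters = [], []
    for token in gloss.split():
        if len(token) == 1 and token.isalpha():
            letters.append(token)
            continue
        _flush(letters, merged)
        letters = []
        merged.append(token)
    _flush(letters, merged)
    return merged

tokens = merge_fingerspelling("NAMA SAYA A L D I")
run_together = merge_fingerspelling("M U H A M M A D A L D I")
```

Note the limitation: two names spelled back to back come out as one token (`Muhammadaldi`), so splitting them into `Muhammad Aldi` needs extra information such as a name lexicon or a pause marker in the gloss.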

---

## Benchmark

### Model Comparison

| Model | Decoding | BLEU | chrF | Latency |
|-------|----------|------|------|---------|
| N-gram | Beam Search | <!-- TODO --> | <!-- TODO --> | <!-- TODO --> |
| N-gram | Nucleus | <!-- TODO --> | <!-- TODO --> | <!-- TODO --> |
| Neural | Beam Search | <!-- TODO --> | <!-- TODO --> | <!-- TODO --> |
| Neural | Nucleus | <!-- TODO --> | <!-- TODO --> | <!-- TODO --> |

### Test Configuration

| Parameter | N-gram | Neural |
|-----------|--------|--------|
| Corpus Size | ~19 million words | Pre-trained |
| Vocab Size | ~1.4 million | 50,257 |
| Beam Width | 3 | 5 |
| Top-p (Nucleus) | 0.5 | 0.7 |
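
To make the beam width parameter concrete, here is a minimal beam search over a toy next-token table, using a width of 3 as in the N-gram column. The table, the function name, and the scoring by summed log-probabilities without length normalisation are assumptions for illustration, not the package's decoder.

```python
import math

def beam_search(next_probs, start, beam_width=3, max_len=5, end="</s>"):
    """Return the highest-scoring token sequence under a next-token table."""
    beams = [([start], 0.0)]  # (tokens, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for token, p in next_probs.get(tokens[-1], {}).items():
                candidates.append((tokens + [token], score + math.log(p)))
        if not candidates:
            break
        # Keep only the beam_width best expansions at each step.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            (finished if tokens[-1] == end else beams).append((tokens, score))
        if not beams:
            break
    pool = finished or beams
    return max(pool, key=lambda c: c[1])[0]

# Hypothetical bigram probabilities P(next | previous).
table = {
    "<s>": {"saya": 0.6, "kamu": 0.4},
    "saya": {"makan": 0.7, "minum": 0.3},
    "kamu": {"makan": 0.5, "minum": 0.5},
    "makan": {"nasi": 0.8, "roti": 0.2},
    "minum": {"air": 1.0},
    "nasi": {"</s>": 1.0},
    "roti": {"</s>": 1.0},
    "air": {"</s>": 1.0},
}
best = beam_search(table, "<s>", beam_width=3)
```

A wider beam explores more alternatives per step at a proportional cost in time, which is one reason latency is worth reporting alongside BLEU and chrF.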

---

## Package Structure

```
bisindotrans/
├── __init__.py              # Public API
├── translator.py            # Main Translator class
├── preprocessing/
│   ├── normalisasi.py       # Cleaning & deduplication
│   └── naturalisasi.py      # NER & phrase mapping
├── models/
│   ├── ngram.py             # Statistical bigram model
│   └── neural.py            # GPT-2 wrapper
├── decoding/
│   └── strategies.py        # Beam Search & Nucleus Sampling
└── utils/
    └── postprocessing.py    # Output formatting
```

---

## Advanced Usage

### Switching Models

```python
# N-gram mode (fast, offline)
t = Translator(model_type="ngram")

# Neural mode (needs a GPU or a powerful CPU)
t = Translator(model_type="neural")
```

### Custom Model Path

```python
# Load a model from a custom location
t = Translator(model_type="ngram", model_path="path/to/custom_model.pkl")
```
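
The `.pkl` extension suggests the model is serialised with `pickle`, although the README does not say so explicitly. Under that assumption, a save and load round trip could look like the sketch below; `save_model`, `load_model`, and the toy counts are hypothetical names and data, not part of the package API.

```python
import os
import pickle
import tempfile
from collections import Counter

def save_model(model, path):
    """Serialise a trained model object to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_model(path):
    """Load a model from disk. Only unpickle files from a trusted source."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Stand-in for trained bigram counts.
counts = Counter({("saya", "makan"): 2, ("makan", "nasi"): 1})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "custom_model.pkl")
    save_model(counts, path)
    restored = load_model(path)
```

Loading precomputed counts avoids re-reading the training corpus, which is where the startup saving comes from.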

### Batch Processing

```python
glosses = ["SELAMAT PAGI", "TERIMA KASIH", "SAMPAI JUMPA"]
results = translator.translate_batch(glosses, method="beam")
```

---

## References

- Holtzman, A., et al. (2019). *The Curious Case of Neural Text Degeneration*
- Radford, A., et al. (2019). *Language Models are Unsupervised Multitask Learners*
- Papineni, K., et al. (2002). *BLEU: a Method for Automatic Evaluation of Machine Translation*

---

## Author

**Muhammad Aldi Alfatih**  
**aldialfatih016@gmail.com**  
Undergraduate thesis (Skripsi S1), Computer Science  
2026

---

## License

MIT License - Free to use for academic and development purposes.
