Metadata-Version: 2.4
Name: yaduha
Version: 0.3.2
Summary: A type-safe, AI-powered framework for structured language translation
Author: Diego Cuadros, Nicholas Leeds, Bhaskar Krishnamachari, Kira Toal, Ruben Rosales, Khalil Iskarous
Author-email: Jared Coleman <jared.coleman@lmu.edu>
License: # Yaduha License (Academic Open License v1.0)
        
        **Copyright © 2025**  
        **Loyola Marymount University** – Kubishi Research Group  
        **University of Southern California** – Autonomous Networks Research Group  
        **Authors:** Jared Coleman, Diego Cuadros, Nicholas Leeds, Bhaskar Krishnamachari, Kira Toal, Ruben Rosales, Khalil Iskarous  
        
        ## 1. Permission and Use
        
        Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to use, copy, modify, merge, publish, and distribute the Software **for academic, research, and educational purposes**, subject to the following conditions:
        
        1. **Attribution** — All copies or substantial portions of the Software must include this copyright notice, the list of authors, and this permission notice.  
        2. **Acknowledgment** — Any publication, product, or presentation using the Software must acknowledge:  
           > “This work utilizes the *Yaduha* framework, developed by the Kubishi Research Group at Loyola Marymount University and the Autonomous Networks Research Group at the University of Southern California.”  
        3. **No Endorsement** — The names of Loyola Marymount University, the University of Southern California, or any of the authors may not be used to endorse or promote products derived from this Software without prior written permission.  
        4. **Noncommercial Use** — The Software may not be used for commercial purposes without explicit written permission from Loyola Marymount University.  
        5. **Derivative Works** — Modified versions or derivative works must clearly indicate that they have been changed and may not imply endorsement by the original authors or institutions.
        
        ## 2. Disclaimer
        
        THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT, OR OTHERWISE, ARISING FROM, OUT OF, OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
        
        
        For questions or permissions, contact:  
        **Dr. Jared Coleman**  
        *Kubishi Research Group, Loyola Marymount University*  
        [jared.coleman@lmu.edu](mailto:jared.coleman@lmu.edu)
        
Project-URL: Homepage, https://github.com/kubishi/yaduha
Project-URL: Documentation, https://github.com/kubishi/yaduha/tree/main/docs
Project-URL: Repository, https://github.com/kubishi/yaduha
Project-URL: Bug Tracker, https://github.com/kubishi/yaduha/issues
Keywords: translation,llm,endangered-languages,language-learning,machine-translation
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: pydantic
Requires-Dist: tomli>=2.0; python_version < "3.11"
Provides-Extra: full
Requires-Dist: fastapi; extra == "full"
Requires-Dist: uvicorn; extra == "full"
Requires-Dist: python-dotenv; extra == "full"
Requires-Dist: requests; extra == "full"
Requires-Dist: anthropic; extra == "full"
Requires-Dist: openai; extra == "full"
Requires-Dist: httpx; extra == "full"
Requires-Dist: sacrebleu>=2.0; extra == "full"
Requires-Dist: bert-score>=0.3; extra == "full"
Requires-Dist: unbabel-comet>=2.0; extra == "full"
Provides-Extra: dev
Requires-Dist: yaduha[full]; extra == "dev"
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: pyright; extra == "dev"
Dynamic: license-file

# Yaduha

A framework for translating into low-resource and endangered languages using LLMs with grammatical constraints. It implements **LLM-Assisted Rule-Based Machine Translation (LLM-RBMT)**, in which the LLM never needs to "know" the target language. Instead, it decomposes English input into structured forms, which linguistic rules then synthesize into the target language.

Based on the paper: [LLM-Assisted Rule Based Machine Translation for Low/No-Resource Languages](https://arxiv.org/pdf/2405.08997)

## Install

```bash
pip install yaduha            # core (pydantic only)
pip install "yaduha[full]"    # + LLM backends, API server, evaluation metrics (quoted so shells like zsh do not expand the brackets)
```

Language packages are installed separately:

```bash
pip install yaduha-ovp      # Owens Valley Paiute
```

## Usage

```python
from yaduha.agent.openai import OpenAIAgent
from yaduha.translator.pipeline import PipelineTranslator

translator = PipelineTranslator.from_language(
    "ovp",
    agent=OpenAIAgent(model="gpt-4o-mini", api_key="..."),
)

result = translator("The dog is sleeping.")
print(result.target)                    # target language output
print(result.back_translation.source)   # back-translation for verification
```

## How it works

Sentence types are Pydantic models that encode a language's grammar. The LLM fills them via constrained decoding (structured output), so every output conforms to the encoded grammar by construction. No parallel corpora are required -- only a lexicon and grammatical rules.

```
English input
    -> LLM decomposes into structured Sentence models
    -> Sentence.__str__() renders target language text
    -> (optional) back-translate for verification
```
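
The flow above can be illustrated with a toy sketch. To stay dependency-free it uses standard-library dataclasses and enums in place of Pydantic models, and every name in it (`Noun`, `Verb`, `Tense`, the `-TGT` placeholder forms, the `filled` dictionary) is hypothetical rather than part of Yaduha or any real language package. The point is only that once the fields are restricted to lexicon members, the rendering rule in `__str__` cannot produce a form outside the grammar.

```python
from dataclasses import dataclass
from enum import Enum


class Noun(Enum):
    # Hypothetical lexicon: English gloss -> placeholder target form.
    DOG = "dog-TGT"
    CAT = "cat-TGT"


class Verb(Enum):
    SLEEP = "sleep-TGT"
    RUN = "run-TGT"


class Tense(Enum):
    PRESENT = "-prs"
    PAST = "-pst"


@dataclass(frozen=True)
class SVSentence:
    """Subject-verb sentence whose fields only admit lexicon members."""

    subject: Noun
    verb: Verb
    tense: Tense

    def __post_init__(self) -> None:
        # Reject anything outside the lexicon, mimicking schema validation.
        for value, kind in ((self.subject, Noun), (self.verb, Verb), (self.tense, Tense)):
            if not isinstance(value, kind):
                raise TypeError(f"{value!r} is not a valid {kind.__name__}")

    def __str__(self) -> str:
        # Toy rule: subject first, then the verb stem with a tense suffix.
        return f"{self.subject.value} {self.verb.value}{self.tense.value}"


# What a structured-output call might return for "The dog is sleeping."
filled = {"subject": "DOG", "verb": "SLEEP", "tense": "PRESENT"}
sentence = SVSentence(
    subject=Noun[filled["subject"]],
    verb=Verb[filled["verb"]],
    tense=Tense[filled["tense"]],
)
print(sentence)
```

A word outside the lexicon (for example `Noun["BIRD"]`) raises `KeyError` before any text is rendered, which is the toy analogue of a schema rejecting an invalid structured output.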

## Translation strategies

**Pipeline** -- grammar-guaranteed via structured output. The LLM maps English into one or more `Sentence` subclasses, and each sentence's `__str__` method renders the target-language text. Output always conforms to the grammar those models encode.

```python
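from yaduha.translator.pipeline import PipelineTranslator
# `agent` is an LLM backend instance (e.g. the OpenAIAgent from Usage);
# `SVO` and `SV` are Sentence subclasses, e.g. from a language package.
translator = PipelineTranslator(agent=agent, SentenceType=(SVO, SV))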
```

**Agentic** -- free-form with tool assistance. The LLM reasons freely and can call tools (dictionary lookup, pipeline translator, etc.) to produce a translation.

```python
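from yaduha.translator.agentic import AgenticTranslator
# `agent` is an LLM backend instance; `dictionary` and `pipeline` are tool
# objects constructed elsewhere (their setup is not shown here).
translator = AgenticTranslator(agent=agent, tools=[dictionary, pipeline])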
```

## Creating a language package

Define sentence types as Pydantic models and register them via an entry point:

```python
from yaduha.language import Sentence

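# `Subject` and `Verb` are models defined by the language package (not shown);
# each exposes the `render()` method used in `__str__` below.
class SVSentence(Sentence):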
    subject: Subject
    verb: Verb

    def __str__(self) -> str:
        return f"{self.subject.render()} {self.verb.render()}"

    @classmethod
    def get_examples(cls) -> list[tuple[str, "SVSentence"]]:
        return [("I sleep.", cls(subject=..., verb=...))]
```

```toml
# pyproject.toml
[project.entry-points."yaduha.languages"]
my_lang = "my_package:language"
```

See [yaduha-ovp](https://github.com/kubishi/yaduha-ovp) for a complete example.

## LLM backends

- OpenAI (`yaduha.agent.openai`)
- Anthropic (`yaduha.agent.anthropic`)
- Google Gemini (`yaduha.agent.gemini`)
- Ollama (`yaduha.agent.ollama`)

## Evaluation

Built-in evaluators for back-translation quality: chrF, BLEU, BERTScore, COMET, and OpenAI embedding similarity.

```python
from yaduha.evaluator.chrf import ChrfEvaluator
from yaduha.evaluator import batch_evaluate

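# `translations` is a collection of translation results produced earlier,
# e.g. by calling a translator on several inputs.
results = batch_evaluate(translations, ChrfEvaluator())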
```
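
To give a feel for what a character n-gram metric such as chrF measures when a back-translation is compared with the original English, here is a simplified, dependency-free sketch. It is not the `ChrfEvaluator` implementation, and its scores (on a 0 to 1 scale) will not match a reference chrF implementation exactly. The function names `char_ngrams` and `simple_chrf` are hypothetical.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace for simplicity."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: F-beta over averaged character n-gram precision and recall."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip n-gram orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # beta = 2 weights recall more heavily than precision.
    return (1 + beta**2) * p * r / (beta**2 * p + r)


source = "The dog is sleeping."
back_translation = "The dog sleeps."
print(round(simple_chrf(back_translation, source), 3))
```

Identical strings score 1.0 and strings with no shared characters score 0.0, so a higher value suggests the back-translation preserved more of the source wording.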

## CLI

```bash
yaduha languages list              # list installed language packages
yaduha languages info ovp          # show language details
yaduha languages validate ovp      # validate a language implementation
yaduha serve                       # start FastAPI server + dashboard
```

## Development

```bash
pip install "yaduha[dev]"
pytest tests/ -q
ruff check yaduha/ tests/
pyright yaduha/
```

## Citation

```bibtex
@article{coleman2024llm,
  title={LLM-Assisted Rule Based Machine Translation for Low/No-Resource Languages},
  author={Coleman, Jared and Cuadros, Diego and Leeds, Nicholas and Krishnamachari, Bhaskar and Toal, Kira and Rosales, Ruben and Iskarous, Khalil},
  journal={arXiv preprint arXiv:2405.08997},
  year={2024}
}
```
