Metadata-Version: 2.4
Name: mutato
Version: 0.5.23
Summary: Mutato Synonym Swapping API
License: MIT
License-File: LICENSE
Keywords: transcript schools,text extraction,statistical NLP,semantic NLP,AI,data analytics,natural language processing,machine learning,text analysis
Author: Craig Trim
Author-email: ctrim@maryville.edu
Maintainer: Craig Trim
Maintainer-email: ctrim@maryville.edu
Requires-Python: >=3.10,<3.14
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Indexing
Classifier: Topic :: Text Processing :: Linguistic
Requires-Dist: lingpatlab
Requires-Dist: numpy (==2.2.6)
Requires-Dist: rdflib
Requires-Dist: spacy (==3.8.2)
Project-URL: Bug Tracker, https://github.com/Maryville-University-DLX/transcriptiq/issues
Project-URL: Repository, https://github.com/Maryville-University-DLX/transcriptiq/libs/core/mutato-core
Description-Content-Type: text/markdown

# mutato

[![Python](https://img.shields.io/badge/python-3.10%20%7C%203.11%20%7C%203.12%20%7C%203.13-blue)](https://www.python.org)
[![Version](https://img.shields.io/badge/version-0.5.23-informational)](https://github.com/Maryville-University-DLX/transcriptiq)
[![Status](https://img.shields.io/badge/status-beta-yellow)](https://github.com/Maryville-University-DLX/transcriptiq)
[![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
[![Built with Poetry](https://img.shields.io/badge/built%20with-poetry-blueviolet)](https://python-poetry.org)
[![spaCy](https://img.shields.io/badge/NLP-spaCy%203.8-09a3d5)](https://spacy.io)

Ontology-driven synonym swapping for semantic text enrichment. Mutato identifies terms in input text and replaces them with semantically equivalent synonyms sourced from OWL ontologies, enabling consistent, structured analysis of natural language content.

## Use Cases

- Normalize terminology across transcripts before downstream analysis
- Enrich tokens with ontology-backed synonym candidates
- Bridge informal language to structured vocabulary in NLP pipelines
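
To make the first use case concrete, here is a minimal, self-contained sketch of terminology normalization. It does not call mutato's API: the `SYNONYMS` map and the `normalize` function are hypothetical stand-ins for what an OWL ontology lookup would supply.

```python
# Hypothetical synonym map (an assumption for illustration);
# in mutato these pairs would be sourced from an OWL ontology.
SYNONYMS = {
    "math": "mathematics",
    "maths": "mathematics",
    "pupil": "student",
}


def normalize(tokens: list[str]) -> list[str]:
    """Replace each token with its canonical form when one is known."""
    return [SYNONYMS.get(token.lower(), token) for token in tokens]


print(normalize(["pupil", "learned", "Maths"]))
# ['student', 'learned', 'mathematics']
```

Unknown tokens pass through unchanged, so the output keeps the same length and order as the input.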

## Quick Start

```python
from mutato.parser import owl_parse

results = owl_parse(tokens=["student", "learned", "math"], ontologies=[...])
```

## Installation

```bash
make all
```

This downloads the spaCy model, installs dependencies, runs tests, builds the package, and freezes requirements.

Or step by step:

```bash
make get_model   # download en_core_web_sm
make install     # poetry lock + install
make test        # run pytest
make build       # install + test + poetry build
make freeze      # export requirements.txt
```

## Architecture

Mutato is organized into four modules:

| Module | Purpose |
|---|---|
| `mutato.parser` | Main API -- synonym swapping and token matching |
| `mutato.finder` | Ontology lookup across single and multiple OWL graphs |
| `mutato.mda` | Metadata and NER enrichment generation |
| `mutato.core` | Shared utilities (file I/O, text, validation, timing) |

See [docs/architecture.md](docs/architecture.md) for design details.

## Matching Strategies

The parser applies multiple matching passes in order:

1. **Exact** -- literal string match against ontology terms
2. **Span** -- multi-token window matching
3. **Hierarchy** -- parent/child concept traversal
4. **spaCy** -- lemma and POS-aware NLP matching
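
The ordering of passes can be illustrated with a small sketch covering only the first two strategies (exact and span). Everything here is an assumption for illustration: `TERMS`, `MAX_SPAN`, `match_exact`, `match_span`, and `swap` are hypothetical names, not mutato's API, and the sketch tries the passes at each token position, which may differ from how mutato schedules them internally.

```python
# Hypothetical ontology labels mapped to canonical concept names.
TERMS = {
    "machine learning": "machine_learning",
    "math": "mathematics",
}
MAX_SPAN = 3  # assumed upper bound on window size, in tokens


def match_exact(tokens, i):
    """Pass 1: literal match on the single token at position i."""
    concept = TERMS.get(tokens[i].lower())
    return (concept, 1) if concept else None


def match_span(tokens, i):
    """Pass 2: multi-token window match, trying the longest window first."""
    for size in range(min(MAX_SPAN, len(tokens) - i), 1, -1):
        phrase = " ".join(tokens[i:i + size]).lower()
        if phrase in TERMS:
            return TERMS[phrase], size
    return None


PASSES = [match_exact, match_span]  # tried in order; the first hit wins


def swap(tokens):
    """Walk the tokens, replacing matched terms with their concept names."""
    out, i = [], 0
    while i < len(tokens):
        for matcher in PASSES:
            hit = matcher(tokens, i)
            if hit:
                concept, consumed = hit
                out.append(concept)
                i += consumed
                break
        else:
            # No pass matched: keep the original token.
            out.append(tokens[i])
            i += 1
    return out


print(swap(["She", "studied", "machine", "learning", "and", "math"]))
# ['She', 'studied', 'machine_learning', 'and', 'mathematics']
```

Keeping each strategy as a function with the same signature means further passes, such as hierarchy or lemma-based matching, could be appended to `PASSES` without changing the loop.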

## Requirements

- Python >= 3.10, < 3.14
- [Poetry](https://python-poetry.org) for dependency management
- spaCy `en_core_web_sm` model (installed via `make get_model`)

## Links

- [Issue Tracker](https://github.com/Maryville-University-DLX/transcriptiq/issues)
- [Source](https://github.com/Maryville-University-DLX/transcriptiq/libs/core/mutato-core)

