Metadata-Version: 2.1
Name: domain-matcher
Version: 0.1.0
Summary: Protect your models from out of scope domains. Focus on what matters.
License: Apache V2
Author: Dref360
Author-email: fred@glowstick.cx
Requires-Python: >=3.9,<4.0
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: accelerate (>=0.31.0,<0.32.0)
Requires-Dist: bertopic (>=0.16.2,<0.17.0)
Requires-Dist: datasets (>=2.19.2,<3.0.0)
Requires-Dist: keybert (>=0.8.4,<0.9.0)
Requires-Dist: matplotlib (>=3.9.0,<4.0.0)
Requires-Dist: orjson (>=3.10.3,<4.0.0)
Requires-Dist: pandas (>=2.2.2,<3.0.0)
Requires-Dist: plotly (>=5.22.0,<6.0.0)
Requires-Dist: pydantic (>=2.7.3,<3.0.0)
Requires-Dist: scikit-learn (>=1.5.0,<2.0.0)
Requires-Dist: seaborn (>=0.13.2,<0.14.0)
Requires-Dist: sentence-transformers (>=3.0.1,<4.0.0)
Requires-Dist: structlog (>=24.2.0,<25.0.0)
Requires-Dist: tensorboard (>=2.17.0,<3.0.0)
Requires-Dist: torch (>=2.3.1,<3.0.0)
Requires-Dist: transformers (>=4.41.2,<5.0.0)
Requires-Dist: typer[all] (>=0.12.3,<0.13.0)
Description-Content-Type: text/markdown

# Domain Matcher

[![Read our Blog](https://img.shields.io/badge/Read_our_Blog-blue?logo=readdotcv)](https://dref360.github.io/domainmatching/)

Domain Matcher is a library that matches your input data against a pre-defined domain.
Inputs that fall outside the domain are deemed unimportant and can be safely filtered out.

> Domain Matching performs very cheap OoD detection using topic modeling and keyword extraction.

`pip install domain-matcher`

## Usage

```python
from datasets import load_dataset
from domain_matcher.core import DomainMatcher, DMConfig

# Custom version of `clinc-oos` where non-banking classes are assigned to oos.
ds = load_dataset("GlowstickAI/banking-clinc-oos", "plus")
config = DMConfig(text_column='text', label_column='intent', oos_class='oos')
dmatcher = DomainMatcher(config)
# Fit DM on your training data. See our blog for what happens during fitting.
dmatcher.fit(ds['train'])

# Predict: You can predict on a string, List[str] or Dataset
dmatcher.transform("Can you cancel my credit card?")['in_domain']
# >>> True
dmatcher.transform("Can you cancel my reservation at Giorgi's?")['in_domain']
# >>> False
```
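
Because out-of-domain inputs are meant to be filtered out, a thin wrapper around the single-string call shown above may be all you need. The snippet below is a minimal sketch and not part of the library: `filter_in_domain` and `fake_is_in_domain` are hypothetical names, and the stub predicate only stands in for a fitted matcher so that the example runs on its own.

```python
from typing import Callable, Iterable, List


def filter_in_domain(
    texts: Iterable[str], is_in_domain: Callable[[str], bool]
) -> List[str]:
    """Keep only the texts that the predicate flags as in-domain."""
    return [text for text in texts if is_in_domain(text)]


def fake_is_in_domain(text: str) -> bool:
    # Hypothetical stand-in for a fitted matcher, NOT the library's logic.
    # With a fitted `dmatcher`, the predicate could instead be:
    #   lambda text: dmatcher.transform(text)["in_domain"]
    return "credit card" in text.lower()


queries = [
    "Can you cancel my credit card?",
    "Can you cancel my reservation at Giorgi's?",
]
kept = filter_in_domain(queries, fake_is_in_domain)
print(kept)
```

Passing the predicate as a callable keeps the filtering logic independent of how the in-domain decision is made, so the stub can be swapped for a real matcher without changing the helper.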

### Troubleshooting

For troubleshooting, please see our [wiki](https://github.com/GlowstickAI/domain-matcher/wiki) or [submit an issue](https://github.com/GlowstickAI/domain-matcher/issues) if you can't find what you're looking for.

## Development

* Install Pyenv
  * `curl https://pyenv.run | bash`
  * `pyenv install 3.9.13 && pyenv global 3.9.13`
* [Install Poetry](https://python-poetry.org/docs/master/#installing-with-the-official-installer)
* `poetry install`
* Install the pre-commit hooks
  * `poetry run pre-commit install`

### Tooling

* `make format`: format the code with Ruff.
* `make test`: run unit tests and mypy.
