Metadata-Version: 2.1
Name: second-opinion-ruler
Version: 0.1.0
Summary: A spaCy custom component that extends the SpanRuler with a second opinion
Home-page: https://github.com/mr-bjerre/second-opinion-ruler
License: MIT
Keywords: python,spaCy,custom component
Author: Nicolai Bjerre Pedersen
Maintainer: Nicolai Bjerre Pedersen
Requires-Python: >=3.10,<4.0
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Environment :: MacOS X
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: Unix
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: spacy (>=3.4.1,<4.0.0); sys_platform != "darwin"
Requires-Dist: spacy[apple] (>=3.4.1,<4.0.0); sys_platform == "darwin"
Project-URL: Documentation, https://github.com/mr-bjerre/second-opinion-ruler#readme
Project-URL: Repository, https://github.com/mr-bjerre/second-opinion-ruler
Description-Content-Type: text/markdown

# Second Opinion Ruler

`second_opinion_ruler` is a [spaCy](https://spacy.io/) component that extends [`SpanRuler`](https://spacy.io/usage/rule-based-matching#spanruler) with a second opinion. For _each_ pattern you can provide a callback (registered in [`registry.misc`](https://spacy.io/api/top-level/#registry)) that is called on the matched [`Span`](https://spacy.io/api/span/#_title). The callback decides what happens to the match: it can discard it, add additional spans and/or mutate the matched span, e.g. store a parsed `datetime` in a custom attribute.
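To make the callback contract concrete, here is a dependency-free sketch in plain Python. `FakeSpan`, `second_opinion`, `discard_short`, `tag_year` and `add_year_span` are hypothetical names invented for illustration. They are not part of this package's API, and the loop only models the behaviour described above, not how the component is implemented: a callback returns a list of spans, so an empty list discards the match and extra items add spans. It also assumes that a pattern without a callback keeps its match unchanged.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class FakeSpan:
    """Hypothetical stand-in for spacy.tokens.Span so the sketch runs without spaCy."""

    text: str
    label: str
    ext: dict[str, Any] = field(default_factory=dict)  # plays the role of Span._


# A callback receives the matched span plus keyword arguments and returns list[FakeSpan].
Callback = Callable[..., list[FakeSpan]]


def second_opinion(
    matches: list[tuple[FakeSpan, Optional[Callback], dict[str, Any]]],
) -> list[FakeSpan]:
    """Ask each match's callback for a second opinion and collect the spans it returns."""
    result: list[FakeSpan] = []
    for span, callback, kwargs in matches:
        if callback is None:
            result.append(span)  # assumed: no callback means the match is kept unchanged
        else:
            result.extend(callback(span, **kwargs))
    return result


def discard_short(span: FakeSpan, min_len: int) -> list[FakeSpan]:
    # Discard: returning an empty list drops the match.
    return [span] if len(span.text) >= min_len else []


def tag_year(span: FakeSpan, attr: str = "year") -> list[FakeSpan]:
    # Mutate: store structured data on the matched span, then keep it.
    span.ext[attr] = int(span.text[-4:])
    return [span]


def add_year_span(span: FakeSpan) -> list[FakeSpan]:
    # Add: keep the match and return an extra span alongside it.
    return [span, FakeSpan(span.text[-4:], "YEAR")]


matches = [
    (FakeSpan("21.04.1986", "DATE"), tag_year, {"attr": "year"}),
    (FakeSpan("1.1", "DATE"), discard_short, {"min_len": 8}),
    (FakeSpan("12.06.2001", "DATE"), add_year_span, {}),
    (FakeSpan("Copenhagen", "GPE"), None, {}),
]
spans = second_opinion(matches)
print([(s.text, s.label) for s in spans])
# [('21.04.1986', 'DATE'), ('12.06.2001', 'DATE'), ('2001', 'YEAR'), ('Copenhagen', 'GPE')]
print(spans[0].ext)  # {'year': 1986}
```

Returning a list, rather than a single span or `None`, lets one signature cover all three outcomes (discard, mutate, add).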

## Installation

```shell
pip install second_opinion_ruler
```

## Usage

```python
import datetime

import spacy
from spacy.tokens import Span
from spacy.util import registry

# create date as custom attribute extension
Span.set_extension("date", default=None)

# add datetime parser to registry.misc
# IMPORTANT: first argument has to be Span and the return type has to be list[Span]
@registry.misc("to_datetime.v1")
def to_datetime(span: Span, format: str, attr: str = "date") -> list[Span]:

    # parse the date
    date = datetime.datetime.strptime(span.text, format)

    # add the parsed date to the custom attribute
    span._.set(attr, date)

    # just return matched span
    return [span]

# create a blank English pipeline (no trained model is needed)
nlp = spacy.blank("en")

# add the second opinion ruler
ruler = nlp.add_pipe("second_opinion_ruler", config={
    "validate": True,
    "annotate_ents": True,
})

# add a pattern with a second opinion handler (on_match)
ruler.add_patterns([
    {
        "label": "DATE",
        "pattern": "21.04.1986",
        "on_match": {
            "id": "to_datetime.v1",
            # "attr" must name a registered extension; "date" was registered above
            "kwargs": {"format": "%d.%m.%Y", "attr": "date"},
        },
    }
])

doc = nlp("This date 21.04.1986 will be a DATE entity while the structured information will be extracted to `Span._.date`")

# verify: the parsed datetime is stored on the entity's custom attribute
assert doc.ents[0]._.date == datetime.datetime(1986, 4, 21)
```

