Metadata-Version: 2.1
Name: extr-ds
Version: 0.0.3
Summary: Library to quickly build basic datasets for Named Entity Recognition (NER) and Relation Extraction (RE) Machine Learning tasks.
Home-page: https://github.com/dpasse/extr-ds
License: UNKNOWN
Platform: UNKNOWN
Description-Content-Type: text/markdown
Requires-Dist: extr (==0.0.5)

# extr-ds
> Library to quickly build basic datasets for Named Entity Recognition (NER) and Relation Extraction (RE) Machine Learning tasks.

<br />

## Install

```
pip install extr-ds
```

## Example

```python
text = 'Ted Johnson is a pitcher. Ted went to my school.'
```

### 1. Label Entities for Named-Entity Recognition Task (NER)

```python
import re

from extr import RegEx, RegExLabel, EntityExtactor
from extr_ds import IOB

entity_extractor = EntityExtactor([
    RegExLabel('PERSON', [
        RegEx([r'(ted\s+johnson|ted)'], re.IGNORECASE)
    ]),
    RegExLabel('POSITION', [
        RegEx([r'pitcher'], re.IGNORECASE)
    ]),
])

sentence_tokenizer = ...  ## placeholder: supply a 3rd party sentence tokenizer ##
labels = IOB(sentence_tokenizer, entity_extractor).label(text)

## labels ==  [
##     ['B-PERSON', 'I-PERSON', 'O', 'O', 'B-POSITION', 'O'],
##     ['B-PERSON', 'O', 'O', 'O', 'O', 'O']
## ]
```
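
To make the IOB scheme above concrete, here is a minimal standalone sketch that does not use `extr` or `extr-ds` at all. The function `iob_labels` and its simple regex tokenizer are hypothetical, written only for illustration, and the library's own implementation may differ. The first token inside an entity span gets a `B-` tag, later tokens in the same span get `I-`, and everything else gets `O`.

```python
import re


def iob_labels(text, patterns):
    """Tag each token of `text` with an IOB label (illustrative only)."""
    # Assumed tokenizer: runs of word characters, or single punctuation marks.
    tokens = [(m.start(), m.end()) for m in re.finditer(r'\w+|[^\w\s]', text)]

    # Collect (start, end, label) character spans for every pattern match.
    spans = []
    for label, pattern in patterns:
        for m in re.finditer(pattern, text, re.IGNORECASE):
            spans.append((m.start(), m.end(), label))

    labels = []
    for start, end in tokens:
        tag = 'O'
        for span_start, span_end, label in spans:
            # A token belongs to an entity if it lies fully inside the span.
            if start >= span_start and end <= span_end:
                prefix = 'B-' if start == span_start else 'I-'
                tag = prefix + label
                break
        labels.append(tag)
    return labels


patterns = [
    ('PERSON', r'ted\s+johnson|ted'),
    ('POSITION', r'pitcher'),
]

print(iob_labels('Ted Johnson is a pitcher.', patterns))
```

Under these assumptions, the two sentences from the example text produce the same label lists as shown above, with the trailing period tagged `O`.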

### 2. Verify Actual vs Model Labels

```python
from extr_ds.merges import check_for_differences

differences_in_labels = check_for_differences(
    ['B-PERSON', 'I-PERSON', 'O', 'O', 'B-POSITION', 'O'],
    ['B-PERSON', 'O', 'O', 'O', 'B-POSITION', 'O']
)

## differences_in_labels.has_diffs == True
## differences_in_labels.diffs_between_labels == [1]
```
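
The comparison above can be sketched without the library as a position-by-position check of two label sequences. The names `LabelDiff` and `compare_labels` are hypothetical stand-ins, and the real `check_for_differences` may behave differently, for example on inputs of unequal length.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LabelDiff:
    """Result of comparing two label sequences (illustrative only)."""
    has_diffs: bool
    diffs_between_labels: List[int] = field(default_factory=list)


def compare_labels(expected: List[str], predicted: List[str]) -> LabelDiff:
    # Assumption: both sequences label the same tokens, so lengths must match.
    if len(expected) != len(predicted):
        raise ValueError('label sequences must have the same length')

    # Record every index where the two sequences disagree.
    indices = [
        i for i, (a, b) in enumerate(zip(expected, predicted)) if a != b
    ]
    return LabelDiff(has_diffs=bool(indices), diffs_between_labels=indices)


diff = compare_labels(
    ['B-PERSON', 'I-PERSON', 'O', 'O', 'B-POSITION', 'O'],
    ['B-PERSON', 'O', 'O', 'O', 'B-POSITION', 'O'],
)
print(diff.has_diffs, diff.diffs_between_labels)
```

Here the only disagreement is at index 1, where `I-PERSON` was expected and `O` was given.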


