Metadata-Version: 2.1
Name: cltrier_prosem
Version: 0.2.0
Summary: Pipeline for training and evaluating classifiers on top of Hugging Face encoder models
Author-email: Simon Münker <muenker@uni-trier.com>
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.10
Requires-Dist: jsonlines>=4.0.0
Requires-Dist: tomli>=2.0.1
Requires-Dist: pyarrow>=13.0.0
Requires-Dist: tqdm>=4.66.1
Requires-Dist: numpy>=1.25.2
Requires-Dist: pandas>=2.1.0
Requires-Dist: torch>=2.0.1
Requires-Dist: transformers>=4.32.1
Requires-Dist: datasets>=1.16.1
Requires-Dist: matplotlib>=3.2.1
Requires-Dist: seaborn>=0.12.2
Requires-Dist: scikit-learn>=1.0.1

# CLTrier ProSem

## Usage

```python
from cltrier_prosem import Pipeline

# initialize the pipeline (loads model, data and trainer)
pipeline = Pipeline({
    'encoder': {
        'model': 'deepset/gbert-base',  # Hugging Face model slug
    },
    'dataset': {
        'path': './path/data',  # path to data directory (containing train/test.parquet)
        'text_column': 'text',  # column containing the source text
        'label_column': 'label',  # column containing target label
        'label_classes': ['class_1', 'class_2'],  # list of target classes
    },
    'classifier': {
        'hid_size': 512,  # hidden layer size of the classifier perceptron
        'dropout': 0.2,  # dropout probability
    },
    'pooler': {
        # pooling strategy, possible values:
        # 'cls', 'sent_mean', 'subword_{first|last|mean|min|max}'
        'form': 'cls',
        # column containing the span (only required for subword probing)
        'span_column': 'span',
    },
    'trainer': {
        'num_epochs': 5,  # number of training epochs
        'batch_size': 32,  # batch size for both training and evaluation
        'learning_rate': 1e-3,  # trainer learning rate
        'export_path': './path/output',  # output path for logging and results
    },
})

# run the pipeline (training and evaluation)
pipeline()
```

