Metadata-Version: 2.1
Name: tailwiz
Version: 0.0.9
Home-page: https://github.com/timothydai/tailwiz
Author: Timothy Dai
Author-email: timdai@stanford.edu
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
License-File: LICENSE

# Text Labeling AI Wizard (tailwiz)

`tailwiz` is an AI-powered tool for labeling text. It has three main capabilities: classifying text (`tailwiz.classify`), parsing text given context and prompts (`tailwiz.parse`), and generating text given prompts (`tailwiz.generate`).

## Quickstart

Install `tailwiz` by running the following on the command line:

```
python -m pip install tailwiz
```
Then run the following in a Python environment for a quick example of text classification:

```python
import tailwiz
import pandas as pd

prelabeled_text = pd.DataFrame(
    [
        ['Love you to the moon', 'nice'],
        ['I hate you', 'mean'],
        ['Have a great day', 'nice'],
    ],
    columns=['text', 'label'],
)
text_to_label = pd.DataFrame(
    ['You are the best!', 'You make me sick'],
    columns=['text'],
)
results = tailwiz.classify(
    text_to_label=text_to_label,
    prelabeled_text=prelabeled_text,
)
print(results)
```

## Installation

Install `tailwiz` through `pip`:

```
python -m pip install tailwiz
```

## Usage

In this section, we outline the three main functions of `tailwiz` and provide examples.


### <code>tailwiz.classify<i>(text_to_label, prelabeled_text=None, output_metrics=False)</i></code>

Classify each piece of text in `text_to_label`, optionally using the examples in `prelabeled_text` to improve results.
#### Parameters:
- `text_to_label` : _pandas.DataFrame_. Data structure containing text to classify. Must contain a string column named `text`.
- `prelabeled_text` : _pandas.DataFrame, default None_. Pre-labeled text to enhance the performance of the classification task. Must contain a string column for the classified text named `text` and a column for the labels named `label`.
- `output_metrics` : _bool, default False_. Whether to output `performance_estimate` together with results in a tuple.
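Because `classify` relies on exact column names, it can help to check your frames before calling it. The sketch below assumes only the column requirements listed above; `check_classify_inputs` is a hypothetical helper, not part of `tailwiz`.

```python
import pandas as pd


def check_classify_inputs(text_to_label, prelabeled_text=None):
    """Hypothetical helper: raise ValueError if a frame lacks the columns tailwiz.classify expects."""
    if 'text' not in text_to_label.columns:
        raise ValueError("text_to_label needs a 'text' column")
    if prelabeled_text is not None:
        missing = {'text', 'label'} - set(prelabeled_text.columns)
        if missing:
            raise ValueError(f'prelabeled_text is missing columns: {sorted(missing)}')
    # The `text` column must hold strings; flag anything else (e.g. NaN/None from a CSV).
    non_strings = ~text_to_label['text'].map(lambda value: isinstance(value, str))
    if non_strings.any():
        raise ValueError(f'{int(non_strings.sum())} non-string value(s) in text_to_label["text"]')


text_to_label = pd.DataFrame(['You are the best!', 'You make me sick'], columns=['text'])
check_classify_inputs(text_to_label)  # passes silently when the columns are in order
```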

#### Returns:
- `results` : _pandas.DataFrame_. A copy of `text_to_label` with a new column, `label_from_tailwiz`, containing classification results.
- `performance_estimate` : _Dict[str, float]_. Dictionary mapping metric names to metric values. Included together with `results` in a tuple if `output_metrics` is True. Uses `prelabeled_text` to estimate the accuracy of the classification. For multiclass classification, one-vs-all metrics are given.
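One-vs-all (also called one-vs-rest) metrics treat each label in turn as the positive class and every other label as negative. The plain-Python sketch below computes per-label precision and recall that way to illustrate the idea; it is not `tailwiz`'s implementation, and the `<label>_precision` / `<label>_recall` key names are assumptions made here.

```python
from typing import Dict, List


def one_vs_all_metrics(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """Per-label precision and recall, treating each label as positive vs. all other labels."""
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        # Guard against division by zero when a label is never predicted or never present.
        metrics[f'{label}_precision'] = tp / (tp + fp) if tp + fp else 0.0
        metrics[f'{label}_recall'] = tp / (tp + fn) if tp + fn else 0.0
    return metrics


y_true = ['nice', 'mean', 'nice', 'mean']
y_pred = ['nice', 'nice', 'nice', 'mean']
print(one_vs_all_metrics(y_true, y_pred))
```

Here the one `mean` text predicted as `nice` lowers recall for `mean` and precision for `nice` at the same time, which is why per-label numbers are more informative than a single accuracy figure.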

#### Example:

```python
import tailwiz
import pandas as pd

prelabeled_text = pd.DataFrame(
    [
        ['Love you to the moon', 'nice'],
        ['I hate you', 'mean'],
        ['Have a great day', 'nice'],
    ],
    columns=['text', 'label'],
)
text_to_label = pd.DataFrame(
    ['You are the best!', 'You make me sick'],
    columns=['text'],
)
results = tailwiz.classify(
    text_to_label=text_to_label,
    prelabeled_text=prelabeled_text,
)
print(results)
```

### <code>tailwiz.parse<i>(text_to_label, prelabeled_text=None, output_metrics=False)</i></code>

Given a prompt and a context, parse the answer from the context.
#### Parameters:
- `text_to_label` : _pandas.DataFrame_. Data containing prompts and contexts from which answers will be parsed. Must contain a string column for the context named `context` and a string column for the prompt named `prompt`.
- `prelabeled_text` : _pandas.DataFrame, default None_. Pre-labeled text to enhance the performance of the parsing task. Must contain a string column for the context named `context`, a string column for the prompt named `prompt`, and a string column for the label named `label`.
- `output_metrics` : _bool, default False_. Whether to output `performance_estimate` together with results in a tuple.
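When one prompt applies to many contexts, as in the `tailwiz.parse` example where every row asks "Extract the number.", the input frame can be built by repeating the prompt. A minimal sketch; `make_parse_frame` is a hypothetical helper that only produces the `prompt`, `context`, and optional `label` columns described above.

```python
from typing import List, Optional

import pandas as pd


def make_parse_frame(prompt: str, contexts: List[str], labels: Optional[List[str]] = None) -> pd.DataFrame:
    """Hypothetical helper: pair one prompt with each context, in tailwiz.parse's column layout."""
    frame = pd.DataFrame({'prompt': [prompt] * len(contexts), 'context': contexts})
    if labels is not None:
        if len(labels) != len(contexts):
            raise ValueError('labels and contexts must have the same length')
        # Labels are the expected answers; only needed when building prelabeled_text.
        frame['label'] = labels
    return frame


prelabeled_text = make_parse_frame('Extract the number.', ['10 jumping jacks', 'I have 3 eggs'], ['10', '3'])
text_to_label = make_parse_frame('Extract the number.', ['Figure 8'])
```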

#### Returns:
- `results` : _pandas.DataFrame_. A copy of `text_to_label` with a new column, `label_from_tailwiz`, containing parsed results.
- `performance_estimate` : _Dict[str, float]_. Dictionary mapping metric names to metric values. Included together with `results` in a tuple if `output_metrics` is True. Uses `prelabeled_text` to estimate the accuracy of the parsing job.
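To sanity-check parsing on your own data, you could hold out some pre-labeled rows, pass them (without answers) as `text_to_label`, and compare `label_from_tailwiz` with the answers you already know. The sketch below scores that comparison by exact match after trimming whitespace and lowercasing; this scoring rule is a choice made here for illustration and is not necessarily the metric `tailwiz` reports.

```python
from typing import Iterable


def exact_match_rate(predicted: Iterable[str], expected: Iterable[str]) -> float:
    """Fraction of predictions equal to the expected answer, ignoring case and surrounding whitespace."""
    predicted, expected = list(predicted), list(expected)
    if len(predicted) != len(expected):
        raise ValueError('predicted and expected must have the same length')
    if not predicted:
        raise ValueError('no predictions to score')
    hits = sum(str(p).strip().lower() == str(e).strip().lower() for p, e in zip(predicted, expected))
    return hits / len(predicted)


# Hypothetical held-out check, with `results` from tailwiz.parse and `held_out` holding the true labels:
# exact_match_rate(results['label_from_tailwiz'], held_out['label'])
print(exact_match_rate(['10', ' Three'], ['10', 'three']))  # 1.0
```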

#### Example:
```python
import tailwiz
import pandas as pd

prelabeled_text = pd.DataFrame(
    [
        ['Extract the number.', "Noon is twelve o'clock", 'twelve'],
        ['Extract the number.', '10 jumping jacks', '10'],
        ['Extract the number.', 'I have 3 eggs', '3'],
    ],
    columns=['prompt', 'context', 'label'],
)
text_to_label = pd.DataFrame(
    [['Extract the number.', 'Figure 8']],
    columns=['prompt', 'context'],
)
results = tailwiz.parse(
    text_to_label=text_to_label,
    prelabeled_text=prelabeled_text,
)
print(results)
```


### <code>tailwiz.generate<i>(text_to_label, prelabeled_text=None, output_metrics=False)</i></code>

Given a prompt, generate an answer.
#### Parameters:
- `text_to_label` : _pandas.DataFrame_. Data structure containing prompts for which answers will be generated. Must contain a string column for the prompt named `prompt`.
- `prelabeled_text` : _pandas.DataFrame, default None_. Pre-labeled text to enhance the performance of the text generation task. Must contain a string column for the prompt named `prompt` and a string column for the label named `label`.
- `output_metrics` : _bool, default False_. Whether to output `performance_estimate` together with results in a tuple.
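In the `tailwiz.generate` example, each prompt is a fixed instruction followed by the sentence to judge. If your sentences are stored separately, a sketch like the following builds the `prompt` column that way; `build_prompts` is a hypothetical helper, and joining with a single space simply mirrors the example.

```python
from typing import List, Optional

import pandas as pd


def build_prompts(instruction: str, sentences: List[str], labels: Optional[List[str]] = None) -> pd.DataFrame:
    """Hypothetical helper: prefix each sentence with the same instruction to form the 'prompt' column."""
    frame = pd.DataFrame({'prompt': [f'{instruction} {s}' for s in sentences]})
    if labels is not None:
        # Labels are the desired answers; only needed when building prelabeled_text.
        frame['label'] = labels
    return frame


instruction = 'Is this sentence Happy or Sad?'
prelabeled_text = build_prompts(instruction, ['I love puppies!', 'I do not like you at all.'], ['Happy', 'Sad'])
text_to_label = build_prompts(instruction, ['I am crying my eyes out.'])
print(text_to_label['prompt'][0])  # Is this sentence Happy or Sad? I am crying my eyes out.
```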

#### Returns:
- `results` : _pandas.DataFrame_. A copy of `text_to_label` with a new column, `label_from_tailwiz`, containing generated results.
- `performance_estimate` : _Dict[str, float]_. Dictionary mapping metric names to metric values. Included together with `results` in a tuple if `output_metrics` is True. Uses `prelabeled_text` to estimate the accuracy of the text generation job.
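All three functions return either a DataFrame or, when `output_metrics` is True, a tuple that also carries `performance_estimate`. A small helper can normalize the two shapes. This sketch assumes the tuple is ordered `(results, performance_estimate)`, following the order of the Returns list; `unpack_tailwiz_output` is hypothetical.

```python
from typing import Dict, Optional, Tuple, Union

import pandas as pd

TailwizOutput = Union[pd.DataFrame, Tuple[pd.DataFrame, Dict[str, float]]]


def unpack_tailwiz_output(output: TailwizOutput) -> Tuple[pd.DataFrame, Optional[Dict[str, float]]]:
    """Hypothetical helper: always return (results, performance_estimate or None)."""
    if isinstance(output, tuple):
        # Assumed order: (results, performance_estimate).
        results, performance_estimate = output
        return results, performance_estimate
    return output, None


# Usage, e.g. with generate:
# results, metrics = unpack_tailwiz_output(
#     tailwiz.generate(text_to_label=text_to_label, prelabeled_text=prelabeled_text, output_metrics=True)
# )
```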

#### Example:
```python
import tailwiz
import pandas as pd

prelabeled_text = pd.DataFrame(
    [
        ['Is this sentence Happy or Sad? I love puppies!', 'Happy'],
        ['Is this sentence Happy or Sad? I do not like you at all.', 'Sad'],
    ],
    columns=['prompt', 'label']
)
text_to_label = pd.DataFrame(
    ['Is this sentence Happy or Sad? I am crying my eyes out.'],
    columns=['prompt']
)
results = tailwiz.generate(
    text_to_label=text_to_label,
    prelabeled_text=prelabeled_text,
)
print(results)
```

## Templates (Notebooks)

Use these Jupyter Notebook examples as templates to help load your data and run any of the three `tailwiz` functions:
- For an example of `tailwiz.classify`, see [`examples/classify.ipynb`](https://github.com/timothydai/tailwiz/blob/main/examples/classify.ipynb)
- For an example of `tailwiz.parse`, see [`examples/parse.ipynb`](https://github.com/timothydai/tailwiz/blob/main/examples/parse.ipynb)
- For an example of `tailwiz.generate`, see [`examples/generate.ipynb`](https://github.com/timothydai/tailwiz/blob/main/examples/generate.ipynb)
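Whatever the data source, each function expects specific column names (`text`, `label`, `prompt`, `context`). If your file uses different names, renaming them is usually enough. In this sketch the CSV content and the `sentence`/`category` column names are made up for illustration.

```python
import io

import pandas as pd

# Stand-in for a real file; in practice pass a file path to pd.read_csv.
csv_data = io.StringIO(
    'sentence,category\n'
    'Love you to the moon,nice\n'
    'I hate you,mean\n'
)

# Rename the (hypothetical) source columns to the ones tailwiz.classify expects.
prelabeled_text = pd.read_csv(csv_data).rename(columns={'sentence': 'text', 'category': 'label'})
# Drop rows with missing text or labels, since both columns must hold strings.
prelabeled_text = prelabeled_text.dropna(subset=['text', 'label'])
print(prelabeled_text.columns.tolist())  # ['text', 'label']
```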
