Metadata-Version: 2.1
Name: topic-autolabel
Version: 0.1.2
Summary: Automatic topic labeling using LLMs
Author-email: Anthony Susevski <asusevski@gmail.com>
Project-URL: Repository, https://github.com/asusevski/topic-autolabel
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pytest
Requires-Dist: pandas
Requires-Dist: datasets
Requires-Dist: scikit-learn
Requires-Dist: instructor
Requires-Dist: torch
Requires-Dist: transformers
Provides-Extra: dev
Requires-Dist: black==24.10.0; extra == "dev"
Requires-Dist: ruff==0.8.0; extra == "dev"
Requires-Dist: isort==5.13.2; extra == "dev"
Requires-Dist: pyright==1.1.389; extra == "dev"

# topic-autolabel
Given text data, `topic-autolabel` uses an LLM to label each text with a topic. Labels can be chosen from a list you supply or generated without supervision, up to a set number of topics.

---
## Example usage:

First, install the package with pip: `pip install topic_autolabel`

```python
# Labelling with supplied labels
from topic_autolabel import process_file
import pandas as pd

df = pd.read_csv('path/to/file')
candidate_labels = ["positive", "negative"]

# labelling column "review" with "positive" or "negative"
new_df = process_file(
    df=df,
    text_column="review",
    candidate_labels=candidate_labels,
    model_name="meta-llama/Llama-3.1-8B-Instruct" # default model, pulled from the Hugging Face Hub
)
```

Alternatively, you can label text without any supervision by omitting the `candidate_labels` argument:

```python
from topic_autolabel import process_file
import pandas as pd

df = pd.read_csv('path/to/file')

# labelling column "review" with open-ended labels (best results when dataset talks about many topics)
new_df = process_file(
    df=df,
    text_column="review",
    model_name="meta-llama/Llama-3.1-8B-Instruct",
    num_labels=5 # generate up to 5 labels for each of the rows
)
```
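
Both examples pass a pandas DataFrame with a text column to `process_file`. The sketch below shows one way to build and clean such a DataFrame before labelling. It assumes that rows with missing or blank text are better dropped up front; this README does not say how the package itself treats them. The helper names `prepare_reviews` and `label_reviews` and the sample rows are hypothetical, and only `process_file` arguments shown above are used.

```python
import pandas as pd


def prepare_reviews(df: pd.DataFrame, text_column: str = "review") -> pd.DataFrame:
    """Drop rows whose text is missing or blank (hypothetical helper)."""
    cleaned = df.dropna(subset=[text_column]).copy()
    # Normalise to stripped strings so whitespace-only rows can be filtered out.
    cleaned[text_column] = cleaned[text_column].astype(str).str.strip()
    cleaned = cleaned[cleaned[text_column] != ""]
    return cleaned.reset_index(drop=True)


def label_reviews(df: pd.DataFrame) -> pd.DataFrame:
    """Label the cleaned reviews with the supplied candidate labels."""
    # Imported lazily so the cleaning step can run without loading a model.
    from topic_autolabel import process_file

    return process_file(
        df=df,
        text_column="review",
        candidate_labels=["positive", "negative"],
    )


# Made-up rows standing in for pd.read_csv('path/to/file').
raw = pd.DataFrame(
    {"review": ["Great battery life.", None, "   ", "Screen cracked in a week."]}
)
df = prepare_reviews(raw)
```

Calling `label_reviews(df)` then runs the labelling step on the two remaining rows.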
