Metadata-Version: 2.3
Name: promptwright
Version: 1.0.0
Summary: A tool for generating and managing prompts for local LLMs using Ollama
License-File: LICENSE
Requires-Python: >=3.11
Requires-Dist: datasets==3.0.2
Requires-Dist: huggingface-hub==0.26.0
Requires-Dist: requests>=2.25.0
Requires-Dist: tqdm>=4.65.0
Provides-Extra: dev
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest-mock>=3.10.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: requests-mock>=1.11.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Description-Content-Type: text/markdown

# Promptwright - Synthetic Dataset Generation Library

[![Tests](https://github.com/StacklokLabs/promptwright/actions/workflows/test.yml/badge.svg)](https://github.com/StacklokLabs/promptwright/actions/workflows/test.yml)
[![Python Version](https://img.shields.io/pypi/pyversions/promptwright.svg)](https://pypi.org/project/promptwright/)

![promptwright-cover](https://github.com/user-attachments/assets/5e345bda-df66-474b-90e7-f488d8f89032)

Promptwright is a Python library from [Stacklok](https://stacklok.com) for generating large synthetic
datasets with a local LLM. It offers a flexible, easy-to-use set of interfaces for generating
prompt-led synthetic datasets.

Promptwright was inspired by [redotvideo/pluto](https://github.com/redotvideo/pluto).
It started as a fork, but ended up largely being a rewrite to allow dataset generation
against a local LLM.

The library interfaces with Ollama, making it easy to just pull a model and run
Promptwright.

## Features

- **Local LLM Client Integration**: Interact with Ollama-based models.
- **Configurable Instructions and Prompts**: Define custom instructions and system prompts.
- **Push to Hugging Face**: Push the generated dataset to Hugging Face Hub.

## Getting Started

### Prerequisites

- Python 3.11+
- `promptwright` library installed
- Ollama CLI installed and running (see [Ollama Installation](https://ollama.com/))
- A model pulled via Ollama (see [Model Compatibility](#model-compatibility))

### Installation

To install the prerequisites, you can use the following commands:

```bash
pip install promptwright
ollama serve              # starts the Ollama server; leave it running
ollama pull {model_name}  # run in a second terminal; use whichever model you want
```
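
Before running Promptwright, it can help to confirm that the Ollama server is reachable and that the model you pulled is listed. The sketch below is not part of Promptwright. It assumes Ollama is listening on its default address, `http://localhost:11434`, and queries Ollama's `/api/tags` endpoint, which lists locally available models. The helper names `model_names` and `list_local_models` are hypothetical.

```python
import json
import urllib.request


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON body returned by Ollama's /api/tags."""
    return [model["name"] for model in tags_payload.get("models", [])]


def list_local_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Ask a running Ollama server which models are available locally.

    Assumes the default Ollama address; raises an OSError subclass
    (for example urllib.error.URLError) if the server is not reachable.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return model_names(json.load(response))
```

Calling `list_local_models()` while `ollama serve` is running returns the names of the pulled models. If the model you plan to use is missing from that list, pull it with `ollama pull`.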

### Example Usage

The `examples` directory contains several examples that show how to use
the library to generate different topic-based datasets.

### Running an Example

To run an example:

1. Ensure you have started Ollama by running `ollama serve`.
2. Verify that the required model is downloaded (e.g. `llama3.2:latest`).
3. Set the `model_name` in the chosen example file to the model you have downloaded.

   ```python
   # Excerpt from an example file: imports, the definition of
   # `system_prompt`, and the rest of the script are omitted.
   tree = TopicTree(
       args=TopicTreeArguments(
           root_prompt="Creative Writing Prompts",
           model_system_prompt=system_prompt,
           tree_degree=5,  # Increase degree for more prompts
           tree_depth=4,  # Increase depth for more prompts
           temperature=0.9,  # Higher temperature for more creative variations
           model_name="ollama/llama3",  # Set the model name here
       )
   )
   engine = DataEngine(
       args=EngineArguments(
           instructions="Generate creative writing prompts and example responses.",
           system_prompt="You are a creative writing instructor providing writing prompts and example responses.",
           model_name="ollama/llama3",  # Set the model name here
           temperature=0.9,
           max_retries=2,
       )
   )
   ```
4. Run your chosen example file:
   ```bash
   python examples/creative_writing.py
   ```
5. The generated dataset is saved as a JSONL file at the path passed to `dataset.save()`.

### Prompt Output Examples

```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are tasked with designing an immersive virtual reality experience that transports users to a fantastical world of wonder."
    },
    {
      "role": "user",
      "content": "Create a descriptive passage about a character discovering their hidden talents."
    },
    {
      "role": "assistant",
      "content": "As she stared at the canvas, Emma's fingers hovered above the paintbrushes, as if hesitant to unleash the colors that had been locked within her. The strokes began with bold abandon, swirling blues and greens merging into a mesmerizing dance of light and shadow. With each passing moment, she felt herself becoming the art – her very essence seeping onto the canvas like watercolors in a spring storm. The world around her melted away, leaving only the vibrant symphony of color and creation."
    }
  ]
}
```
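
A generated file can be sanity-checked before it is used for training. The sketch below is not part of Promptwright. It assumes the saved JSONL file holds one record per line in the shape shown above, with exactly one `system`, one `user`, and one `assistant` message in that order. The helper names `is_valid_record` and `load_valid_records` are hypothetical.

```python
import json

# Assumed role order, taken from the single example record above.
EXPECTED_ROLES = ["system", "user", "assistant"]


def is_valid_record(record) -> bool:
    """Return True if the record has the system/user/assistant message shape."""
    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or len(messages) != len(EXPECTED_ROLES):
        return False
    for message, role in zip(messages, EXPECTED_ROLES):
        if not isinstance(message, dict) or message.get("role") != role:
            return False
        content = message.get("content")
        if not isinstance(content, str) or not content.strip():
            return False
    return True


def load_valid_records(path) -> list:
    """Read a JSONL file, skipping blank, unparseable, or malformed lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue
            if is_valid_record(record):
                records.append(record)
    return records
```

Relax `EXPECTED_ROLES` if your prompts produce multi-turn conversations or omit the system message.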

## Model Compatibility

The library should work with most LLMs. So far it has been tested with the
following models:

- **Mistral**
- **LLaMA3**
- **Qwen2.5**

## Unpredictable Behavior

The library is designed to generate synthetic data based on the prompts and instructions
provided. The quality of the generated data is dependent on the quality of the prompts
and the model used. The library does not guarantee the quality of the generated data.

Large Language Models can sometimes generate unpredictable or inappropriate
content, and the authors of this library are not responsible for the content
generated by the models. We recommend reviewing the generated data before using it
in any production environment.

Large Language Models can also fail to follow the JSON formatting defined by the
prompt and may generate invalid JSON. This is a known issue with the underlying
models, not the library. We handle these errors by retrying the generation and
filtering out invalid JSON. The failure rate is low, but failures can happen.
Each failure is reported in a final summary.
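
The retry-and-filter approach described above can be illustrated with a short sketch. This is not Promptwright's implementation. It assumes a hypothetical `generate` callable that returns the model's raw text, and it treats `max_retries` as the number of extra attempts after the first one. The names `generate_valid_json` and `run_batch` are also hypothetical.

```python
import json
from typing import Callable


def generate_valid_json(generate: Callable[[], str], max_retries: int = 2):
    """Call `generate` until it returns parseable JSON or retries run out.

    Returns (parsed_object_or_None, number_of_failed_attempts).
    """
    failures = 0
    for _ in range(max_retries + 1):  # one initial attempt plus the retries
        raw = generate()
        try:
            return json.loads(raw), failures
        except json.JSONDecodeError:
            failures += 1
    return None, failures


def run_batch(generate: Callable[[], str], n_samples: int, max_retries: int = 2):
    """Collect valid samples, drop the rest, and summarise the failures."""
    samples = []
    failed_attempts = 0
    for _ in range(n_samples):
        parsed, failures = generate_valid_json(generate, max_retries)
        failed_attempts += failures
        if parsed is not None:  # filter out samples that never parsed
            samples.append(parsed)
    summary = {
        "requested": n_samples,
        "kept": len(samples),
        "failed_attempts": failed_attempts,
    }
    return samples, summary
```

Counting failed attempts separately from kept samples makes it easy to see how often the model drifted from the requested format, even when a retry eventually succeeded.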

## Contributing

If something here could be improved, please open an issue or submit a pull request.

## License

This project is licensed under the Apache License 2.0. See the `LICENSE` file for more details.
