Metadata-Version: 2.1
Name: regex-inference
Version: 0.1.3
Summary: Regex Inference Engine based on ChatGPT
Home-page: https://github.com/jeffrey82221/regex_inference
Author: jeffreylin
Author-email: jeffrey82221@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8, !=3.11.*
Description-Content-Type: text/markdown
License-File: LICENSE

# Introduction

Welcome to regex_inference!

regex_inference is a Python package for inferring regular expressions (regexes) from examples. It uses OpenAI's ChatGPT model to derive a regex pattern from a list of strings you provide.

Here are the main features:

- **Regex Inference**: Give regex_inference a list of strings and it returns a regex that matches them, so you don't have to write complex regex syntax by hand.

- **Built-In Evaluator**: regex_inference includes an evaluator that measures how well a regex fits a set of patterns by computing precision, recall, and the F1 score.

- **Multi-Threaded Regex Candidate Generation**: regex_inference uses Python threads to send several requests to ChatGPT in parallel (within the API's rate limits), producing multiple regex candidates at once and reducing the overall wait time.

- **Post-Generation Evaluation and Selection**: After generating the candidates, regex_inference scores each one by its F1 score on a set of validation patterns and returns the best-performing regex.
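The README does not document how the built-in `Evaluator` computes precision from example strings, so the following is only a minimal sketch of F1-based candidate selection under an explicit assumption: a list of negative strings (inputs the regex should reject) is available alongside the positive validation patterns. `score` and `select_best` are hypothetical helpers, not part of the regex_inference API.

```python
import re
from typing import Iterable, List, Tuple

def score(regex: str, positives: Iterable[str], negatives: Iterable[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) of `regex`, counting only full-string matches.

    Assumption: `negatives` are strings the regex should NOT match.
    """
    try:
        pattern = re.compile(regex)
    except re.error:
        # An LLM-generated candidate may not even compile; give it the worst score.
        return 0.0, 0.0, 0.0
    positives, negatives = list(positives), list(negatives)
    tp = sum(1 for s in positives if pattern.fullmatch(s))  # wanted and matched
    fn = len(positives) - tp                                 # wanted but missed
    fp = sum(1 for s in negatives if pattern.fullmatch(s))  # unwanted but matched
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def select_best(candidates: List[str], positives: List[str], negatives: List[str]) -> str:
    """Keep the candidate with the highest F1 score (the first one wins on ties)."""
    return max(candidates, key=lambda r: score(r, positives, negatives)[2])

# Hypothetical validation data and candidates (not taken from the package)
val_pos = ["1.0.0", "2.13.4", "0.9.1"]
val_neg = ["v1.0", "1.0.0-beta", "hello"]
candidates = [r".*", r"\d\.\d\.\d", r"\d+\.\d+\.\d+", r"("]
print(select_best(candidates, val_pos, val_neg))  # prints \d+\.\d+\.\d+
```

Here `.*` has perfect recall but poor precision, `\d\.\d\.\d` misses `2.13.4`, and `(` does not compile, so `\d+\.\d+\.\d+` wins with an F1 of 1.0.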


Whether you're a machine learning enthusiast, a data scientist, or a Python developer who works with regexes, regex_inference is here to make your life easier. We look forward to seeing the amazing things you'll do with this tool!

# Installation

You can install regex_inference using pip:

```bash
pip install regex_inference
```
# Configuration

## OpenAI API Key

Before you start using `regex_inference`, you'll need to obtain an OpenAI API key. Here's how you can do it:

1. Follow the guide on this page to get your OpenAI API key: [How to get an OpenAI API Key for ChatGPT](https://www.maisieai.com/help/how-to-get-an-openai-api-key-for-chatgpt)
2. Export the key to your environment:

```bash
export OPENAI_API_KEY=<your_key>
```
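The README does not say whether regex_inference checks for the key before its first request. If you want your own script to fail fast with a clear message, a minimal sketch is shown below; `get_openai_key` is a hypothetical helper that only reads the `OPENAI_API_KEY` variable exported above.

```python
import os
from typing import Mapping, Optional

def get_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return OPENAI_API_KEY from the environment, or raise a clear error if it is missing."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; run `export OPENAI_API_KEY=<your_key>` first.")
    return key
```

Call `get_openai_key()` at the top of your script; passing a plain dict as `env` makes the helper easy to test without touching the real environment.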

# Getting Started with regex_inference

regex_inference infers regular expressions (regexes) from a set of training patterns. The example below walks through a typical workflow: split a pattern file into training, validation, and evaluation sets, infer a regex, and then score it.

```python
from regex_inference import Evaluator, Inference
import random

# Define the number of training samples
TRAIN_CNT = 200

# Load patterns from a text file (one pattern per line), skipping blank lines
with open('data/version.txt', 'r') as f:
    whole_patterns = [line for line in f.read().splitlines() if line]

# Randomly select some patterns for training
train_patterns = random.sample(whole_patterns, TRAIN_CNT)

# Use the remaining patterns for evaluation
eval_patterns = list(set(whole_patterns) - set(train_patterns))

# Initialize an Inference object
inferencer = Inference(verbose=False, n_thread=3, engine='fado+ai')

# Generate a regex from a subset of the training patterns, with the rest used for validation
regex = inferencer.run(train_patterns[:100], val_patterns=train_patterns[100:])

# Evaluate the inferred regex
precision, recall, f1 = Evaluator.evaluate(regex, eval_patterns)

# Print the evaluation results
print(f'Precision: {precision}\nRecall: {recall}\nF1 Score: {f1}')
```

In this example, we load patterns from a text file and randomly select 200 of them (`TRAIN_CNT`) for training. The first 100 of these are passed to `run` as training patterns and the other 100 as validation patterns (`val_patterns`), which guide the selection of the best regex among the candidates generated by ChatGPT. All patterns not selected for training are used for evaluation.
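The split described above can be packaged as a reusable function. `split_patterns` is a hypothetical helper, not part of regex_inference: it mirrors the example's 200/100/100 sizes, but it also drops blank lines and duplicates and uses a seeded random generator so the split is reproducible, which the example above does not do.

```python
import random
from typing import List, Sequence, Tuple

def split_patterns(
    patterns: Sequence[str], train_cnt: int = 200, fit_cnt: int = 100, seed: int = 0
) -> Tuple[List[str], List[str], List[str]]:
    """Return (fit, validation, evaluation) lists that never share a pattern."""
    unique = sorted({p for p in patterns if p})  # drop blank lines and duplicates
    if train_cnt > len(unique):
        raise ValueError(f"need at least {train_cnt} distinct patterns, got {len(unique)}")
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train = rng.sample(unique, train_cnt)
    train_set = set(train)
    evaluation = [p for p in unique if p not in train_set]
    return train[:fit_cnt], train[fit_cnt:], evaluation

# Hypothetical data: 300 distinct version strings
patterns = [f"1.{i}.0" for i in range(300)]
fit, val, evaluation = split_patterns(patterns)
print(len(fit), len(val), len(evaluation))  # prints 100 100 100
```

The three lists could then be passed as `inferencer.run(fit, val_patterns=val)` and `Evaluator.evaluate(regex, evaluation)`.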

The `Inference` object is customizable. You can adjust the number of threads (`n_thread`), which equals the number of regex candidates requested from ChatGPT. A higher `n_thread` gives you more candidates to choose from, but it also increases the inference cost, because more requests are sent to ChatGPT. You can also select the inference engine (`engine`): either `fado+ai` or `ai`.
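To illustrate how `n_thread` relates to the number of candidates, here is a minimal sketch of parallel candidate generation with the standard library's `ThreadPoolExecutor`. It is not regex_inference's internal code: `generate_candidates` is hypothetical, and `fake_ask_for_regex` is a stub standing in for a real ChatGPT request.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

def generate_candidates(
    ask_for_regex: Callable[[Sequence[str]], str],
    patterns: Sequence[str],
    n_thread: int = 3,
) -> List[str]:
    """Call `ask_for_regex` once per thread and collect one candidate from each call."""
    with ThreadPoolExecutor(max_workers=n_thread) as pool:
        futures = [pool.submit(ask_for_regex, patterns) for _ in range(n_thread)]
        # result() re-raises any exception raised in a worker, e.g. a failed API request
        return [future.result() for future in futures]

# Hypothetical stand-in for a ChatGPT request; a real one would put `patterns` in a prompt.
def fake_ask_for_regex(patterns: Sequence[str]) -> str:
    return r"\d+\.\d+\.\d+"

candidates = generate_candidates(fake_ask_for_regex, ["1.0.0", "2.13.4"], n_thread=3)
print(len(candidates))  # prints 3
```

Threads fit this job because each call mostly waits on the network, and Python releases the GIL during that I/O. The sketch also shows why cost grows with `n_thread`: every thread makes one request.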

The `fado+ai` engine builds a minimized DFA (deterministic finite automaton) that accepts the training patterns, converts the DFA to a regex, and then asks ChatGPT to generalize that regex to other, similar patterns. The `ai` engine sends the training patterns directly to ChatGPT and asks it to produce a matching regex. The `fado+ai` approach is generally cheaper than the `ai` approach because it sends fewer tokens to ChatGPT.
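The engine name suggests that the first step relies on the FAdo finite-automata library, but the exact calls are not shown here. As a simplified stand-in, the sketch below builds a prefix tree (trie) of the training patterns and converts it to an equivalent regex. A trie is a DFA that accepts exactly those strings, but it is not a minimal one: it merges shared prefixes, not shared suffixes. `build_trie` and `trie_to_regex` are hypothetical helpers.

```python
import re
from typing import Dict, Iterable

Trie = Dict[str, "Trie"]
END = ""  # special key marking that a pattern ends at this node

def build_trie(patterns: Iterable[str]) -> Trie:
    """Insert every pattern into a prefix tree, one character per edge."""
    root: Trie = {}
    for pattern in patterns:
        node = root
        for ch in pattern:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root

def trie_to_regex(node: Trie) -> str:
    """Return a regex that matches exactly the patterns stored below `node`."""
    branches = [re.escape(ch) + trie_to_regex(child)
                for ch, child in sorted(node.items()) if ch != END]
    if not branches:
        return ""  # leaf: nothing left to match
    if len(branches) == 1 and END not in node:
        return branches[0]  # single path: no group needed
    group = "(?:" + "|".join(branches) + ")"
    # If a pattern can also stop at this node, the continuation is optional.
    return group + "?" if END in node else group

train = ["1.0.0", "1.0.1", "1.2.0"]
regex = trie_to_regex(build_trie(train))
print(regex)  # prints 1\.(?:0\.(?:0|1)|2\.0)
```

A regex built this way matches only the training strings (for example, it rejects `1.2.1`). In the `fado+ai` engine, generalizing such an exact regex to something like `\d+\.\d+\.\d+` is the step handed to ChatGPT. When many patterns share structure, a factored form like this can be much shorter than the raw list.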

# Contributing

We welcome your contributions to `regex_inference`! Whether you're improving the documentation, adding new features, reporting bugs, or making other enhancements, your input is greatly appreciated. 

# Contact

If you have any questions, feature requests, or just want to chat, feel free to reach out to me at [jeffrey82221@gmail.com](mailto:jeffrey82221@gmail.com) or open an issue on the [GitHub repository](https://github.com/jeffrey82221/regex_inference).


# License

This project is licensed under the terms of the MIT License. For more details, see the [LICENSE](LICENSE) file in the repository.




