Metadata-Version: 2.1
Name: regex-inference
Version: 0.2.1
Summary: Regex Inference Engine based on ChatGPT
Home-page: https://github.com/jeffrey82221/regex_inference
Author: jeffreylin
Author-email: jeffrey82221@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8, !=3.11.*
Description-Content-Type: text/markdown
License-File: LICENSE

# Introduction

Welcome to regex_inference!

regex_inference is a Python package that makes regular expression (regex) inference a breeze. Using the ChatGPT model, it derives a regex pattern from a list of strings you provide.

Here are some of the cool features you can expect:

- **Regex Inference**: Give regex_inference a list of strings and it swiftly outputs a suitable regex for your data, eliminating the need to grapple with complex regex syntax and saving you precious time.

- **Built-In Evaluator**: regex_inference comes equipped with a built-in evaluator that provides a quantitative measure of your regex's performance by calculating precision, recall, and the F1 score in a snap.

- **Multi-Threaded Regex Candidate Generation**: By leveraging Python's multi-threading capabilities, regex_inference can generate multiple regex candidates simultaneously through parallel calls to ChatGPT within permissible rate limits, ensuring efficient and quick regex generation.

- **Post-Generation Evaluation and Selection**: After generating the regex candidates, regex_inference scores each one by its F1 score on the validation patterns and selects the best-performing regex, so you get the strongest candidate without comparing them by hand.
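The evaluation and selection steps above can be sketched in plain Python. This is a minimal illustration, not the package's `Evaluator`: it assumes a string counts as matched only when the whole string matches, and that precision is measured against a list of negative strings (strings the regex should reject), which this README does not specify. The helpers `score_regex` and `pick_best` are hypothetical names.

```python
import re
from typing import Iterable, List, Tuple


def score_regex(regex: str, positives: Iterable[str],
                negatives: Iterable[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) for a regex used as a whole-string classifier.

    Hypothetical helper: a string is "predicted positive" when re.fullmatch succeeds.
    """
    pattern = re.compile(regex)
    positives = list(positives)
    negatives = list(negatives)
    true_pos = sum(1 for s in positives if pattern.fullmatch(s))
    false_pos = sum(1 for s in negatives if pattern.fullmatch(s))
    false_neg = len(positives) - true_pos
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def pick_best(candidates: List[str], positives: List[str], negatives: List[str]) -> str:
    """Select the candidate with the highest F1 on the validation data (first wins ties)."""
    return max(candidates, key=lambda rx: score_regex(rx, positives, negatives)[2])


# Usage: choose between two hypothetical candidates for version strings
val_pos = ["1.0.0", "2.3.1", "0.10.7"]
val_neg = ["1.0", "abc", "1.0.0-beta"]
candidates = [r"\d+\.\d+", r"\d+\.\d+\.\d+"]
print(pick_best(candidates, val_pos, val_neg))  # → \d+\.\d+\.\d+
```

Using `re.fullmatch` rather than `re.search` keeps a candidate such as `\d+\.\d+` from getting credit for matching only part of `1.0.0`.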


Whether you're a machine learning enthusiast, a data scientist, or a Python dev looking to further leverage the power of regex, regex_inference is here to make your life easier. We look forward to seeing the amazing things you'll do with this tool!

# Installation 

You can install regex_inference using pip:

```bash
pip install regex_inference
```

# Configuration

## OpenAI API Key

Before you start using `regex_inference`, you'll need to obtain an OpenAI API key. Here's how you can do it:

1. Follow the guide on this page to get your OpenAI API key: [How to get an OpenAI API Key for ChatGPT](https://www.maisieai.com/help/how-to-get-an-openai-api-key-for-chatgpt)
2. Export the key to your environment:

```bash
export OPENAI_API_KEY=<your_key>
```
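To catch a missing key before any request is sent, you can check the environment from Python. This is a small sketch using only the standard library; `require_openai_key` is a hypothetical helper, not part of `regex_inference`.

```python
import os
from typing import Mapping, Optional


def require_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI API key from the environment, or raise a clear error.

    Hypothetical helper: regex_inference is not assumed to expose this function.
    Pass a mapping as `env` to check something other than os.environ.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; run `export OPENAI_API_KEY=<your_key>` first."
        )
    return key


# Usage: call once at startup, before creating an Inference object
# key = require_openai_key()
```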

# Getting Started with regex_inference

The regex_inference package is a powerful tool for inferring regular expressions (regex) from a set of training patterns. Here's a step-by-step guide on how to use it:

```python
from regex_inference import Evaluator, Inference
import random

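# Define the number of training samples (the file must contain at least this many patterns)
TRAIN_CNT = 200

# Load patterns from a text file, skipping blank lines (e.g. the one left by a trailing newline)
with open('data/version.txt', 'r', encoding='utf-8') as f:
    whole_patterns = [line for line in f.read().splitlines() if line.strip()]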

# Randomly select some patterns for training
train_patterns = random.sample(whole_patterns, TRAIN_CNT)

# Use the remaining patterns for evaluation
eval_patterns = list(set(whole_patterns) - set(train_patterns))

# Initialize an Inference object
inferencer = Inference(verbose=False, n_thread=3, engine='fado+ai')

# Generate a regex from a subset of the training patterns, with the rest used for validation
regex = inferencer.run(train_patterns[:100], val_patterns=train_patterns[100:])

# Evaluate the inferred regex
precision, recall, f1 = Evaluator.evaluate(regex, eval_patterns)

# Print the evaluation results
print(f'Precision: {precision}\nRecall: {recall}\nF1 Score: {f1}')
```

In this example, after loading the patterns from a text file, we randomly sample `TRAIN_CNT` (200) of them for training. The first 100 sampled patterns are passed to `run` as training patterns and the other 100 as validation patterns (`val_patterns`), which guide the selection of the best regex among the candidates generated by ChatGPT. All patterns that were not sampled are used for evaluation.

The `Inference` object is customizable. The `n_thread` parameter sets the number of threads, which is also the number of regex candidates requested from ChatGPT: a higher value yields more candidates but also increases the inference cost. The `engine` parameter selects the inference engine, either `fado+ai` or `ai`.
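The link between `n_thread` and the number of candidates can be illustrated with the standard library's `concurrent.futures`. This is a sketch under assumptions, not the package's internal code: `generate_candidates` is a hypothetical helper, and `fake_ask_for_regex` is a stub standing in for one ChatGPT request that returns canned answers. Each worker issues exactly one request, so the number of requests (and hence the cost) grows linearly with `n_thread`.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def generate_candidates(ask_for_regex: Callable[[Sequence[str]], str],
                        train_patterns: Sequence[str],
                        n_thread: int = 3) -> List[str]:
    """Send n_thread requests in parallel and return the distinct regexes received."""
    with ThreadPoolExecutor(max_workers=n_thread) as pool:
        futures = [pool.submit(ask_for_regex, train_patterns) for _ in range(n_thread)]
        results = [f.result() for f in futures]  # in submission order
    # Drop duplicate answers, keeping the first occurrence of each
    return list(dict.fromkeys(results))


# Stub standing in for a ChatGPT call: cycles through canned answers under a lock
_answers = itertools.cycle([r"\d+\.\d+\.\d+", r"\d+(\.\d+)*", r"\d+\.\d+\.\d+"])
_lock = threading.Lock()


def fake_ask_for_regex(patterns: Sequence[str]) -> str:
    with _lock:
        return next(_answers)


candidates = generate_candidates(fake_ask_for_regex, ["1.0.0", "2.3.1"], n_thread=3)
print(candidates)  # two distinct regexes; their order depends on thread scheduling
```

Duplicates are removed because several requests may return the same regex, and scoring the same candidate twice adds nothing to the selection step.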

The `fado+ai` engine builds a DFA (Deterministic Finite Automaton) that accepts the training patterns, minimizes it, converts the minimized DFA to a regex, and then asks ChatGPT to generalize that regex to other similar patterns. The `ai` engine sends the training patterns directly to ChatGPT and asks it for a regex matching them. The `fado+ai` approach is generally more economical than the `ai` approach because it sends fewer tokens to ChatGPT.
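The idea of turning training strings into a single exact regex before ChatGPT sees them can be sketched without any automata library. This is a simplified stand-in, not what the `fado+ai` engine does: instead of building and minimizing a full DFA, it builds a prefix tree (a trie, which is an unminimized DFA accepting exactly the training strings) and converts it to a regex by factoring out shared prefixes only. The names `build_trie`, `trie_to_regex`, and `strings_to_regex` are hypothetical.

```python
import re
from typing import Dict, List

END = ""  # marker key: a training string ends at this trie node


def build_trie(strings: List[str]) -> Dict:
    """Build a prefix tree in which every root-to-END path spells one training string."""
    root: Dict = {}
    for s in strings:
        node = root
        for ch in s:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


def trie_to_regex(node: Dict) -> str:
    """Convert a trie node into a regex matching exactly the suffixes stored below it."""
    branches = [re.escape(ch) + trie_to_regex(node[ch])
                for ch in sorted(k for k in node if k != END)]
    if not branches:
        return ""
    optional = END in node  # some training string may stop at this node
    if len(branches) == 1 and not optional:
        return branches[0]
    body = "(?:" + "|".join(branches) + ")"
    return body + "?" if optional else body


def strings_to_regex(strings: List[str]) -> str:
    return trie_to_regex(build_trie(strings))


train = ["1.0.0", "1.0.1", "1.2.0"]
rx = strings_to_regex(train)
print(rx)  # → 1\.(?:0\.(?:0|1)|2\.0)
```

The resulting regex matches only the training strings (it rejects `1.2.1`, for example), which is why a generalization step is still needed afterwards; in the `fado+ai` engine, that step is delegated to ChatGPT.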

# Contributing

We welcome your contributions to `regex_inference`! Whether you're improving the documentation, adding new features, reporting bugs, or making other enhancements, your input is greatly appreciated. 

# Contact

If you have any questions, feature requests, or just want to chat, feel free to reach out to me at [jeffrey82221@gmail.com](mailto:jeffrey82221@gmail.com) or open an issue on our [GitHub page](https://github.com/jeffrey82221/regex_inference).


# License

This project is licensed under the terms of the MIT License. For more details, see the [LICENSE](LICENSE) file in the repository.