Metadata-Version: 2.4
Name: pyrtex
Version: 0.1.2
Summary: A Python library for batch text extraction and processing using Google Cloud Vertex AI
Author-email: CaptainTrojan <your-email@example.com>
License: MIT
Project-URL: Homepage, https://github.com/CaptainTrojan/pyrtex
Project-URL: Repository, https://github.com/CaptainTrojan/pyrtex
Project-URL: Issues, https://github.com/CaptainTrojan/pyrtex/issues
Project-URL: Documentation, https://github.com/CaptainTrojan/pyrtex#readme
Project-URL: Changelog, https://github.com/CaptainTrojan/pyrtex/releases
Keywords: ai,vertex-ai,google-cloud,text-extraction,batch-processing,gemini,pydantic
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pydantic-settings>=2.0.0
Requires-Dist: jinja2>=3.0.0
Requires-Dist: google-cloud-aiplatform>=1.40.0
Requires-Dist: google-cloud-storage>=2.10.0
Requires-Dist: google-cloud-bigquery>=3.11.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-mock>=3.10.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Requires-Dist: bump2version>=1.0.0; extra == "dev"
Requires-Dist: build>=0.10.0; extra == "dev"
Requires-Dist: twine>=4.0.0; extra == "dev"
Dynamic: license-file

# PyRTex

[![CI](https://github.com/CaptainTrojan/pyrtex/actions/workflows/ci.yml/badge.svg)](https://github.com/CaptainTrojan/pyrtex/actions/workflows/ci.yml)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)

A simple Python library for batch text extraction and processing using Google Cloud Vertex AI.

PyRTex makes it easy to process multiple documents, images, or text snippets with Gemini models and get back structured, type-safe results using Pydantic models.

## ✨ Features

- **🚀 Simple API**: Just 3 steps - configure, submit, get results
- **📦 Batch Processing**: Process multiple inputs efficiently  
- **🔒 Type Safety**: Pydantic models for structured output
- **🎨 Flexible Templates**: Jinja2 templates for prompt engineering
- **☁️ GCP Integration**: Seamless Vertex AI and BigQuery integration
- **🧪 Testing Mode**: Simulate without GCP costs

## 📦 Installation

Install from PyPI (recommended):
```bash
pip install pyrtex
```

Or install from source:
```bash
git clone https://github.com/CaptainTrojan/pyrtex.git
cd pyrtex
pip install -e .
```

For development:
```bash
pip install -e ".[dev]"
```

## 🚀 Quick Start

```python
from pydantic import BaseModel
from pyrtex import Job

# Define your data structures
class TextInput(BaseModel):
    content: str

class Analysis(BaseModel):
    summary: str
    sentiment: str
    key_points: list[str]

# Create a job
job = Job[Analysis](
    model="gemini-2.0-flash-lite-001",
    output_schema=Analysis,
    prompt_template="Analyze this text: {{ content }}",
    simulation_mode=True  # Set to False for real processing
)

# Add your data
job.add_request("doc1", TextInput(content="Your text here"))
job.add_request("doc2", TextInput(content="Another document"))

# Process and get results
for result in job.submit().wait().results():
    if result.was_successful:
        print(f"Summary: {result.output.summary}")
        print(f"Sentiment: {result.output.sentiment}")
    else:
        print(f"Error: {result.error}")
```
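
The `{{ content }}` placeholder in `prompt_template` is Jinja2 syntax. To give a rough idea of the prompt text a request produces, the sketch below renders the same template with Jinja2 directly. It assumes PyRTex exposes each request's fields to the template as variables; that is inferred from the Quick Start and not confirmed against the library's internals.

```python
from jinja2 import Template

# Same template string as in the Quick Start above.
prompt_template = Template("Analyze this text: {{ content }}")

# Assumption: each field of the request model becomes a template variable.
fields = {"content": "Your text here"}
rendered = prompt_template.render(**fields)

print(rendered)  # Analyze this text: Your text here
```

Rendering a template locally like this can be a cheap way to check a prompt before submitting a batch.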

## 📋 Core Workflow

PyRTex uses a simple 3-step workflow:

### 1. Configure & Add Data
```python
job = Job[YourSchema](
    model="gemini-2.0-flash-lite-001",
    output_schema=YourSchema, prompt_template="{{ data }}",
)
job.add_request("key1", YourModel(data="value1"))
job.add_request("key2", YourModel(data="value2"))
```

### 2. Submit & Wait  
```python
job.submit().wait()  # Can be chained
```

### 3. Get Results
```python
for result in job.results():
    if result.was_successful:
        print(result.output)  # result.output is typed as YourSchema
    else:
        print(f"Error: {result.error}")  # handle the failure
```
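
When a batch mixes successes and failures, it can help to separate them before further processing. The following is a minimal sketch that relies only on the three attributes shown above (`was_successful`, `output`, `error`). `FakeResult` and `split_results` are hypothetical names introduced here so the example runs without PyRTex or GCP; they are not part of the library.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Optional, Tuple


@dataclass
class FakeResult:
    """Hypothetical stand-in exposing the attributes used in the loop above."""
    was_successful: bool
    output: Optional[Any] = None
    error: Optional[str] = None


def split_results(results: Iterable[Any]) -> Tuple[List[Any], List[Any]]:
    """Partition results into (outputs, errors) based on was_successful."""
    outputs: List[Any] = []
    errors: List[Any] = []
    for result in results:
        if result.was_successful:
            outputs.append(result.output)
        else:
            errors.append(result.error)
    return outputs, errors


# Usage with stand-ins; with a real job, pass job.results() instead.
outputs, errors = split_results([
    FakeResult(True, output={"summary": "ok"}),
    FakeResult(False, error="schema validation failed"),
])
print(f"{len(outputs)} succeeded, {len(errors)} failed")
```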

## ⚙️ Configuration

For production use, set your GCP project:

```bash
export GOOGLE_PROJECT_ID="your-project-id"
```

Then pass `simulation_mode=False` when creating the job to run real processing on Vertex AI.
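
One way to avoid accidental real runs is to derive the flag from the environment. The sketch below falls back to simulation whenever `GOOGLE_PROJECT_ID` is unset or blank. `should_simulate` is a hypothetical helper, not part of PyRTex, and the library may resolve its configuration differently.

```python
import os
from typing import Mapping, Optional


def should_simulate(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when no GCP project is configured (hypothetical helper)."""
    env = os.environ if env is None else env
    return not env.get("GOOGLE_PROJECT_ID", "").strip()


# Pass the result as simulation_mode=... when creating the job.
print(should_simulate({}))                                        # True
print(should_simulate({"GOOGLE_PROJECT_ID": "your-project-id"}))  # False
```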

## 📚 Examples

The `examples/` directory contains complete working examples:

```bash
cd examples

# Generate sample files
python generate_sample_data.py

# Extract contact info from business cards
python 01_simple_text_extraction.py

# Parse product catalogs  
python 02_pdf_product_parsing.py

# Describe images
python 03_image_description.py
```

### Example Use Cases

- **📇 Business Cards**: Extract contact information
- **📄 Documents**: Process PDFs and images (PNG, JPEG)
- **🛍️ Product Catalogs**: Parse pricing and inventory
- **🧾 Invoices**: Extract structured financial data
- **📊 Batch Processing**: Handle multiple files efficiently

## 🧪 Development

### Running Tests

```bash
# All tests (mocked, safe)
./test_runner.sh

# Specific test types
./test_runner.sh --unit
./test_runner.sh --integration
./test_runner.sh --flake

# Real GCP tests (costs money!)
./test_runner.sh --real --project-id your-project-id
```

Windows users:
```cmd
test_runner.bat --unit
test_runner.bat --flake
```

### Code Quality

- **flake8**: Linting
- **black**: Code formatting  
- **isort**: Import sorting
- **pytest**: Testing with coverage

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run tests: `./test_runner.sh`
5. Submit a pull request

## 📄 License

MIT License - see [LICENSE](LICENSE) for details.

## 🆘 Support

- **Issues**: [GitHub Issues](https://github.com/CaptainTrojan/pyrtex/issues)
- **Examples**: Check the `examples/` directory
- **Testing**: Use `simulation_mode=True` for development
