Metadata-Version: 2.4
Name: vsegments
Version: 0.1.0
Summary: Visual segmentation and bounding box detection using Google Gemini AI
Author-email: Marco Kotrotsos <your.email@example.com>
Maintainer-email: Marco Kotrotsos <your.email@example.com>
License: MIT
Project-URL: Homepage, https://github.com/yourusername/vsegments
Project-URL: Documentation, https://github.com/yourusername/vsegments#readme
Project-URL: Repository, https://github.com/yourusername/vsegments.git
Project-URL: Issues, https://github.com/yourusername/vsegments/issues
Keywords: computer-vision,segmentation,bounding-boxes,object-detection,gemini,ai,machine-learning,image-processing
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Image Recognition
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: google-genai>=1.16.0
Requires-Dist: pillow>=9.0.0
Requires-Dist: numpy>=1.20.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: build>=0.10.0; extra == "dev"
Requires-Dist: twine>=4.0.0; extra == "dev"
Dynamic: license-file

# vsegments

**Visual segmentation and bounding box detection using Google Gemini AI**

`vsegments` is a Python library and command-line tool that uses Google's Gemini models to detect objects in images. It returns labeled bounding boxes and, with Gemini 2.5 or later models, segmentation masks, and it can draw the results onto the image for you.

[![PyPI version](https://badge.fury.io/py/vsegments.svg)](https://badge.fury.io/py/vsegments)
[![Python Support](https://img.shields.io/pypi/pyversions/vsegments.svg)](https://pypi.org/project/vsegments/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Features

- 🎯 **Bounding Box Detection**: Automatically detect and label objects in images
- 🎨 **Segmentation Masks**: Generate precise segmentation masks for identified objects
- 🖼️ **Visualization**: Overlay boxes, labels, and masks with customizable colors, fonts, and transparency
- 🛠️ **CLI Tool**: Command-line interface for processing images from the shell or from scripts
- 📦 **Library**: Clean Python API for integration into your projects
- 🚀 **Multiple Models**: Support for various Gemini models (Flash, Pro, etc.)
- ⚙️ **Customizable**: Fine-tune prompts, system instructions, and output settings
- 📊 **JSON Export**: Export detection results in structured JSON format

## Installation

### From PyPI (Recommended)

```bash
pip install vsegments
```

### From Source

```bash
git clone https://github.com/yourusername/vsegments.git
cd vsegments
pip install -e .
```

### Development Installation

```bash
pip install -e ".[dev]"
```

## Quick Start

### Prerequisites

You need a Google API key to use this library. Get one from [Google AI Studio](https://aistudio.google.com/app/apikey).

Set your API key as an environment variable:

```bash
export GOOGLE_API_KEY="your-api-key-here"
```
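
If you construct the client in Python, you can resolve the key yourself before passing it in. The sketch below is a hypothetical helper, not part of `vsegments`: `resolve_api_key` prefers an explicit key and otherwise falls back to `GOOGLE_API_KEY`, mirroring the CLI's documented default.

```python
import os
from typing import Optional


def resolve_api_key(explicit: Optional[str] = None) -> str:
    """Hypothetical helper: prefer an explicit key, else fall back to GOOGLE_API_KEY."""
    key = explicit or os.environ.get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("No API key: pass one explicitly or set GOOGLE_API_KEY")
    return key


# Usage (assumes vsegments is installed):
# from vsegments import VSegments
# vs = VSegments(api_key=resolve_api_key())
```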

### CLI Usage

#### Basic Bounding Box Detection

```bash
vsegments -f image.jpg
```

#### Save Output Image

```bash
vsegments -f image.jpg -o output.jpg
```

#### Perform Segmentation

```bash
vsegments -f image.jpg --segment -o segmented.jpg
```

#### Custom Prompt

```bash
vsegments -f image.jpg -p "Find all people wearing red shirts"
```

#### Export JSON Results

```bash
vsegments -f image.jpg --json results.json
```

#### Add Custom Instructions (Grounding)

```bash
vsegments -f image.jpg --instructions "Focus only on objects larger than 100 pixels"
```

#### Use a Different Model

```bash
vsegments -f image.jpg -m gemini-2.5-pro
```

#### Customize Visualization

```bash
vsegments -f image.jpg --line-width 6 --font-size 16 --alpha 0.5
```

### Library Usage

#### Basic Detection

```python
from vsegments import VSegments

# Initialize
vs = VSegments(api_key="your-api-key")

# Detect bounding boxes
result = vs.detect_boxes("image.jpg")

# Print results
print(f"Found {len(result.boxes)} objects")
for box in result.boxes:
    print(f"  - {box.label}")

# Visualize
vs.visualize("image.jpg", result, output_path="output.jpg")
```

#### Advanced Detection with Custom Settings

```python
from vsegments import VSegments

# Initialize with custom settings
vs = VSegments(
    api_key="your-api-key",
    model="gemini-2.5-pro",
    temperature=0.7,
    max_objects=50
)

# Detect with custom prompt and instructions
result = vs.detect_boxes(
    "image.jpg",
    prompt="Find all vehicles in the image",
    custom_instructions="Focus on cars, trucks, and motorcycles. Ignore bicycles."
)

# Access individual boxes (coordinates are normalized to 0-1000)
for box in result.boxes:
    print(f"{box.label}: [{box.x1}, {box.y1}] -> [{box.x2}, {box.y2}]")
```

#### Segmentation

```python
from vsegments import VSegments

vs = VSegments(api_key="your-api-key")

# Perform segmentation
result = vs.segment("image.jpg")

# Visualize with custom settings
vs.visualize(
    "image.jpg",
    result,
    output_path="segmented.jpg",
    line_width=6,
    font_size=18,
    alpha=0.6
)
```
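
The `alpha` parameter controls how strongly each mask's color covers the image underneath. The sketch below shows standard alpha compositing on plain RGB tuples, assuming `alpha` is the mask's opacity (1.0 fully opaque, 0.0 invisible). `blend_pixel` and `overlay_mask` are illustrative names; the drawing code inside `vsegments` may work differently.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def blend_pixel(base: RGB, color: RGB, alpha: float) -> RGB:
    """Composite `color` over `base` with opacity `alpha` (0.0-1.0)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    # Each channel is a weighted average of the mask color and the original pixel.
    return tuple(round(alpha * c + (1 - alpha) * b) for b, c in zip(base, color))


def overlay_mask(pixels: List[List[RGB]], mask: List[List[bool]],
                 color: RGB, alpha: float) -> List[List[RGB]]:
    """Return a new image where pixels inside the mask are tinted with `color`."""
    return [
        [blend_pixel(p, color, alpha) if m else p for p, m in zip(row, mask_row)]
        for row, mask_row in zip(pixels, mask)
    ]
```

Lower `alpha` values keep more of the original image visible through the mask.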

#### Working with Results Programmatically

```python
from vsegments import VSegments
from PIL import Image

vs = VSegments(api_key="your-api-key")
result = vs.detect_boxes("image.jpg")

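# Load original image (convert to RGB so crops can be saved as JPEG)
img = Image.open("image.jpg").convert("RGB")
width, height = img.size

# Process each detected object
for i, box in enumerate(result.boxes):
    # Convert normalized 0-1000 coordinates to pixel coordinates
    abs_x1, abs_y1, abs_x2, abs_y2 = box.to_absolute(width, height)

    # Crop the object; the index prefix keeps repeated labels from overwriting each other
    cropped = img.crop((abs_x1, abs_y1, abs_x2, abs_y2))
    cropped.save(f"{i:02d}_{box.label}.jpg")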
```

## CLI Reference

### Required Arguments

- `-f, --file IMAGE`: Path to input image file

### Mode Options

- `--segment`: Perform segmentation instead of bounding box detection

### API Options

- `--api-key KEY`: Google API key (default: `GOOGLE_API_KEY` env var)
- `-m, --model MODEL`: Model name (default: `gemini-flash-latest`)
- `--temperature TEMP`: Sampling temperature 0.0-1.0 (default: 0.5)
- `--max-objects N`: Maximum objects to detect (default: 25)

### Prompt Options

- `-p, --prompt TEXT`: Custom detection prompt
- `--instructions TEXT`: Additional system instructions for grounding

### Output Options

- `-o, --output FILE`: Save visualized output to file
- `--json FILE`: Export results as JSON
- `--no-show`: Don't display the output image
- `--raw`: Print raw API response

### Visualization Options

- `--line-width N`: Bounding box line width (default: 4)
- `--font-size N`: Label font size (default: 14)
- `--alpha A`: Mask transparency 0.0-1.0 (default: 0.7)
- `--max-size N`: Maximum image dimension for processing (default: 1024)
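
`--max-size` caps the longest side of the image before it is sent to the model. The sketch below computes an aspect-preserving target size in pure Python. It is an assumption about the behavior (in particular, that smaller images are left unchanged rather than upscaled), not the package's code. As long as the aspect ratio is preserved, the returned 0-1000 normalized coordinates still map onto the original, full-resolution image.

```python
from typing import Tuple


def scaled_size(width: int, height: int, max_size: int = 1024) -> Tuple[int, int]:
    """Shrink (width, height) so the longer side is at most max_size, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_size:
        return width, height  # already small enough; do not upscale
    scale = max_size / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


# A 4000x3000 photo would be processed at 1024x768.
print(scaled_size(4000, 3000))
```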

### Other Options

- `-v, --version`: Show version information
- `-q, --quiet`: Suppress informational output
- `-h, --help`: Show help message

## API Reference

### `VSegments` Class

#### Constructor

```python
VSegments(
    api_key: Optional[str] = None,
    model: str = "gemini-flash-latest",
    temperature: float = 0.5,
    max_objects: int = 25
)
```

#### Methods

##### `detect_boxes()`

Detect bounding boxes in an image.

```python
detect_boxes(
    image_path: Union[str, Path],
    prompt: Optional[str] = None,
    custom_instructions: Optional[str] = None,
    max_size: int = 1024
) -> SegmentationResult
```

##### `segment()`

Perform segmentation on an image.

```python
segment(
    image_path: Union[str, Path],
    prompt: Optional[str] = None,
    max_size: int = 1024
) -> SegmentationResult
```

##### `visualize()`

Visualize detection/segmentation results.

```python
visualize(
    image_path: Union[str, Path],
    result: SegmentationResult,
    output_path: Optional[Union[str, Path]] = None,
    show: bool = True,
    line_width: int = 4,
    font_size: int = 14,
    alpha: float = 0.7
) -> Image.Image
```

### Data Models

#### `BoundingBox`

```python
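from dataclasses import dataclass

@dataclass
class BoundingBox:
    label: str
    y1: int  # All coordinates are normalized to 0-1000
    x1: int
    y2: int
    x2: int

    def to_absolute(self, img_width: int, img_height: int) -> tuple:
        """Convert normalized coordinates to pixel coordinates for the given image size."""
        ...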
```
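
Because coordinates are stored on a 0-1000 scale independent of image resolution, converting to pixels means scaling each value by the matching image dimension and dividing by 1000. The sketch below uses a hypothetical function, `normalized_to_pixels`, as a stand-in for `to_absolute()`; its `(x1, y1, x2, y2)` return order is an assumption taken from the cropping example above, which is also the order `PIL.Image.crop` expects.

```python
from typing import Tuple


def normalized_to_pixels(y1: int, x1: int, y2: int, x2: int,
                         img_width: int, img_height: int) -> Tuple[int, int, int, int]:
    """Convert 0-1000 normalized box coordinates to pixel coordinates (x1, y1, x2, y2)."""
    abs_x1 = int(x1 / 1000 * img_width)
    abs_y1 = int(y1 / 1000 * img_height)
    abs_x2 = int(x2 / 1000 * img_width)
    abs_y2 = int(y2 / 1000 * img_height)
    # Defensively order the corners so that x1 <= x2 and y1 <= y2.
    return (min(abs_x1, abs_x2), min(abs_y1, abs_y2),
            max(abs_x1, abs_x2), max(abs_y1, abs_y2))


# A box covering the right half, middle band of an 800x600 image:
print(normalized_to_pixels(250, 500, 750, 1000, 800, 600))  # (400, 150, 800, 450)
```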

#### `SegmentationResult`

```python
@dataclass
class SegmentationResult:
    boxes: List[BoundingBox]
    masks: Optional[List[SegmentationMask]] = None
    raw_response: Optional[str] = None
```
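
`raw_response` holds the model's text output (the CLI prints it with `--raw`). Google's Gemini documentation describes object-detection output as a JSON list whose items carry a `label` and a `box_2d` of `[y1, x1, y2, x2]` on the 0-1000 scale, often wrapped in a Markdown code fence tagged `json`. The sketch below parses that shape into plain tuples under that assumption. It illustrates the format and is not the parser `vsegments` uses; `strip_code_fence` and `parse_boxes` are hypothetical names.

```python
import json
from typing import List, Tuple

Box = Tuple[str, int, int, int, int]  # (label, y1, x1, y2, x2), normalized 0-1000


def strip_code_fence(text: str) -> str:
    """Remove a surrounding Markdown code fence, if present."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith("```"):
        lines = lines[1:]  # drop the opening fence (and its language tag)
        if lines and lines[-1].strip() == "```":
            lines = lines[:-1]  # drop the closing fence
    return "\n".join(lines)


def parse_boxes(raw: str) -> List[Box]:
    """Parse a JSON list of {"label": ..., "box_2d": [y1, x1, y2, x2]} items."""
    items = json.loads(strip_code_fence(raw))
    boxes: List[Box] = []
    for item in items:
        y1, x1, y2, x2 = item["box_2d"]
        boxes.append((item.get("label", ""), int(y1), int(x1), int(y2), int(x2)))
    return boxes
```

The field order of `box_2d` matches the `y1, x1, y2, x2` order of the `BoundingBox` dataclass.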

## Examples

### Batch Processing

```python
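import os
from vsegments import VSegments

vs = VSegments(api_key="your-api-key")

input_dir = "images"
output_dir = "output"
os.makedirs(output_dir, exist_ok=True)  # saving into a missing folder would fail

# Process all JPEG and PNG images in a folder (extension check is case-insensitive)
for filename in sorted(os.listdir(input_dir)):
    if filename.lower().endswith((".jpg", ".jpeg", ".png")):
        print(f"Processing {filename}...")
        image_path = os.path.join(input_dir, filename)
        result = vs.detect_boxes(image_path)
        vs.visualize(
            image_path,
            result,
            output_path=os.path.join(output_dir, filename),
            show=False,  # don't open a window for every image
        )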
```

### Custom Object Detection

```python
from vsegments import VSegments

vs = VSegments(api_key="your-api-key")

# Detect specific objects
result = vs.detect_boxes(
    "street.jpg",
    prompt="Detect all traffic signs and signals",
    custom_instructions="Include stop signs, traffic lights, and speed limit signs"
)

# Filter results
traffic_signs = [box for box in result.boxes if "sign" in box.label.lower()]
print(f"Found {len(traffic_signs)} traffic signs")
```
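
Because labels are free text chosen by the model, counting them is a quick way to summarize a result. The sketch below uses `collections.Counter`; the `Box` dataclass is a hypothetical stand-in that carries only a `label`, so the snippet runs without an API key. With the library you would pass `result.boxes` instead.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Box:
    """Hypothetical stand-in for BoundingBox (only the label is needed here)."""
    label: str


def count_labels(boxes: Iterable[Box]) -> Counter:
    """Count detections per label, ignoring case and surrounding whitespace."""
    return Counter(box.label.strip().lower() for box in boxes)


boxes = [Box("Stop sign"), Box("traffic light"), Box("stop sign ")]
for label, n in count_labels(boxes).most_common():
    print(f"{label}: {n}")
```

Normalizing case and whitespace matters because the model may label the same kind of object slightly differently across detections.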

## Deployment to PyPI

### 1. Prepare Your Package

Update the version in `vsegments/__version__.py` and make sure all tests pass:

```bash
pytest tests/
```

### 2. Build Distribution

```bash
python -m build
```

This creates files in `dist/`:
- `vsegments-0.1.0-py3-none-any.whl` (wheel)
- `vsegments-0.1.0.tar.gz` (source)

### 3. Test on TestPyPI (Optional)

```bash
python -m twine upload --repository testpypi dist/*
```

### 4. Upload to PyPI

```bash
python -m twine upload dist/*
```

### 5. Verify Installation

```bash
pip install vsegments
vsegments --version
```

## Supported Models

- `gemini-flash-latest` (default)
- `gemini-2.0-flash`
- `gemini-2.5-flash-lite`
- `gemini-2.5-flash`
- `gemini-2.5-pro` (best quality, slower)

**Note**: Segmentation requires a Gemini 2.5 model or later; with `gemini-2.0-flash`, only bounding box detection is available.

## Requirements

- Python 3.8+
- google-genai >= 1.16.0
- pillow >= 9.0.0
- numpy >= 1.20.0

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Built using [Google Gemini AI](https://ai.google.dev/)
- Inspired by the [Google AI Cookbook](https://github.com/google-gemini/cookbook)

## Support

- **Issues**: [GitHub Issues](https://github.com/yourusername/vsegments/issues)
- **Documentation**: [GitHub README](https://github.com/yourusername/vsegments#readme)

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for version history.

---

Made with ❤️ by Marco Kotrotsos
