Metadata-Version: 2.4
Name: vision-loom
Version: 0.2
Summary: Framework to create labeling data from VLM and other larger models for computer vision tasks
Author-email: Prishanga Bora <prishanga1@gmail.com>
Project-URL: Repository, https://github.com/leonbora167/vision-loom
Project-URL: Documentation, https://github.com/leonbora167/vision-loom/blob/main/documentation/PYPI.md
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch<3.0,>=2.1
Requires-Dist: torchvision<1.0,>=0.16
Requires-Dist: transformers<6.0,>=4.40
Requires-Dist: accelerate<2.0,>=0.30
Requires-Dist: tokenizers<1.0,>=0.15
Requires-Dist: safetensors<1.0,>=0.4
Requires-Dist: huggingface_hub<2.0,>=0.20
Requires-Dist: numpy<3.0,>=1.26
Requires-Dist: networkx<4.0,>=3.0
Requires-Dist: sympy<2.0,>=1.12
Requires-Dist: mpmath<2.0,>=1.3
Requires-Dist: pillow<13.0,>=10.0
Requires-Dist: PyYAML<7.0,>=6.0
Requires-Dist: regex>=2023.0
Requires-Dist: tqdm<5.0,>=4.65
Requires-Dist: filelock<4.0,>=3.12
Requires-Dist: fsspec<2027.0,>=2024.0
Requires-Dist: httpx<1.0,>=0.25
Requires-Dist: httpcore<2.0,>=1.0
Requires-Dist: anyio<5.0,>=4.0
Requires-Dist: certifi
Requires-Dist: idna
Dynamic: license-file

# Vision Loom

Leverage Vision Language Models to label images and create training data for other computer vision tasks. 

Vision Loom is an automated labeling framework designed to bridge the gap between high-reasoning foundation models and real-time edge deployment. It uses the zero-shot capabilities of Vision-Language Models (VLMs) to generate high-fidelity training data without the overhead of manual annotation.

| Model | Task |
| :--- | :--: |
| Grounding DINO Tiny | Object Detection |
| SAM-2 | Object Segmentation |

## Save detection results
  
* Install the library. The latest version includes the most recent features and comments.

```bash
pip install vision-loom
```

```python
from tqdm import tqdm  # progress bar used when running the pipeline

from vision_loom.models.grounding_dino import GroundingDINO
from vision_loom.utils.io import load_images
```

* Pass the path to a single image or to a folder containing images.
```python
images = "vision_loom/test_notebooks/test_images"
dataloader = load_images(images)
```

* Once the data loader is created, list the objects you want to detect and generate bounding boxes for. If your images contain several kinds of objects spread across multiple images, pass all the object names in one prompt rather than running the pipeline once per object, which saves compute.
```python
prompt = [["car", "person", "book", "bike"]]
``` 

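* Run the pipeline. Here `model` is a `GroundingDINO` instance created beforehand, and `tqdm` (from the `tqdm` package) only adds a progress bar.
```python
for img_path, image in tqdm(dataloader):
    # save_results=True saves the detections for each image
    results = model.detect(image, prompt, img_path, save_results=True)
```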
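
The image-loading step above takes either a single image or a folder of images. As a rough illustration of that idea, and not the actual implementation of `load_images`, the sketch below collects image paths from either kind of input using only the standard library. The function name `collect_image_paths` and the list of extensions are hypothetical.

```python
from pathlib import Path

# Hypothetical extension list; the formats load_images accepts are not stated here.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def collect_image_paths(source):
    """Return a sorted list of image paths from a single file or a folder."""
    path = Path(source)
    if path.is_file():
        # A single file is returned only if it looks like an image.
        return [path] if path.suffix.lower() in IMAGE_EXTENSIONS else []
    if path.is_dir():
        # Only direct children are considered; subfolders are not searched.
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
    raise FileNotFoundError(f"No such file or directory: {source}")
```

Sorting the paths keeps the processing order stable between runs, which can make saved results easier to compare.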


## Roadmap 

- [ ] Add Image Classification Support
- [ ] Export options for common annotation formats such as YOLO and COCO.
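
Until export support lands, the YOLO item above can be approximated by hand. The sketch below is not part of Vision Loom: it assumes each detection is available as a pixel-space box `(x_min, y_min, x_max, y_max)` together with the image size, which may differ from what `model.detect` actually returns. The helper names `xyxy_to_yolo` and `yolo_label_line` are hypothetical. A YOLO label line stores the class index followed by the box center, width, and height, all normalized by the image dimensions.

```python
def xyxy_to_yolo(box, image_width, image_height):
    """Convert a pixel-space (x_min, y_min, x_max, y_max) box to normalized YOLO values."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return x_center, y_center, width, height


def yolo_label_line(class_id, box, image_width, image_height):
    """Format one detection as a line of a YOLO label file."""
    values = xyxy_to_yolo(box, image_width, image_height)
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in values)


# Example: a 200x200 px box in a 400x500 px image, class index 0
print(yolo_label_line(0, (100, 50, 300, 250), 400, 500))
```

The class index would come from the position of the label in your own class list, for example the order of names in the prompt.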
