Metadata-Version: 2.4
Name: vit-captioner
Version: 0.1.2
Summary: A package for extracting keyframes from videos and generating captions using the ViT-GPT2 model
Home-page: https://github.com/lachlanchen/VideoCaptionerWithVit
Author: Lachlan Chen
Author-email: lach@lazyingoronlyideas.art
Project-URL: Bug Reports, https://github.com/lachlanchen/VideoCaptionerWithVit/issues
Project-URL: Source, https://github.com/lachlanchen/VideoCaptionerWithVit
Keywords: video,captioning,ai,machine learning,ViT,GPT2,transformers
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Multimedia :: Video
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: opencv-python
Requires-Dist: numpy
Requires-Dist: torch
Requires-Dist: transformers
Requires-Dist: Pillow
Requires-Dist: matplotlib
Requires-Dist: tqdm
Requires-Dist: Katna
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license-file
Dynamic: project-url
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# ViT Captioner

[![PyPI version](https://badge.fury.io/py/vit-captioner.svg)](https://badge.fury.io/py/vit-captioner)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A Python package for extracting keyframes from videos and generating captions using the ViT-GPT2 model.

## Features

- Extract keyframes from videos using Katna or uniform sampling
- Generate captions for images using the ViT-GPT2 model
- Match keyframes with timestamps in a video
- Convert videos to SRT subtitle files with captions
- Visualize keyframes and timeline data
- Performance optimizations: the model is loaded once and reused across frames, and resources are cleaned up after use
- Thread-safe image processing and visualization

## Installation

```bash
pip install vit-captioner
```

## Command Line Usage

### Extract keyframes from a video:
```bash
vit-captioner extract -V /path/to/video.mp4 -N 10 -v
```
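For intuition about the uniform-sampling option listed under Features, the sketch below picks evenly spaced frame indices from a video of known length. It is an illustration only, not the package's implementation: `uniform_frame_indices` is a hypothetical helper, and taking the middle frame of each segment is an assumed choice.

```python
from typing import List


def uniform_frame_indices(total_frames: int, num_frames: int) -> List[int]:
    """Return up to num_frames evenly spaced frame indices in [0, total_frames)."""
    if total_frames <= 0 or num_frames <= 0:
        return []
    # Never ask for more frames than the video contains.
    num_frames = min(num_frames, total_frames)
    # Split the video into num_frames equal-width segments and
    # take the frame in the middle of each segment.
    step = total_frames / num_frames
    return [int(step * i + step / 2) for i in range(num_frames)]


# Hypothetical 100-frame video, 10 keyframes requested.
print(uniform_frame_indices(100, 10))
```

Sampling segment midpoints rather than segment starts avoids always selecting the very first frame, which is often a black or title frame.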

### Generate caption for an image:
```bash
vit-captioner caption-image -I /path/to/image.jpg
```

### Convert video to captions:
```bash
vit-captioner caption-video -V /path/to/video.mp4 -N 10 -v
```
The `-v` flag enables verbose output with progress bars.

### Find matching timestamps for keyframes:
```bash
vit-captioner find-timestamps -V /path/to/video.mp4 -K /path/to/keyframes_folder -v
```

## Python API Usage

```python
from vit_captioner.keyframes.extractor import KeyFrameExtractor
from vit_captioner.captioning.image import ImageCaptioner
from vit_captioner.captioning.video import VideoToCaption

# Extract keyframes
extractor = KeyFrameExtractor("/path/to/video.mp4")
extractor.extract_key_frames("/path/to/video.mp4", 10)

# Generate caption for an image
captioner = ImageCaptioner()
caption = captioner.predict_caption("/path/to/image.jpg")

# Convert video to captions
# verbose=True enables progress bars
converter = VideoToCaption("/path/to/video.mp4", num_frames=10, verbose=True)
converter.convert()
```
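To illustrate what converting captions to SRT involves, here is a minimal sketch that turns timestamped captions into SRT text using only the standard library. It does not reproduce how `VideoToCaption` works internally: `format_srt_time`, `captions_to_srt`, and the `last_duration` default are hypothetical names and values chosen for this example.

```python
from typing import List, Tuple


def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def captions_to_srt(captions: List[Tuple[float, str]],
                    last_duration: float = 2.0) -> str:
    """Build SRT text from (start_seconds, caption) pairs sorted by start time.

    Each cue ends where the next one begins; the final cue is assumed
    to last last_duration seconds.
    """
    blocks = []
    for i, (start, text) in enumerate(captions):
        if i + 1 < len(captions):
            end = captions[i + 1][0]
        else:
            end = start + last_duration
        # SRT cues are numbered from 1 and separated by a blank line.
        blocks.append(
            f"{i + 1}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{text}\n"
        )
    return "\n".join(blocks)


# Hypothetical keyframe captions with their timestamps in seconds.
print(captions_to_srt([(0.0, "a dog running on grass"), (4.0, "a cat on a sofa")]))
```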

## Performance Optimizations

- Explicit cleanup of resources after use
- The model is loaded once and reused for all frames, which reduces memory usage
- Thread-safe image processing with error fallbacks
- Progress bars for tracking long-running operations
- A capped number of concurrent workers to prevent memory issues
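
The pattern behind two of the points above (loading the model once and capping concurrent workers) can be sketched with the standard library. This is an assumed illustration rather than the package's code: `load_model` is a stand-in for an expensive model load, `caption_all` and its `max_workers` default are hypothetical, and the sketch assumes the captioning callable is safe to call from several threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def load_model() -> Callable[[str], str]:
    """Stand-in for an expensive model load; returns a captioning callable."""
    return lambda path: f"caption for {path}"


def caption_all(paths: List[str], max_workers: int = 2) -> List[str]:
    """Caption every path with one shared model and a bounded thread pool."""
    # Load once, outside the pool, so every task reuses the same model.
    model = load_model()
    # max_workers caps how many frames are processed at the same time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(model, paths))


print(caption_all(["frame_0.jpg", "frame_1.jpg", "frame_2.jpg"]))
```

Leaving the `with` block shuts the pool down, so worker threads do not outlive the call.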

## Requirements

- Python 3.6+
- OpenCV (`opencv-python`)
- NumPy
- PyTorch
- Transformers
- Pillow
- Matplotlib
- tqdm
- Katna

## Source Code

Source code is available on GitHub: [https://github.com/lachlanchen/VideoCaptionerWithVit](https://github.com/lachlanchen/VideoCaptionerWithVit)

## License

Released under the MIT License. See the `LICENSE` file for details.
