Metadata-Version: 2.3
Name: image-manipulation-datasets
Version: 0.6.5
Summary: A collection of image manipulation dataset classes implemented in PyTorch
Author: Spencer Cain
Author-email: cainspencerm@protonmail.com
Requires-Python: >=3.9
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: numpy (>=1.22.4,<2.0.0)
Requires-Dist: pillow (>=9.1.1,<10.0.0)
Requires-Dist: torch (>=1.11.0,<2.0.0)
Description-Content-Type: text/markdown

# Image Manipulation Datasets (IMDS)

This Python package provides PyTorch-compatible dataset classes for common image manipulation datasets used in digital forensics and deepfake detection research.

## Supported Datasets

- **CASIA 2.0** - Forgery classification dataset with 4,795 images
- **Defacto** - Collection of manipulation datasets:
  - Copy/Move (~19,000 forgeries)
  - Splicing (~105,000 forgeries)
  - Inpainting (~25,000 forgeries)
- **Coverage** - Copy-move forgery database with similar genuine objects
- **IMD2020** - Real-life manipulated images from the Internet (2,010 images)

## Installation

```bash
pip install git+https://github.com/cainspencerm/image-manipulation-datasets.git@0.6
```

## Quick Start

```python
from imds import casia
from torch.utils.data import DataLoader

# Load a dataset (CASIA 2.0 in this example)
dataset = casia.CASIA2(data_dir='data/CASIA2.0', split='train')
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

for images, masks in dataloader:
    # images: torch.Tensor shape (batch_size, 3, H, W)
    # masks: torch.Tensor shape (batch_size, 1, H, W) 
    pass
```

## Documentation

For comprehensive API documentation, usage examples, and advanced features, see:

**[📖 API Documentation](API_DOCUMENTATION.md)**

The documentation includes:
- Complete API reference for all dataset classes
- Usage examples and common patterns
- Directory structure requirements for each dataset
- Performance optimization tips
- Error handling guidelines

## Sample Quality

Datasets are not always perfect. In Coverage, CASIA 2.0, and Defacto Splicing, some images and masks differ in size even though they have been verified as pairs. The dataset classes therefore resize each mask to the size of its image, on the assumption that the resized mask lines up with the image. This alignment has not been verified, since doing so would mean manually checking each of the more than 110,000 image and mask pairs.
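
As a rough illustration of the resizing described above, here is a minimal sketch using Pillow. The helper name `resize_mask_to_image` and the choice of nearest-neighbour resampling are assumptions made for this example; they are not taken from the package's source, which may resize masks differently.

```python
from PIL import Image


def resize_mask_to_image(image: Image.Image, mask: Image.Image) -> Image.Image:
    """Return the mask at the image's (width, height), resizing only if needed.

    Hypothetical helper for illustration; not part of the imds API.
    """
    if mask.size == image.size:
        return mask
    # Nearest-neighbour resampling (an assumption here) keeps a binary mask
    # binary; smoother filters would add grey values along region edges.
    return mask.resize(image.size, resample=Image.NEAREST)


# Synthetic pair: a 64x48 image with a 32x24 mask.
image = Image.new("RGB", (64, 48))
mask = Image.new("L", (32, 24), 0)
mask.paste(255, (8, 6, 24, 18))  # white rectangle marks the manipulated region

resized = resize_mask_to_image(image, mask)
print(resized.size)
```

Resizing guarantees only that the two sizes match. It cannot confirm that the mask content lines up with the manipulated region, so a visual spot-check of a few pairs from each affected dataset is still worthwhile.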

