📸 Photo Stack Finder Orchestrator

Automated pipeline for finding and reviewing duplicate photos

â„šī¸ First Time Here? Read This First

What This Tool Does

Photo Stack Finder finds photos that originate from the same source image:

  • Byte-identical files
  • Different resolutions (low-res vs. high-res)
  • Edited versions vs. originals
  • Rotation variants
  • Format conversions (JPEG vs. HEIC)
  • Cloud sync duplicates, like IMG_1234.jpg and IMG_1234(1).jpg
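Two of the simplest cases above, byte-identical files and cloud sync duplicates such as IMG_1234(1).jpg, can be illustrated without any image analysis. The following is a minimal sketch, not the tool's own implementation: it assumes byte-identical files are grouped by a SHA-256 content hash and that sync duplicates differ only by a trailing "(N)" counter in the file name. The helper names (`sync_base_name`, `file_digest`, `group_identical`) are hypothetical.

```python
import hashlib
import re
from collections import defaultdict
from pathlib import Path
from typing import Iterable, List

# Hypothetical rule: a cloud-sync copy ends its stem with a counter like "(1)" or " (2)".
_SYNC_COUNTER = re.compile(r"\s*\(\d+\)$")


def sync_base_name(path: Path) -> str:
    """Strip a trailing '(N)' sync counter and normalize the extension case."""
    return _SYNC_COUNTER.sub("", path.stem) + path.suffix.lower()


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of the file contents, read in chunks to keep memory use flat."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_identical(paths: Iterable[Path]) -> List[List[Path]]:
    """Return groups of two or more files whose contents are byte-identical."""
    groups = defaultdict(list)
    for p in paths:
        groups[file_digest(p)].append(p)
    return [g for g in groups.values() if len(g) > 1]
```

For example, `sync_base_name(Path("IMG_1234(1).jpg"))` returns "IMG_1234.jpg", so the copy and the original share a key. Resolution changes, edits, rotations, and format conversions all change the bytes, so those cases need content-based comparison rather than hashing.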

Note: This tool does not find burst shots or similar photos taken at different moments (those are distinct images, not duplicates).

Before You Start: Prepare Your Photos

  1. Export from your cloud service (if needed):
    • Google Photos: Use Google Takeout → Download all archives → Extract to a folder
    • iCloud: Download via iCloud.com or iCloud for Windows
    • OneDrive/Dropbox: Make sure all files are fully synced and stored locally on your computer
  2. Locate your photo directory - The folder where your photos are stored (it can include subdirectories)
  3. Verify disk space - The analysis cache needs free space of roughly 5-10% of your photo collection's size
  4. Back up (recommended) - Make sure you have a backup before deleting any duplicates
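Step 3 above is simple arithmetic: for a 200 GB collection, plan on roughly 10-20 GB of free space. The sketch below uses only the Python standard library to total the collection size and compare it with the free space on the drive that will hold the cache. The 5-10% figure comes from this guide; the helper names are hypothetical and not part of the tool.

```python
import shutil
from pathlib import Path
from typing import Tuple


def collection_size_bytes(root: Path) -> int:
    """Total size of all regular files under root, including subdirectories."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def cache_space_estimate(collection_bytes: int, low_pct: int = 5,
                         high_pct: int = 10) -> Tuple[int, int]:
    """Estimated cache size range in bytes, using the 5-10% rule of thumb."""
    return collection_bytes * low_pct // 100, collection_bytes * high_pct // 100


def has_enough_space(photo_dir: Path, cache_dir: Path) -> bool:
    """True if the cache drive has room for the upper end of the estimate.

    cache_dir must already exist, since shutil.disk_usage needs a real path.
    """
    _, needed = cache_space_estimate(collection_size_bytes(photo_dir))
    return shutil.disk_usage(cache_dir).free >= needed
```

Integer percentages keep the estimate exact, e.g. `cache_space_estimate(200 * 10**9)` gives 10 GB and 20 GB.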

Quick Start Checklist

  • ✅ Photos exported from cloud service (or already on computer)
  • ✅ Know the path to your photo directory
  • ✅ Have sufficient disk space (~5-10% of your photo collection's size)
  • ✅ Have a backup (recommended)

What To Expect

  • Processing time: ~30-60 minutes for 10,000 photos (larger collections take longer; speed also depends on your CPU)
  • Review time: ~30-60 minutes to go through the duplicate groups
  • Storage savings: Often 10-50% reduction depending on duplicate rate
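To turn these figures into a rough estimate for your own library, you can scale them. The sketch below assumes processing time grows roughly linearly with the number of photos, which this guide does not promise, and applies the 10-50% savings range to the collection size. The function name is hypothetical.

```python
def rough_estimates(num_photos: int, collection_gb: float) -> dict:
    """Scale the guide's figures: 30-60 min per 10,000 photos, 10-50% savings."""
    scale = num_photos / 10_000  # assumption: time is roughly linear in photo count
    return {
        "processing_minutes": (30 * scale, 60 * scale),
        "savings_gb": (collection_gb * 0.10, collection_gb * 0.50),
    }
```

For example, `rough_estimates(25_000, 200)` gives 75-150 minutes of processing and 20-100 GB of potential savings. Real numbers will vary with your hardware and duplicate rate.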

📖 Detailed Guide: See GETTING_STARTED.md for step-by-step instructions including Google Takeout export.

Configuration

Directory containing your photos to analyze
Where to store analysis results (defaults to a sibling of the source directory)
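The results location defaults to a sibling of the source directory, i.e. a directory next to the photo folder rather than inside it, which would keep the cache out of the tree being scanned. The sketch below illustrates that default with a hypothetical "_stack_finder" suffix; the tool's actual default directory name is not documented here.

```python
from pathlib import Path
from typing import Optional


def resolve_output_dir(source_dir: Path, output_dir: Optional[Path] = None,
                       suffix: str = "_stack_finder") -> Path:
    """Use output_dir if given; otherwise a sibling of source_dir.

    The suffix is a hypothetical placeholder, not the tool's real default.
    """
    if output_dir is not None:
        return output_dir
    source = source_dir.resolve()  # normalizes relative paths and ".." parts
    return source.with_name(source.name + suffix)
```

With /data/photos as the source and no explicit output directory, this sketch would place results in /data/photos_stack_finder.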
âš™ī¸ Advanced Options

Processing

Parallel processing workers (default: CPU count)
Files per batch (default: 256)
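The two processing options typically work together: files are split into fixed-size batches, and the batches are spread across a pool of worker processes. The sketch below shows that pattern with the standard library's concurrent.futures.ProcessPoolExecutor. It is an illustration under that assumption, not the orchestrator's code; `batched`, `process_in_batches`, and the per-batch `work` function are hypothetical.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Iterable[T], size: int = 256) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def process_in_batches(paths: Iterable[T], work: Callable[[List[T]], List[R]],
                       workers: Optional[int] = None,
                       batch_size: int = 256) -> List[R]:
    """Run `work` on each batch in parallel and flatten the results.

    `work` must be a picklable, module-level function.
    """
    workers = workers or os.cpu_count() or 1  # default: CPU count
    results: List[R] = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() returns results in batch order, so output follows input order.
        for batch_result in pool.map(work, batched(paths, batch_size)):
            results.extend(batch_result)
    return results
```

Batching amortizes the cost of sending work to another process; a batch of 256 files keeps each task large enough to be worth the overhead. On Python 3.12+, itertools.batched provides the same splitting (yielding tuples). On platforms that start workers with "spawn" (Windows, macOS), call process_in_batches from inside an `if __name__ == "__main__":` block.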

Comparison

Minimum similarity (0-1, default: 0.95)
Minimum aspect ratio similarity (default: 0.85)
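The aspect ratio setting above suggests a cheap pre-filter: images whose shapes differ too much are unlikely to share a source, so the costlier similarity comparison can be skipped. The sketch below assumes aspect ratio similarity is the smaller divided by the larger of the two orientation-independent aspect ratios, so a 90° rotation still scores 1.0. That formula and all names here are hypothetical, not the tool's definition.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ImageSize:
    width: int
    height: int

    def long_short_ratio(self) -> float:
        """Aspect ratio >= 1, independent of portrait/landscape orientation."""
        return max(self.width, self.height) / min(self.width, self.height)


def aspect_ratio_similarity(a: ImageSize, b: ImageSize) -> float:
    """1.0 for identical shapes, smaller as the shapes diverge."""
    ra, rb = a.long_short_ratio(), b.long_short_ratio()
    return min(ra, rb) / max(ra, rb)


def is_match(a: ImageSize, b: ImageSize, similarity: Callable[[], float],
             min_similarity: float = 0.95, min_aspect: float = 0.85) -> bool:
    """Apply the aspect filter first; only then compute the costly similarity."""
    if aspect_ratio_similarity(a, b) < min_aspect:
        return False
    return similarity() >= min_similarity
```

For example, a 4:3 photo and a 16:9 image score (4/3)/(16/9) = 0.75, below the 0.85 default, so the similarity callable is never invoked for that pair.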

Benchmarks

Run comparison method benchmarks (off by default)
Target false positive rate (FPR) for threshold optimization (default: 0.00075)
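One common way to pick a threshold for a target FPR is empirical: score a set of pairs known not to be duplicates, then choose the lowest threshold that accepts no more than the allowed fraction of them. The sketch below assumes that approach and that a pair counts as a match when its score is at or above the threshold. The benchmark in this tool may work differently, and `threshold_for_fpr` is a hypothetical name.

```python
import math
from typing import Sequence


def threshold_for_fpr(negative_scores: Sequence[float],
                      target_fpr: float = 0.00075) -> float:
    """Lowest threshold whose false positive rate on the negatives is <= target_fpr."""
    if not negative_scores:
        raise ValueError("need at least one non-duplicate score")
    ranked = sorted(negative_scores, reverse=True)
    allowed = math.floor(target_fpr * len(ranked))  # max false positives tolerated
    if allowed >= len(ranked):
        return ranked[-1]  # every negative may pass
    # Put the threshold just above the highest negative that must be rejected;
    # at most `allowed` negatives score strictly higher than ranked[allowed].
    return math.nextafter(ranked[allowed], math.inf)
```

Note that at the default 0.00075, floor(0.00075 × n) stays at 0 until n reaches 1,334 negative pairs, so smaller benchmark sets effectively demand zero false positives and push the threshold above the highest non-duplicate score.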