Metadata-Version: 2.3
Name: metadata-cleaner
Version: 2.0.3
Summary: A CLI tool to remove or selectively filter metadata from images, documents, audio, and video files.
License: MIT
Author: Sandeep Paidipati
Author-email: sandeep.paidipati@gmail.com
Requires-Python: >=3.7
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: click (>=8.1.3,<9.0.0)
Requires-Dist: mutagen (>=1.45.1,<2.0.0)
Requires-Dist: piexif (>=1.1.3,<2.0.0)
Requires-Dist: pillow (>=9.5.0,<10.0.0)
Requires-Dist: pymediainfo (>=5.0.3,<6.0.0)
Requires-Dist: pypdf (>=3.12.0,<4.0.0)
Requires-Dist: python-docx (>=0.8.11,<0.9.0)
Requires-Dist: tqdm (>=4.64.0,<5.0.0)
Project-URL: Homepage, https://github.com/sandy-sp/metadata-cleaner
Project-URL: Repository, https://github.com/sandy-sp/metadata-cleaner
Description-Content-Type: text/markdown

# 📄 Metadata Cleaner 🔍
[![Build Status](https://github.com/sandy-sp/metadata-cleaner/actions/workflows/release-and-publish.yml/badge.svg)](https://github.com/sandy-sp/metadata-cleaner/actions)
[![Release](https://img.shields.io/github/release/sandy-sp/metadata-cleaner.svg)](https://github.com/sandy-sp/metadata-cleaner/releases)
[![PyPI version](https://badge.fury.io/py/metadata-cleaner.svg)](https://pypi.org/project/metadata-cleaner/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

*A powerful CLI tool to remove or selectively filter metadata from images, PDFs, DOCX, audio, and video files.*

---

## 📌 Overview

**Metadata Cleaner** is a fast and efficient **command-line tool** designed for privacy protection, security compliance, and data sanitization. It supports removing metadata from various file formats including images, documents, audio, and video files, with options for selective filtering and parallel batch processing.

🔍 **Why use Metadata Cleaner?**
- **Protect your privacy:** Strip hidden metadata from files.
- **Sanitize sensitive documents:** Prepare files for sharing without revealing personal information.
- **Reduce file size:** Remove unnecessary metadata.
- **Batch process:** Clean metadata from individual files or entire folders (with recursive support).

---

## 🚀 Features

- **Selective Metadata Filtering:**  
  Configure which metadata fields to preserve or remove using a JSON configuration file.

- **Batch & Recursive Processing:**  
  Process a single file, an entire folder, or even subfolders recursively.

- **Parallel Processing:**  
  Accelerate batch operations using multi-file parallel execution.

- **Cross-Platform CLI:**  
  Works on Linux, macOS, and Windows.

- **Logging & Error Reporting:**  
  Detailed logs help troubleshoot issues easily.

---

## 🛠️ Installation & Usage

### **1️⃣ Using Poetry (Recommended)**

If you use [Poetry](https://python-poetry.org/), simply clone the repository and install dependencies:

```bash
git clone https://github.com/sandy-sp/metadata-cleaner.git
cd metadata-cleaner
poetry install
```

To run Metadata Cleaner:
```bash
poetry run metadata-cleaner --help
```

### **2️⃣ Install via PyPI**

Once published to PyPI, you can install it with pip:
```bash
pip install metadata-cleaner
```

And run it:
```bash
metadata-cleaner --help
```

### **3️⃣ Usage Examples**

#### **Remove Metadata from a Single File**
```bash
metadata-cleaner --file path/to/file.jpg
```
**Example Output:**
```
Do you want to process file.jpg? [Y/n]: Y
✅ Metadata removed. Cleaned file saved at: path/to/file_cleaned.jpg
```

#### **Remove Metadata from All Files in a Folder (Non-Recursive)**
```bash
metadata-cleaner --folder test_folder
```
**Example Output:**
```
Do you want to process all files in test_folder? [Y/n]: Y
Processing Files: 100% |████████████████| 5/5 [00:10s]

📊 Summary Report:
✅ Successfully processed: 5 files
Cleaned files saved in: test_folder/cleaned
```

#### **Batch Processing with Recursive Search & Custom Output**
```bash
metadata-cleaner --folder my_folder --recursive --output sanitized_files --yes
```
**Example Output:**
```
Processing Files: 100% |████████████████| 20/20 [00:15s]

📊 Summary Report:
✅ Successfully processed: 20 files
Cleaned files saved in: sanitized_files
```

#### **Using a Custom Configuration File**
You can create a JSON configuration file (e.g., `config.json`) to specify selective metadata rules. Then run:
```bash
metadata-cleaner --file sample.jpg --config config.json
```

---

## 🔧 How It Works

1. **File Detection:**  
   The tool detects the file type and selects the appropriate handler.

2. **Selective Filtering:**  
   For image files, it uses a configuration file (if provided) to selectively remove or preserve EXIF metadata.

3. **Processing:**  
   Files are processed—either individually or in batches—with parallel execution for efficiency.

4. **Output & Logging:**  
   Cleaned files are saved in a default or specified output folder, and detailed logs are generated for troubleshooting.

---

## 💻 Project Structure

```
metadata-cleaner/
├── docs/                     # Documentation
├── metadata_cleaner/         # Python package source code
│   ├── cli.py                # CLI entry point
│   ├── remover.py            # Core metadata removal logic
│   ├── config/               # Configuration settings
│   ├── core/                 # Metadata filtering utilities
│   ├── file_handlers/        # File-specific metadata handlers
│   └── logs/                 # Logging configuration
├── tests/                    # Unit tests
├── scripts/                  # Setup and environment scripts (Poetry-based)
├── pyproject.toml            # Poetry configuration file
├── MANIFEST.in               # Manifest file for packaging
└── README.md                 # This file
```

---

## 💡 Contributing

Contributions are welcome! To contribute:

1. **Fork the repository**
2. **Create a new branch** for your feature:
   ```bash
   git checkout -b feature-name
   ```
3. **Make your changes and test** using:
   ```bash
   poetry run pytest
   ```
4. **Commit and push** your changes:
   ```bash
   git commit -m "Describe your feature"
   git push origin feature-name
   ```
5. **Submit a Pull Request**

---

## 🔗 Resources & Links

- **API Reference:** [docs/API_REFERENCE.md](docs/API_REFERENCE.md)
- **Usage Guide:** [docs/USAGE.md](docs/USAGE.md)
- **Planned Features:** [docs/PLANNED_FEATURES.md](docs/PLANNED_FEATURES.md)
- **GitHub Repository:** [metadata-cleaner](https://github.com/sandy-sp/metadata-cleaner)
- **PyPI Package:** [metadata-cleaner](https://pypi.org/project/metadata-cleaner/)

---

## ❤️ Support

If you find this tool useful, please give it a ⭐ on GitHub!  
For issues or questions, [open an issue](https://github.com/sandy-sp/metadata-cleaner/issues) or contact `sandeep.paidipati@gmail.com`.

---

## 🔒 License

This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for details.

