Metadata-Version: 2.4
Name: psamfinder
Version: 0.3.2
Summary: Command-line tool to find and optionally delete duplicate files by content (SHA-256)
Project-URL: Homepage, https://github.com/psam-717/psamfinder
Project-URL: Repository, https://github.com/psam-717/psamfinder
Author-email: "Marvinphil Annorbah (psam)" <mphilannorbah@gmail.com>
License: MIT License
        
        Copyright (c) 2026 psam
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: cli,disk-cleanup,duplicate-files,file-duplicates,sha256
Classifier: Environment :: Console
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Utilities
Requires-Python: >=3.8
Requires-Dist: typer==0.24.0
Description-Content-Type: text/markdown

# psamfinder — File duplicate finder

[![PyPI](https://img.shields.io/pypi/v/psamfinder)](https://pypi.org/project/psamfinder/)
[![Python](https://img.shields.io/pypi/pyversions/psamfinder)](https://pypi.org/project/psamfinder/)

psamfinder is a lightweight CLI tool that recursively scans directories for files with identical content (using SHA-256 hashing) and helps you manage duplicates interactively.

## Requirements
- Python 3.8+
- hatchling (for building, referenced in pyproject.toml)

## Installation
**From PyPI (recommended):**
```bash
pip install psamfinder
# or for isolated CLI install (recommended)
pipx install psamfinder


# For development/ from source
git clone https://github.com/psam-717/psamfinder.git
cd psamfinder
pip install -e .


## Running
- As a CLI (installed entry point):
  psamfinder scan <DIRECTORY> [--delete] [--dry-run] [-q]

- From source:
  python -m psamfinder scan <DIRECTORY> [--delete] [--dry-run] [-q]

Examples:
- Scan a directory and list duplicates:
  psamfinder scan C:\path\to\dir

- Scan and interactively delete duplicates (asks which file to keep per group):
  psamfinder scan C:\path\to\dir --delete

- Preview deletion without actually removing files (dry-run mode)
  psamfinder scan C:\path\to\dir --delete --dry-run
  # Shows "Would have deleted: ..." for each file that would be removed

- Show installed version:
  psamfinder --version

- Quiet scan (suppresses the scanning line):
  psamfinder scan C:\path\to\dir -q

## What the code does (line-level summary)

Files of interest:
- pyproject.toml
  - Project metadata: name `psamfinder`, version `0.3.2`, description "Command-line tool to find and optionally delete duplicate files by content (SHA-256)".
  - Entry point: `psamfinder = "psamfinder.cli:app"` (Typer app). The CLI supports a `--version` / `-V` option that prints the installed psamfinder version and exits.
  - Build system: hatchling.

- psamfinder/__main__.py
  - Imports `app` from psamfinder.cli and calls `sys.exit(app())` so `python -m psamfinder` runs the CLI.

- psamfinder/cli.py
  - Uses Typer to create a CLI app named `psamfinder` with help text.
  - Exposes a `scan` command that accepts:
    - directory: pathlib.Path (must exist, resolved, must be a directory)
    - --delete / -d: boolean option to enable interactive deletion after listing
    - --quiet / -q: boolean option to suppress the scanning message
  - Behavior:
    - Prints "Scanning: <directory> ..." unless -q is used (lines 51-52).
    - Calls `find_duplicates(str(directory))` from psamfinder.finder (line 54).
    - If no duplicates are found, prints "No duplicates found" and exits with code 0 (lines 56-58).
    - Otherwise calls `print_duplicates(duplicates)`, and if `--delete` was passed, asks for confirmation and calls `delete_duplicates(duplicates)`.

- psamfinder/finder.py
  - compute_hash(filepath)
    - Computes SHA-256 digest of file content in 4 KiB chunks (line 12 uses 4096 bytes).
    - Returns the hex digest string, or `None` if a PermissionError or FileNotFoundError occurs (lines 15-17). Errors are printed to stderr.
  - find_duplicates(directory)
    - Walks the directory recursively with os.walk (line 25).
    - For every file, builds its absolute path and computes its SHA-256 hash using `compute_hash`.
    - Collects paths in a dict mapping hash -> list of file paths (lines 24, 30-32).
    - Returns a dictionary of only the hashes that have 2 or more files (duplicates) (line 34).
    - Return type: dict[str, list[str]] where keys are hex SHA-256 strings and values are lists of file paths.
  - print_duplicates(duplicates)
    - Nicely prints a header and then each duplicate group showing the shared hash and the file paths (lines 42-47).
    - If duplicates is empty or falsy, prints "No duplicates found" and returns (lines 39-41).
  - delete_duplicates(duplicates)
    - For each duplicate group, lists the files with indices and prompts the user to enter the number of the file to keep (line 55), or type `skip` to keep all.
    - If a valid index is provided, removes all other files in that group using `os.remove` and prints the path deleted (lines 61-64).
    - If input is invalid (non-integer or out of bounds) it prints a message and skips deletion for that group (lines 65-68).

## Notes, gotchas, and suggestions
- compute_hash uses a 4 KiB read buffer; this is a reasonable trade-off between memory usage and speed. 
- Files that cannot be read due to permissions or that disappear during scanning are skipped and reported to stderr by compute_hash (the hash function returns None on these errors).
- Deletion is interactive and destructive — always use --dry-run first when testing
- Use backups or version control before running with --delete without --dry-run
- Duplicates are detected strictly by content hash — identical content but different metadata is still considered duplicate
- --dry-run only affects deletion; scanning and listing always happen normally
- The CLI `--delete` option first requests a confirmation prompt; deletion then asks which single file to keep in each group.
- The tool identifies duplicates strictly by file content hash. Files with identical content but different metadata (timestamps, permissions, names) are considered duplicates.

## Packaging
- The project is configured with pyproject.toml; the package includes the `psamfinder` module and exposes a console script in [project.scripts]. Use `python -m build` or `hatch build` in a properly configured environment to build a wheel.

## Extending / Contributing
- Possible improvements:
  - Add unit tests for compute_hash and find_duplicates.
  - Add options to automatically pick which files to keep (e.g., keep newest, keep largest, keep by pattern) for non-interactive deletion.
  - Add progress reporting for large scans, or parallel hashing for performance.
- Contributions via pull requests are welcome. Add tests and update the README with usage examples for new features.

## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Contact
Author:
- Marvinphil Annorbah(psam) (GitHub: [@psam-717](https://github.com/psam-717))

