Metadata-Version: 2.4
Name: wedgebox-hpc
Version: 1.0.0
Summary: GPU Performance Monitor for HPC Clusters
Home-page: https://github.com/wedgebox/wedgebox-hpc
Author: WedgeBox Team
Author-email: WedgeBox Team <wedgebox.team@gmail.com>
Project-URL: Homepage, https://github.com/wedgebox/wedgebox-hpc
Project-URL: Bug Tracker, https://github.com/wedgebox/wedgebox-hpc/issues
Project-URL: Documentation, https://github.com/wedgebox/wedgebox-hpc
Project-URL: Source Code, https://github.com/wedgebox/wedgebox-hpc
Keywords: gpu,monitoring,hpc,machine-learning,performance,nvidia,amd
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: System :: Monitoring
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: POSIX :: Linux
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: psutil>=5.8.0
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# WedgeBox-HPC

Simple GPU Performance Monitoring for HPC Clusters

## Overview

WedgeBox-HPC provides detailed GPU telemetry for debugging and optimizing
machine learning workloads on HPC infrastructure.

**Features:**
- Multi-vendor GPU support (NVIDIA, AMD)
- Process-level GPU resource tracking
- Zero-configuration monitoring
- Human-readable console output + detailed JSON reports

## Quick Start
```bash
# Download
wget https://raw.githubusercontent.com/wedgebox/wedgebox-hpc/main/wedgebox-hpc
chmod +x wedgebox-hpc

# Run
./wedgebox-hpc
```

## Requirements

- Python 3.8+
- psutil (`pip install psutil --user`)
- GPU drivers (nvidia-smi for NVIDIA, rocm-smi for AMD)

## For HPC Administrators

See [docs/installation.md](docs/installation.md) for cluster-wide deployment instructions.

## Output

Generates:
- Console summary (human-readable)
- `wedgebox-report-[timestamp].json` (machine-readable)

## Use Cases

- Debug slow training jobs
- Monitor GPU utilization
- Profile memory usage
- Identify resource bottlenecks
- Track multi-GPU workload distribution

## License

MIT License - See LICENSE file
