Metadata-Version: 2.4
Name: pdtrain
Version: 0.1.0
Summary: Pipedream Training Orchestrator CLI - Train ML models on AWS SageMaker
Author-email: Pipedream <support@pipedream.ai>
License: MIT
Project-URL: Homepage, https://pipedream.in
Project-URL: Documentation, https://pipedream.in
Project-URL: Repository, https://github.com/pipedream/pdtrain
Project-URL: Issues, https://github.com/pipedream/pdtrain/issues
Project-URL: API Keys, https://pipedream.in/api-keys
Keywords: ml,training,sagemaker,pipedream,cli
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.0
Requires-Dist: requests>=2.28
Requires-Dist: rich>=13.0
Requires-Dist: pydantic>=2.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: python-dotenv>=1.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0; extra == "dev"
Dynamic: license-file

# pdtrain

**Pipedream Training Orchestrator CLI**: train ML models on AWS SageMaker from the command line.

## Installation

```bash
pip install pdtrain
```

## Quick Start

### 1. Configure

```bash
pdtrain configure
```

This will prompt you for:
- API URL (default: `http://localhost:8000`)
- API Key (from Pipedream dashboard)

### 2. Upload Training Code

```bash
# Upload a directory (will auto-create tar.gz)
pdtrain bundle upload ./my-training-code --wait

# Or upload existing tar.gz file
pdtrain bundle upload ./my-training-code.tar.gz --wait
```

### 3. Upload Dataset

```bash
pdtrain dataset upload ./data.csv --name "train-data" --wait
```

### 4. Create and Run Training

```bash
# Get bundle and dataset IDs first
pdtrain bundle list
pdtrain dataset list

# Create run using IDs
pdtrain run create \
  --bundle ca8912d6-79a4-4ea9-8570-234ec1baeef1 \
  --dataset ds_abc123-def456-7890 \
  --framework pytorch \
  --entry train.py \
  --submit --wait
```

### 5. View Results

```bash
# View logs
pdtrain logs run-abc123

# List artifacts
pdtrain artifacts list run-abc123

# Download artifacts
pdtrain artifacts download run-abc123 --output ./results/
```

## Commands

### Bundle Management

```bash
# Upload directory (auto-creates tar.gz)
pdtrain bundle upload ./my-training-code --wait

# Upload existing tar.gz
pdtrain bundle upload ./code.tar.gz --name "my-model" --wait

# List bundles
pdtrain bundle list

# Show bundle details
pdtrain bundle show abc-123
```
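
Since `bundle upload` also accepts an existing tar.gz, you may prefer to build the archive yourself so you control what goes into it. The following is a minimal sketch, not part of pdtrain: `make_bundle_archive` is a hypothetical helper, the excluded paths are only examples, and storing files relative to the directory root is an assumption about the layout pdtrain expects.

```bash
# Hypothetical helper (not part of pdtrain): package a training
# directory as <dir>.tar.gz, leaving out files you do not want uploaded.
make_bundle_archive() {
  local src="${1%/}"                  # drop one trailing slash, if any
  local archive="${2:-$src.tar.gz}"   # default: next to the directory

  [ -d "$src" ] || { echo "not a directory: $src" >&2; return 1; }

  # -C enters the directory first, so entries are stored relative to its
  # root (assumed layout; check what your bundle actually needs).
  tar -czf "$archive" \
    --exclude='.git' \
    --exclude='__pycache__' \
    -C "$src" .

  # Print the archive path so callers can pass it on.
  echo "$archive"
}
```

The resulting file can then be uploaded with the command shown above, for example `pdtrain bundle upload ./my-training-code.tar.gz --wait`.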

### Dataset Management

```bash
# Upload dataset
pdtrain dataset upload ./data.csv --name "train-data" --wait

# List datasets
pdtrain dataset list

# Download dataset
pdtrain dataset download ds-456 --version 1 --output ./data/
```

### Training Runs

```bash
# Create run
pdtrain run create \
  --bundle my-model:v1.0.0 \
  --dataset train-data:1 \
  --framework pytorch \
  --entry train.py

# Submit run
pdtrain run submit run-abc123

# List runs
pdtrain run list

# Show run details
pdtrain run show run-abc123

# Watch run progress
pdtrain run watch run-abc123

# Stop run
pdtrain run stop run-abc123
```

### Logs & Artifacts

```bash
# View the last 500 lines of logs (default: 300)
pdtrain logs run-abc123 --lines 500

# Follow logs in real-time
pdtrain logs run-abc123 --follow --interval 5

# List artifacts
pdtrain artifacts list run-abc123

# Download artifacts to ./results/ (default: ./artifacts/<run_id>/)
pdtrain artifacts download run-abc123 --output ./results/
```

### Quota

```bash
# Check storage quota
pdtrain quota
```

## Configuration

Configuration is stored in `~/.pdtrain/config.json`.

You can also use environment variables:

```bash
export PDTRAIN_API_URL=https://ml-orchestrator.pipedream.in
export PDTRAIN_API_KEY=sdk_xxxxx
```
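
In non-interactive environments such as CI, it can help to confirm that both variables are set before calling `pdtrain`. The sketch below is a hypothetical helper, not part of pdtrain. It only checks that the variables are non-empty; how pdtrain resolves a conflict between these variables and `~/.pdtrain/config.json` is not covered here.

```bash
# Hypothetical helper (not part of pdtrain): report which of the two
# documented environment variables are missing.
pdtrain_env_ready() {
  local status=0

  if [ -z "${PDTRAIN_API_URL:-}" ]; then
    echo "missing environment variable: PDTRAIN_API_URL" >&2
    status=1
  fi

  if [ -z "${PDTRAIN_API_KEY:-}" ]; then
    echo "missing environment variable: PDTRAIN_API_KEY" >&2
    status=1
  fi

  # 0 when both are set, 1 otherwise.
  return "$status"
}
```

A script could then guard its first call with `pdtrain_env_ready || exit 1`.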

## Examples

### Complete Workflow

```bash
# 1. Upload code (from directory)
pdtrain bundle upload ./resnet-training --wait

# 2. Upload dataset
pdtrain dataset upload ./cifar10.csv --name "cifar10" --wait

# 3. Get IDs
pdtrain bundle list    # Copy bundle ID
pdtrain dataset list   # Copy dataset ID

# 4. Run training (use IDs from step 3)
pdtrain run create \
  --bundle <bundle-id> \
  --dataset <dataset-id> \
  --framework pytorch \
  --entry train.py \
  --submit --wait

# 5. Download results
pdtrain artifacts download <run-id> --output ./results/
```

### Docker Mode

```bash
pdtrain run create \
  --bundle my-code:latest \
  --image 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.2.0-cpu \
  --entry train.py \
  --submit
```

### Script Mode with Hyperparameters

```bash
pdtrain run create \
  --bundle my-code:latest \
  --framework pytorch \
  --framework-version 2.2.0 \
  --hyperparameter epochs=10 \
  --hyperparameter batch_size=32 \
  --submit
```
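
When the list of hyperparameters grows, repeating `--hyperparameter` by hand gets error-prone. The following sketch wraps the command above in a hypothetical shell function (`run_with_hyperparameters` is not part of pdtrain) that expands `key=value` pairs into repeated flags. The `PDTRAIN` variable is likewise an assumption of this sketch, added so the command can be previewed with `echo` before anything is submitted.

```bash
# Command to invoke; set PDTRAIN=echo to print the command instead of
# running it (a convention of this sketch, not a pdtrain feature).
PDTRAIN="${PDTRAIN:-pdtrain}"

# Hypothetical wrapper: $1 is the bundle reference, the remaining
# arguments are key=value hyperparameters.
run_with_hyperparameters() {
  [ "$#" -ge 1 ] || { echo "usage: run_with_hyperparameters BUNDLE [key=value ...]" >&2; return 2; }
  local bundle="$1" remaining pair
  shift
  remaining=$#

  # Rotate through the positional parameters: take each pair off the
  # front and append it at the back, preceded by its flag.
  while [ "$remaining" -gt 0 ]; do
    pair="$1"
    shift
    case "$pair" in
      ?*=*) ;;
      *) echo "expected key=value, got: $pair" >&2; return 2 ;;
    esac
    set -- "$@" --hyperparameter "$pair"
    remaining=$((remaining - 1))
  done

  "$PDTRAIN" run create \
    --bundle "$bundle" \
    --framework pytorch \
    --framework-version 2.2.0 \
    "$@" \
    --submit
}
```

For example, `PDTRAIN=echo run_with_hyperparameters my-code:latest epochs=10 batch_size=32` prints the full command without running it.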

## Development

```bash
# Clone repository
git clone https://github.com/pipedream/pdtrain
cd pdtrain

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black pdtrain/
```

## License

MIT License. See the LICENSE file for details.

## Support

- API Keys: https://pipedream.in/api-keys
- Issues: https://github.com/pipedream/pdtrain/issues
- Email: hello@pipedream.in
