Metadata-Version: 2.4
Name: hla-compass
Version: 1.6.8
Summary: Python SDK for HLA-Compass bioinformatics platform - Build powerful modules for immuno-peptidomics analysis
Home-page: https://github.com/AlitheaBio/HLA-Compass-platform
Author: Alithea Bio
Author-email: armanas.povilionis@alithea.bio
Project-URL: Bug Tracker, https://github.com/AlitheaBio/HLA-Compass-platform/issues
Project-URL: Documentation, https://docs.alithea.bio
Project-URL: Source Code, https://github.com/AlitheaBio/HLA-Compass-platform/tree/main/sdk/python
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: requests>=2.28.0
Requires-Dist: boto3>=1.26.0
Requires-Dist: click>=8.0.0
Requires-Dist: rich>=12.0.0
Requires-Dist: PyYAML>=6.0
Requires-Dist: setuptools>=45.0.0
Requires-Dist: cryptography>=41.0.0
Requires-Dist: jsonschema>=4.17.0
Requires-Dist: questionary>=2.0.0
Requires-Dist: jinja2>=3.1.0
Requires-Dist: watchdog>=3.0.0
Requires-Dist: aiohttp>=3.9.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=3.0.0; extra == "dev"
Requires-Dist: pytest-mock>=3.6.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: isort>=5.10.0; extra == "dev"
Requires-Dist: flake8>=4.0.0; extra == "dev"
Requires-Dist: mypy>=0.950; extra == "dev"
Requires-Dist: sphinx>=4.0.0; extra == "dev"
Requires-Dist: sphinx-rtd-theme>=1.0.0; extra == "dev"
Provides-Extra: data
Requires-Dist: pandas>=2.2.0; python_version >= "3.9" and extra == "data"
Requires-Dist: pandas<2.0,>=1.5; python_version < "3.9" and extra == "data"
Requires-Dist: xlsxwriter>=3.0.0; extra == "data"
Provides-Extra: ml
Requires-Dist: scikit-learn>=1.0.0; extra == "ml"
Requires-Dist: torch>=1.10.0; extra == "ml"
Requires-Dist: transformers>=4.20.0; extra == "ml"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: project-url
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# HLA-Compass Python SDK

[![PyPI version](https://badge.fury.io/py/hla-compass.svg)](https://badge.fury.io/py/hla-compass)
[![Python Versions](https://img.shields.io/pypi/pyversions/hla-compass.svg)](https://pypi.org/project/hla-compass/)
[![Downloads](https://pepy.tech/badge/hla-compass)](https://pepy.tech/project/hla-compass)

The official Python SDK for developing modules on the HLA-Compass platform.

## Requirements

- Python 3.8 or higher (3.8, 3.9, 3.10, 3.11, 3.12, 3.13 supported)
- pip package manager

## Installation

```bash
pip install hla-compass
```

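That's all you need for the core SDK and CLI. Two optional extras are also published: `hla-compass[data]` (pandas, xlsxwriter) for the CSV/Excel storage helpers and `hla-compass[ml]` (scikit-learn, torch, transformers) for machine-learning modules.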

### Install from Source (Latest Development Version)

```bash
# Clone the repository
git clone https://github.com/AlitheaBio/HLA-Compass-platform.git
cd HLA-Compass-platform/sdk/python

# Install in development mode
pip install -e .
```

## Quick Start

```bash
# 1. Setup
hla-compass configure --env dev
hla-compass auth login --env dev

# 2. Create module
hla-compass init my-module --interactive
cd my-module

# 3. Develop with hot-reload
hla-compass dev

# 4. Test
hla-compass test --input examples/sample.json

# 5. Publish
hla-compass build --tag my-module:1.0.0
hla-compass publish --env dev --image-ref my-module:1.0.0
```

## Workflow

The SDK follows a simple 5-step workflow:

1. **init** - Scaffold module from template
2. **dev** - Develop with hot-reload in Docker
3. **validate/test** - Validate manifest and test execution
4. **build** - Create Docker image
5. **publish** - Sign and register with platform

Each step builds on the previous, ensuring a smooth development experience.

📚 **For the complete development guide, see: [/docs/MODULES_DEVELOPER_HANDBOOK.md](https://github.com/AlitheaBio/HLA-Compass-platform/blob/main/docs/MODULES_DEVELOPER_HANDBOOK.md)**

## Ten-Minute Module (Guided)

| Step | Command / Code | Why |
| --- | --- | --- |
| 1 | `pip install hla-compass` | Installs CLI, wizard, dev server, and data helpers in one go so nothing is missing when you scaffold. |
| 2 | `hla-compass init --interactive` | Wizard captures manifest metadata, inputs, and UI choice, emitting backend/frontend scaffolding ready to run. |
| 3 | `hla-compass validate` | JSON Schema validation with “Did you mean …?” hints spots typos before runtime. |
| 4 | `from hla_compass.testing import ModuleTester`<br>`from backend.main import Module`<br>`ModuleTester().quickstart(Module, mode="interactive")` | Spins up a mock context using manifest defaults so you can smoke-test the execute path instantly. |
| 5 | Add `self.storage.save_json("results/preview.json", payload)` inside `execute` | Storage helper normalises metadata, accepts streams, and raises friendly extras guidance if optional deps (pandas/xlsxwriter) are missing. |
| 6 | `from hla_compass.client import APIClient`<br>`client = APIClient()`<br>`client.set_telemetry_callback(print)` | Hooks the new telemetry callback so each API request logs status/duration/idempotency for troubleshooting. |
| 7 | `hla-compass doctor` | Checks auth, rate-limit env vars, Docker, Node, optional extras, and tells you exactly what to install or export. |
| 8 | `hla-compass build && hla-compass publish --env dev` | Generates MCP descriptor + Docker image, signs manifest, pushes, and registers the module with the chosen environment. |

👉 The table above mirrors the expanded walkthrough in `SDK.md`, which adds manifest snippets and deeper commentary.

## CLI Commands

### Module Management

```bash
# Create new module
hla-compass init <name> [options]
# Options: --yes (skip prompts), --template, --type

# Validate module structure
hla-compass validate [--json]  # --json for machine-readable output

# Test module (auto-detects local vs remote based on auth)
hla-compass test [--input FILE] [--verbose]

# Test locally without API
hla-compass test --local [--input FILE]

# Test against real API (requires auth)
# Note: Remote mode currently executes locally with API authentication context
# Full remote execution on the platform is on the roadmap
hla-compass test --remote [--input FILE]

# Output as JSON for CI/automation
hla-compass test --json [--input FILE]

# Build container image (local by default)
hla-compass build [--tag TAG] [--push]

# Publish container + MCP descriptor to the platform (signs automatically)
hla-compass publish --env ENV --image-ref IMAGE

# List deployed modules
hla-compass list [--env ENV]

# Removed commands (functionality now integrated into publish):
# - hla-compass sign: Signing now happens automatically during publish
# - hla-compass deploy: Use publish instead
```

### Development Server

```bash
# Offline dev server (quiet by default)
hla-compass dev

# Stream webpack/aiohttp logs
hla-compass dev --verbose

# Proxy selected routes to the live API (read-only by default)
hla-compass dev --online --env dev --proxy-routes=auth,data

# Serve UI bundle from a custom webpack port
hla-compass dev --frontend-port 3100
```

### Authentication

```bash
# Login to platform
hla-compass auth login --env dev

# Logout
hla-compass auth logout

# Register as a developer (self-service)
hla-compass auth register [--env ENV]
```

## CLI Feature Matrix

| Command | Description | Status |
|---|---|---|
| `hla-compass configure` | Initialize SDK config and signing keys | Available |
| `hla-compass init` | Scaffold a new module from templates | Available |
| `hla-compass validate` | Validate module structure and manifest | Available |
| `hla-compass test --local` | Run module locally without API | Available |
| `hla-compass test --remote` | Run locally with API auth context | Available (executes locally) |
| `hla-compass auth login/logout/status` | Authentication flows | Available |
| `hla-compass build` | Build Docker image + MCP descriptor (optionally push) | Available |
| `hla-compass publish` | Sign and register module with platform | Available |
| `hla-compass list` | List deployed modules | Available |
| `hla-compass test` | Test module execution (local/remote) | Available |
| `hla-compass sign` | Sign a module directory/manifest | Removed (use publish) |
| `hla-compass deploy` | Register an existing container image | Removed (use publish) |
| `hla-compass templates list` | List module templates | Planned |
| `hla-compass local-services` | Start local DB/S3 dev services | Planned |

## Storage Quick Reference

| Helper | Input Types | Output | Notes |
| --- | --- | --- | --- |
| `save_json(name, data)` | `dict`, `list`, or serialisable objects | `.json` artifact | Uses `json.dumps(..., default=str)` so datetimes survive. [storage.py:120–196] |
| `save_csv(name, dataframe, index=False, compress=False)` | pandas `DataFrame` | `.csv` text or `.csv.gz` when `compress=True` | Requires `pandas`; metadata coerced to strings automatically; safe against CSV formula injection. |
| `save_excel(name, dataframe)` | pandas `DataFrame` or dict of DataFrames | `.xlsx` workbook | Raises `MissingDependencyError` with install guidance when optional extras are missing. [storage.py:224–276] |
| `save_file(name, content, content_type=None, metadata=None)` | `bytes`, `str`, file-like object | Raw object with MIME detection | Accepts streams, normalises metadata collections to comma-separated strings. [storage.py:79–158] |

All helpers return the storage key/URL emitted by the execution environment, so you can echo links in `summary` payloads without additional plumbing.
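
The `default=str` behaviour noted for `save_json` can be seen with the standard library alone. The sketch below is not the SDK's code; it only shows what `json.dumps(..., default=str)` does to values that JSON cannot encode natively. The payload keys are made up for illustration.

```python
import json
from datetime import datetime, timezone
from decimal import Decimal

# Illustrative payload: datetime and Decimal are not JSON-serialisable by default.
payload = {
    "created": datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
    "score": Decimal("0.25"),
    "peptides": ["MLLSVPLLL", "SIINFEKL"],
}

# default=str is called for every object json cannot encode; without it,
# json.dumps would raise TypeError on the datetime and Decimal values.
encoded = json.dumps(payload, default=str)
decoded = json.loads(encoded)

print(decoded["created"])  # the datetime comes back as a plain string
print(decoded["score"])    # so does the Decimal
```

Note that the conversion is one-way: values serialised through `str` are read back as strings, so parse them again if you need the original types.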

Notes:
- `--remote` test currently executes locally with API authentication context; full remote execution is on the roadmap.
- For end-to-end deployments, prefer the `publish` path integrated with the platform pipelines.

## Testing

### Unit Testing

```python
# tests/test_module.py
import pytest
from hla_compass.testing import ModuleTester, MockContext

def test_module_execution():
    tester = ModuleTester()
    
    input_data = {
        'sequence': 'MLLSVPLLL',
        'threshold': 0.5
    }
    
    result = tester.test_local(
        'backend/main.py',
        input_data
    )
    
    assert result['status'] == 'success'
    assert len(result['results']) > 0
```

### Integration Testing

```python
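# Test with mock API data
from hla_compass.testing import ModuleTester, MockContext

def test_with_mock_data():
    tester = ModuleTester()

    # Mock context that serves two peptides instead of calling the API
    context = MockContext.create(
        api_data={
            'peptides': [
                {'id': '1', 'sequence': 'MLLSVPLLL'},
                {'id': '2', 'sequence': 'SIINFEKL'}
            ]
        }
    )
    
    result = tester.test_local(
        'backend/main.py',
        {'min_length': 8},
        context
    )
    
    assert len(result['results']) == 2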
```

Need to exercise the devkit container or real S3/MinIO buckets? Opt in explicitly so tests don’t accidentally touch production services:

```python
tester.configure_local_devkit(use_real_storage=True)
context = MockContext.create(use_devkit=True, use_real_storage=True)
```

### Performance Testing

```python
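# Benchmark module performance
from hla_compass.testing import ModuleTester

def test_performance():
    tester = ModuleTester()

    # Same sample input as the unit test above
    input_data = {
        'sequence': 'MLLSVPLLL',
        'threshold': 0.5
    }
    
    results = tester.benchmark(
        'backend/main.py',
        input_data,
        iterations=100
    )
    
    assert results['average_time'] < 0.1  # 100ms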
```
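
If you want to time a single function outside the SDK harness, a minimal sketch along the same lines is shown below. It is not how `ModuleTester.benchmark` is implemented; the `benchmark` helper and `count_residues` function here are hypothetical, and only the `average_time` key is borrowed from the example above.

```python
import statistics
import time

def benchmark(func, *args, iterations=100):
    """Call func(*args) repeatedly and report simple wall-clock statistics."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        func(*args)
        durations.append(time.perf_counter() - start)
    return {
        "iterations": iterations,
        "average_time": statistics.fmean(durations),  # seconds per call
        "max_time": max(durations),
    }

def count_residues(sequence):
    """Toy workload: count each amino acid in a peptide sequence."""
    return {aa: sequence.count(aa) for aa in set(sequence)}

stats = benchmark(count_residues, "MLLSVPLLL", iterations=50)
print(f"average: {stats['average_time']:.6f}s over {stats['iterations']} runs")
```

`time.perf_counter` is used because it is monotonic and has the highest available resolution for short intervals.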

## Advanced Features

### Module Types

#### Lambda Modules (Quick Analysis)
- Execution time: <15 minutes
- Memory: 128MB - 10GB
- Best for: Simple calculations, data filtering

#### Fargate Modules (Long Running)
- Execution time: <8 hours
- Memory: 512MB - 30GB
- Best for: Complex pipelines, batch processing

#### SageMaker Modules (ML Inference)
- GPU support available
- Pre-trained model hosting
- Best for: Machine learning predictions
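
As a rough aid for choosing between the three types, the limits listed above can be encoded in a small helper. This is only a sketch: the function name, the lowercase type labels, and the reading of "GB" as 1024 MB are assumptions, not SDK definitions.

```python
from typing import Optional

# Upper limits taken from the lists above (GB interpreted as 1024 MB; an assumption).
LAMBDA_MAX_MINUTES = 15
LAMBDA_MAX_MEMORY_MB = 10 * 1024
FARGATE_MAX_MINUTES = 8 * 60
FARGATE_MAX_MEMORY_MB = 30 * 1024

def suggest_module_type(runtime_minutes: float, memory_mb: int,
                        needs_gpu: bool = False) -> Optional[str]:
    """Return a hypothetical label for the smallest module type that fits."""
    if needs_gpu:
        return "sagemaker"  # GPU support is only listed for SageMaker modules
    if runtime_minutes < LAMBDA_MAX_MINUTES and memory_mb <= LAMBDA_MAX_MEMORY_MB:
        return "lambda"
    if runtime_minutes < FARGATE_MAX_MINUTES and memory_mb <= FARGATE_MAX_MEMORY_MB:
        return "fargate"
    return None  # exceeds every documented limit

print(suggest_module_type(5, 512))        # short and small
print(suggest_module_type(120, 2048))     # too long for Lambda
print(suggest_module_type(5, 20 * 1024))  # too much memory for Lambda
```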

### UI Module Bundles

UI-capable modules ship a React bundle the platform loads dynamically. Keep the generated webpack configuration’s `ModuleUI` UMD export intact so the host application can mount your UI. When running `hla-compass dev`, pass `--verbose` to stream webpack output if the UI fails to load (helpful for diagnosing “ModuleUI UMD not found” and other build issues).

### Custom Validation

```python
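from hla_compass.module import Module
# ValidationError must also be imported from the SDK; its module path is not
# shown in this README, so adjust the import to match your SDK version.

class MyModule(Module):
    def validate_inputs(self, input_data):
        # Call parent validation
        validated = super().validate_inputs(input_data)
        
        # Custom validation: allow only the 20 standard amino acid letters
        sequence = validated.get('sequence', '')
        if not all(aa in 'ACDEFGHIKLMNPQRSTVWY' for aa in sequence):
            raise ValidationError("Invalid amino acids in sequence")
        
        return validated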
```
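
To produce a more specific error message than "Invalid amino acids in sequence", the check can report which residues failed and where. The helper below is a standalone sketch with a hypothetical name; it uses the same 20-letter alphabet as the example above and does not depend on the SDK.

```python
VALID_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")

def find_invalid_residues(sequence):
    """Return (position, character) pairs for every non-standard residue."""
    return [
        (index, residue)
        for index, residue in enumerate(sequence)
        if residue not in VALID_AMINO_ACIDS
    ]

problems = find_invalid_residues("MLLSXPLLB")
if problems:
    details = ", ".join(f"{residue!r} at position {index}" for index, residue in problems)
    print(f"Invalid amino acids in sequence: {details}")
```

The check is case-sensitive, so lowercase input is reported as invalid; upper-case the sequence first if you want to accept it.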

## Best Practices

1. **Input Validation**: Always validate inputs thoroughly
2. **Error Handling**: Use try-except blocks and provide clear error messages
3. **Logging**: Use appropriate logging levels (debug, info, warning, error)
4. **Performance**: Process data in batches when possible
5. **Memory**: Stream large results instead of loading all into memory
6. **Security**: Never hardcode credentials or sensitive data
7. **Testing**: Write comprehensive tests for all functionality
8. **Documentation**: Document inputs, outputs, and algorithms clearly

## Environment Variables

- `HLA_COMPASS_ENV`: Default environment (dev, staging, prod) - takes precedence over HLA_ENV
- `HLA_ENV`: Alternative environment variable (used if HLA_COMPASS_ENV not set)
- `HLA_COMPASS_CONFIG_DIR`: Configuration directory (default: ~/.hla-compass)
- `HLA_COMPASS_LOG_LEVEL`: Logging level (DEBUG, INFO, WARNING, ERROR)
- `HLA_CORRELATION_ID`: Optional global correlation ID propagated as `X-Correlation-Id`
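
The precedence between `HLA_COMPASS_ENV` and `HLA_ENV` described above can be pictured as follows. This is an illustrative sketch, not the SDK's resolution code: the function name is hypothetical, and treating an empty value as unset is an assumption.

```python
import os

def resolve_environment(environ=None):
    """Return the selected environment name, or None when neither variable is set."""
    env = os.environ if environ is None else environ
    # HLA_COMPASS_ENV wins; HLA_ENV is only consulted as a fallback.
    return env.get("HLA_COMPASS_ENV") or env.get("HLA_ENV")

print(resolve_environment({"HLA_COMPASS_ENV": "staging", "HLA_ENV": "dev"}))
print(resolve_environment({"HLA_ENV": "dev"}))
print(resolve_environment({}))
```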

## Troubleshooting

### Common Issues

1. **Import Errors**: Ensure all dependencies are in requirements.txt
2. **Timeout Errors**: Increase timeout in manifest.json or optimize code
3. **Memory Errors**: Process data in smaller chunks or increase memory
4. **Authentication Errors**: Run `hla-compass auth login` to refresh tokens

### Debug Mode

Enable debug logging:
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

## Request Headers & Tracing

The SDK standardizes outbound request headers and retry behavior.

- Accept: Always `application/json` (session default)
- User-Agent: `hla-compass-sdk/<version> python/<version> os/<platform>`
- Content-Type: `application/json` for JSON requests (per-call)
- X-Request-Id: Unique UUID per HTTP attempt (changes on retry)
- X-Correlation-Id: Optional global correlation ID for cross-service tracing
  - Set environment variable `HLA_CORRELATION_ID` to propagate a stable value
  - Added to both SDK auth requests and APIClient requests
- Idempotency-Key: Added to POST requests to make retries safe
  - A stable UUID is generated per SDK call and reused across retry attempts
  - The server is expected to de-duplicate requests using this key
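
The difference between the per-attempt `X-Request-Id` and the per-call `Idempotency-Key` is easiest to see side by side. The sketch below only mimics the header rules listed above; `build_attempt_headers` is a hypothetical helper, not part of the SDK, and the correlation value is made up.

```python
import uuid

def build_attempt_headers(idempotency_key=None, correlation_id=None):
    """Build headers for one HTTP attempt following the rules listed above."""
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/json",
        "X-Request-Id": str(uuid.uuid4()),  # fresh value on every attempt
    }
    if correlation_id:
        headers["X-Correlation-Id"] = correlation_id
    if idempotency_key:
        headers["Idempotency-Key"] = idempotency_key
    return headers

# One logical POST call: the idempotency key is generated once and reused.
call_key = str(uuid.uuid4())
first_attempt = build_attempt_headers(call_key, correlation_id="trace-42")
retry_attempt = build_attempt_headers(call_key, correlation_id="trace-42")

print(first_attempt["X-Request-Id"] != retry_attempt["X-Request-Id"])
print(first_attempt["Idempotency-Key"] == retry_attempt["Idempotency-Key"])
```

Keeping the idempotency key stable across retries is what lets a server recognise a retried POST as the same operation, while the changing request ID still identifies each attempt in logs.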

### Retry Policy

- GET/HEAD/OPTIONS: Retries enabled (connection/read) with exponential backoff via `urllib3.Retry`
- 401 Unauthorized: One automatic token refresh (`Auth.refresh_token`) and retry
- 429 Too Many Requests: Honors `Retry-After` (falls back to exponential backoff)
- 5xx Server Errors: Exponential backoff; for POST, retries only when an `Idempotency-Key` is present (the SDK adds one per call)
- Timeouts/Network errors: Limited retries with backoff for transient connection issues
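
For readers unfamiliar with the pattern, "honor `Retry-After`, otherwise back off exponentially" can be sketched as a single delay function. The base delay, cap, and function name below are illustrative assumptions; the SDK's actual values are not documented here, and this sketch handles only the numeric form of `Retry-After`.

```python
def retry_delay(attempt, retry_after=None, base=0.5, cap=30.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Numeric Retry-After header: wait as long as the server asks.
            return max(float(retry_after), 0.0)
        except ValueError:
            pass  # HTTP-date form is not parsed in this sketch; fall back
    # Exponential backoff: base, 2*base, 4*base, ... limited by cap.
    return min(cap, base * (2 ** attempt))

for attempt in range(4):
    print(attempt, retry_delay(attempt))
print("server hint:", retry_delay(2, retry_after="7"))
```

Production implementations usually add random jitter to the backoff so that many clients do not retry in lockstep; it is left out here to keep the output deterministic.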

Tip: For end-to-end traceability, set `HLA_CORRELATION_ID` in your environment and pass it across your systems.

## Deployment Note: SimpleDeployer (Deprecated / Dev-only)

The legacy `hla_compass.deployer.SimpleDeployer` utility is disabled by default and intended only for local development experiments. It bypasses platform authentication and infrastructure pipelines.

- Disabled by default; to enable explicitly set `HLA_ENABLE_SIMPLE_DEPLOYER=1`.
- Recommended path: use `hla-compass build` + `hla-compass publish` (the `deploy` command has been removed) and the platform’s Serverless/Terraform pipelines.

## Support

- **Complete Guide**: [/docs/MODULES_DEVELOPER_HANDBOOK.md](https://github.com/AlitheaBio/HLA-Compass-platform/blob/main/docs/MODULES_DEVELOPER_HANDBOOK.md)
- **Examples**: [/modules/examples/](https://github.com/AlitheaBio/HLA-Compass-platform/tree/main/modules/examples)
- **Issues**: [GitHub Issues](https://github.com/AlitheaBio/HLA-Compass-platform/issues)
- **Quick Start**: [/docs/MODULES_DEVELOPER_HANDBOOK.md#2-quick-start-5-minutes](https://github.com/AlitheaBio/HLA-Compass-platform/blob/main/docs/MODULES_DEVELOPER_HANDBOOK.md#2-quick-start-5-minutes)

## License

Copyright © 2024 Alithea Bio. All rights reserved.


## Advanced Topics

### Handling Large Datasets (Streaming Iterators)
When working with large result sets, avoid loading everything into memory. Use the `APIClient` streaming iterators, which handle pagination for you and respect the configured rate limits.

Example: iterating peptides in pages of 1,000 and stopping after 100k results:
```python
from hla_compass.client import APIClient

client = APIClient()

count = 0
for peptide in client.iter_peptides(
    {"hla_allele": "HLA-A*02:01"},
    page_size=1000,
    max_results=100_000,
):
    # process peptide incrementally
    count += 1

print("processed", count)
```
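
Conceptually, a streaming iterator of this kind is a generator that fetches one page at a time and stops at `max_results`. The sketch below shows the general pattern with an in-memory stand-in for the API; it is not the SDK's implementation, and `iter_pages`, `fake_fetch`, and the offset-based paging scheme are assumptions made for illustration.

```python
def iter_pages(fetch_page, page_size=1000, max_results=None):
    """Yield items one by one, fetching a new page only when needed."""
    offset = 0
    yielded = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return  # no more data
        for item in page:
            if max_results is not None and yielded >= max_results:
                return  # hard cap reached
            yield item
            yielded += 1
        if len(page) < page_size:
            return  # a short page means this was the last one
        offset += len(page)

# Stand-in for a paginated API endpoint (hypothetical).
DATA = list(range(25))

def fake_fetch(offset, limit):
    return DATA[offset:offset + limit]

everything = list(iter_pages(fake_fetch, page_size=10))
capped = list(iter_pages(fake_fetch, page_size=10, max_results=12))
print(len(everything), len(capped))
```

Because items are yielded as each page arrives, memory use is bounded by the page size rather than by the total result count.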

Inside a Module, you can access the underlying API client from the data helper to stream safely during execute:
```python
from hla_compass.module import Module

class StreamModule(Module):
    def execute(self, input_data, context):
        assert self.peptides is not None and self.peptides.api is not None

        top = []
        for pep in self.peptides.api.iter_peptides({"sequence": input_data.get("sequence", "%")}, page_size=2000):
            # Example: collect just a small sample to keep memory bounded
            if len(top) < 5000:
                top.append({"sequence": pep["sequence"], "mass": pep.get("mass")})
            else:
                break
        return top
```

Tips:
- Prefer `iter_peptides`/`iter_proteins`/`iter_samples` for bulk processing.
- Use `max_results` to enforce hard caps in long-running jobs.
- For DataFrame needs, accumulate to CSV via `Storage.save_csv` in chunks instead of building a giant list in memory.
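
The chunking idea in the last tip can be sketched with the standard library. This example writes rows from any iterator to CSV in fixed-size chunks; it deliberately avoids the SDK and pandas, so `chunked` and `write_csv_in_chunks` are hypothetical helpers and the rows are synthetic placeholders.

```python
import csv
import io
from itertools import islice

def chunked(iterable, size):
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk

def write_csv_in_chunks(rows, fieldnames, out, chunk_size=1000):
    """Write dict rows to `out` chunk by chunk; return the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    total = 0
    for chunk in chunked(rows, chunk_size):
        writer.writerows(chunk)  # only one chunk is held in memory at a time
        total += len(chunk)
    return total

# Synthetic rows standing in for streamed peptide records.
rows = ({"sequence": f"PEP{i}", "mass": 1000 + i} for i in range(2500))
buffer = io.StringIO()
written = write_csv_in_chunks(rows, ["sequence", "mass"], buffer, chunk_size=1000)
print("rows written:", written)
```

In a real module the output would be a file or upload stream rather than an in-memory buffer; the buffer is used here only to keep the example self-contained.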

### Telemetry Callbacks for APIClient
Capture per-request metrics (status, duration, retries, idempotency key presence) with a callback to ease troubleshooting.

```python
import logging
from hla_compass.client import APIClient

log = logging.getLogger(__name__)

def on_api(data):
    # data contains: method, url, status_code, duration_seconds, attempt, max_retries,
    #                rate_limit_wait, params (redacted in logs), idempotency_key, request_id, error
    log.info("API %s %s %.3fs", data["status_code"], data["url"], data["duration_seconds"]) 

client = APIClient()
client.set_telemetry_callback(on_api)
client.get_peptide("pep_123")
print(client.get_rate_limit_state())  # {'max_requests': ..., 'time_window': ..., 'last_wait_seconds': ...}
```

Environment knobs:
- `HLA_RATE_LIMIT_MAX_REQUESTS`, `HLA_RATE_LIMIT_TIME_WINDOW` to tune rate limits
- `HLA_CORRELATION_ID` to propagate a stable `X-Correlation-Id` across requests
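
A telemetry callback does not have to log each request; it can also aggregate. The sketch below is a callable class that could be passed wherever a callback such as `on_api` is accepted. It relies only on the `status_code` and `duration_seconds` fields described above, and the class name and the events fed to it are made up for illustration.

```python
from collections import Counter

class TelemetryStats:
    """Callable that accumulates per-request telemetry into simple totals."""

    def __init__(self):
        self.status_counts = Counter()
        self.total_seconds = 0.0
        self.calls = 0

    def __call__(self, data):
        self.calls += 1
        self.status_counts[data.get("status_code")] += 1
        self.total_seconds += data.get("duration_seconds") or 0.0

    @property
    def average_seconds(self):
        return self.total_seconds / self.calls if self.calls else 0.0

stats = TelemetryStats()
# Synthetic events shaped like the callback payload described above.
for event in (
    {"status_code": 200, "duration_seconds": 0.25},
    {"status_code": 200, "duration_seconds": 0.75},
    {"status_code": 429, "duration_seconds": 0.5},
):
    stats(event)

print(dict(stats.status_counts), stats.average_seconds)
```

Using `.get` for each field keeps the callback from raising if a payload lacks a value, for example when a request fails before a status code is available.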

### Auto-generated API Reference (pdoc)
- Browse: docs/sdk/python/index.html (generated)
- Build locally:

```bash
bash scripts/docs-generate-python-sdk.sh
open docs/sdk/python/index.html  # macOS, or use your browser to open the file
```

The script sets PYTHONPATH to include sdk/python and generates docs for the hla_compass package using pdoc.
