Metadata-Version: 2.4
Name: hla-compass
Version: 2.0.3
Summary: Python SDK for HLA-Compass bioinformatics platform - Build powerful modules for immuno-peptidomics analysis
Author-email: Alithea Bio <armanas.povilionis@alithea.bio>
License: MIT
Project-URL: Bug Tracker, https://github.com/AlitheaBio/HLA-Compass-platform/issues
Project-URL: Documentation, https://docs.alithea.bio
Project-URL: Source Code, https://github.com/AlitheaBio/HLA-Compass-platform/tree/main/sdk/python
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: requests>=2.28.0
Requires-Dist: httpx>=0.24.0
Requires-Dist: boto3>=1.26.0
Requires-Dist: click>=8.0.0
Requires-Dist: rich>=12.0.0
Requires-Dist: PyYAML>=6.0
Requires-Dist: setuptools>=45.0.0
Requires-Dist: cryptography>=41.0.0
Requires-Dist: jsonschema>=4.17.0
Requires-Dist: questionary>=2.0.0
Requires-Dist: jinja2>=3.1.0
Requires-Dist: watchdog>=3.0.0
Requires-Dist: aiohttp>=3.9.0
Requires-Dist: pydantic>=2.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=3.0.0; extra == "dev"
Requires-Dist: pytest-mock>=3.6.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=0.950; extra == "dev"
Requires-Dist: sphinx>=4.0.0; extra == "dev"
Requires-Dist: sphinx-rtd-theme>=1.0.0; extra == "dev"
Provides-Extra: data
Requires-Dist: pandas>=2.2.0; python_version >= "3.9" and extra == "data"
Requires-Dist: pandas<2.0,>=1.5; python_version < "3.9" and extra == "data"
Requires-Dist: xlsxwriter>=3.0.0; extra == "data"
Requires-Dist: polars>=0.20.0; python_version >= "3.9" and extra == "data"
Provides-Extra: ml
Requires-Dist: scikit-learn>=1.0.0; extra == "ml"
Requires-Dist: torch>=1.10.0; extra == "ml"
Requires-Dist: transformers>=4.20.0; extra == "ml"

# HLA-Compass Python SDK

[![PyPI version](https://badge.fury.io/py/hla-compass.svg)](https://badge.fury.io/py/hla-compass)
[![Python Versions](https://img.shields.io/pypi/pyversions/hla-compass.svg)](https://pypi.org/project/hla-compass/)
[![Downloads](https://pepy.tech/badge/hla-compass)](https://pepy.tech/project/hla-compass)

The official Python SDK for developing modules on the HLA-Compass platform.

## Requirements

- Python 3.8 or higher (3.8, 3.9, 3.10, 3.11, 3.12, 3.13 supported)
- pip package manager

## Installation

```bash
pip install hla-compass
```

That's it! No extras needed.

### Install from Source (Latest Development Version)

```bash
# Clone the repository
git clone https://github.com/AlitheaBio/HLA-Compass-platform.git
cd HLA-Compass-platform/sdk/python

# Install in development mode
pip install -e .
```

## Quick Start

```bash
# 1. Setup
hla-compass keys init        # generate signing keys once (or use publish --generate-keys)
hla-compass auth login --env dev

# 2. Create module
hla-compass init my-module --interactive
cd my-module

# 3. Develop with hot-reload
hla-compass dev

# 4. Test
hla-compass test --input examples/sample.json

# 5. Publish
hla-compass build --tag my-module:1.0.0
hla-compass publish --env dev --image-ref my-module:1.0.0 --generate-keys
```

## Workflow

The SDK follows a simple 5-step workflow:

1. **init** - Scaffold module from template
2. **dev** - Develop with hot-reload in Docker
3. **validate/test** - Validate manifest and test execution
4. **build** - Create Docker image
5. **publish** - Sign and register with platform

Each step builds on the previous, ensuring a smooth development experience.

📚 **For complete development guide, see: [Developer Guide](https://docs.alithea.bio/guides/module-developer)**

### Compute support
- **Supported today:** `computeType: docker` (container image built by `hla-compass build`, run on ECS/Fargate internally).
- **Not yet supported via CLI:** `lambda`, `batch`, `sagemaker` — manifests using these values will be rejected until the runtimes are implemented.

## Stability Features

### Circuit Breaker
The SDK includes a built-in circuit breaker to prevent cascading failures. It automatically opens after consecutive failures and resets after a cooldown period.
- **Threshold**: 5 failures (configurable via `HLA_CIRCUIT_BREAKER_THRESHOLD`)
- **Reset**: 60 seconds (configurable via `HLA_CIRCUIT_BREAKER_TIMEOUT`)

### Rate Limiting
Client-side rate limiting ensures your application respects platform limits.
- **Default**: 100 requests / 60 seconds
- **Config**: `HLA_RATE_LIMIT_MAX_REQUESTS` and `HLA_RATE_LIMIT_TIME_WINDOW`

### Idempotency
All POST requests automatically include a stable `Idempotency-Key` to safely retry operations without side effects.

### Dependency Safety
When building modules, the SDK enforces strict dependency pinning. All packages in `requirements.txt` must have exact versions specified (`package==1.0.0`).

## Documentation
Full documentation is available at [developer.alithea.bio](https://developer.alithea.bio).

## Ten-Minute Module (Guided)

| Step | Command / Code | Why |
| --- | --- | --- |
| 1 | `pip install hla-compass` | Installs CLI, wizard, dev server, and data helpers in one go so nothing is missing when you scaffold. |
| 2 | `hla-compass init --interactive` | Wizard captures manifest metadata, inputs, and UI choice, emitting backend/frontend scaffolding ready to run. |
| 3 | `hla-compass validate-module` | Runs schema + structure + security checks (and warns if devkit/OpenAPI are out-of-sync) before you spend time testing. |
| 4 | ```python\nfrom hla_compass.testing import ModuleTester\nfrom backend.main import Module\nModuleTester().quickstart(Module, mode="interactive")\n``` | Spins up a mock context using manifest defaults so you can smoke-test the execute path instantly. |
| 5 | Add `self.storage.save_json("results/preview.json", payload)` inside `execute` | Storage helper normalises metadata, accepts streams, and raises friendly extras guidance if optional deps (pandas/xlsxwriter) are missing. |
| 6 | ```python\nfrom hla_compass.client import APIClient\nclient = APIClient()\nclient.set_telemetry_callback(print)\n``` | Hooks the new telemetry callback so each API request logs status/duration/idempotency for troubleshooting. |
| 7 | `hla-compass doctor` | Checks auth, rate-limit env vars, Docker, Node, optional extras, and tells you exactly what to install or export. |
| 8 | `hla-compass build && hla-compass publish --env dev --generate-keys` | Generates MCP descriptor + Docker image, signs manifest (auto-generating keys if missing), pushes, and registers the module with the chosen environment. |

👉 The table above mirrors the expanded walkthrough in [SDK Documentation](https://docs.alithea.bio/sdk), which adds manifest snippets and deeper commentary.

## CLI Commands

### Module Management

```bash
# Create new module
hla-compass init <name> [options]
# Options: --yes (skip prompts), --template, --type

# Validate module structure
hla-compass validate [--json]  # --json for machine-readable output

# Full validation (schema + structure + security + OpenAPI)
hla-compass validate-module [--strict] [--with-devkit] [--format json]

# Test module (auto-detects local vs remote based on auth)
hla-compass test [--input FILE] [--verbose]

# Test locally without API
hla-compass test --local [--input FILE]

# Test against real API (requires auth)
# Note: Remote mode currently executes locally with API authentication context
# Full remote execution on the platform is on the roadmap
hla-compass test --remote [--input FILE]

# Output as JSON for CI/automation
hla-compass test --json [--input FILE]

# Build container image (local by default)
hla-compass build [--tag TAG] [--push]

# Publish container + MCP descriptor to the platform (signs automatically)
hla-compass publish --env ENV --image-ref IMAGE

# Local devkit docker-compose stack
hla-compass devkit up [--build]
hla-compass devkit status
hla-compass devkit logs [--service postgres] [--follow]
hla-compass devkit down

# List deployed modules
hla-compass list [--env ENV]

# Removed commands (functionality now integrated into publish):
# - hla-compass sign: Signing now happens automatically during publish
# - hla-compass deploy: Use publish instead
```

### Development Server

```bash
# Offline dev server (quiet by default)
hla-compass dev

# Stream webpack/aiohttp logs
hla-compass dev --verbose

# Proxy selected routes to the live API (read-only by default)
hla-compass dev --online --env dev --proxy-routes=auth,data

# Serve UI bundle from a custom webpack port
hla-compass dev --frontend-port 3100
```

### Authentication

```bash
# Login to platform
hla-compass auth login --env dev

# Logout
hla-compass auth logout

# Register as a developer (self-service)
hla-compass auth register [--env ENV]
```

## CLI Feature Matrix

| Command | Description | Status |
|---|---|---|
| `hla-compass configure` | Initialize SDK config and signing keys | Available |
| `hla-compass init` | Scaffold a new module from templates | Available |
| `hla-compass validate` | Validate module structure and manifest | Available |
| `hla-compass test --local` | Run module locally without API | Available |
| `hla-compass test --remote` | Run locally with API auth context | Available (executes locally) |
| `hla-compass auth login/logout/status` | Authentication flows | Available |
| `hla-compass build` | Build Docker image + MCP descriptor (optionally push) | Available |
| `hla-compass publish` | Sign and register module with platform | Available |
| `hla-compass list` | List deployed modules | Available |
| `hla-compass test` | Test module execution (local/remote) | Available |
| `hla-compass sign` | Sign a module directory/manifest | Removed (use publish) |
| `hla-compass deploy` | Register an existing container image | Removed (use publish) |
| `hla-compass templates list` | List module templates | Planned |
| `hla-compass local-services` | Start local DB/S3 dev services | Planned |

## Storage Quick Reference

| Helper | Input Types | Output | Notes |
| --- | --- | --- | --- |
| `save_json(name, data)` | `dict`, `list`, or serialisable objects | `.json` artifact | Uses `json.dumps(..., default=str)` so datetimes survive. [storage.py:120–196] |
| `save_csv(name, dataframe, index=False, compress=False)` | pandas `DataFrame` | `.csv` text or `.csv.gz` when `compress=True` | Requires `pandas`; metadata coerced to strings automatically; safe against CSV formula injection. |
| `save_excel(name, dataframe)` | pandas `DataFrame` or dict of DataFrames | `.xlsx` workbook | Raises `MissingDependencyError` with install guidance when optional extras are missing. [storage.py:224–276] |
| `save_file(name, content, content_type=None, metadata=None)` | `bytes`, `str`, file-like object | Raw object with MIME detection | Accepts streams, normalises metadata collections to comma-separated strings. [storage.py:79–158] |

All helpers return the storage key/URL emitted by the execution environment, so you can echo links in `summary` payloads without additional plumbing.

Notes:
- `--remote` test currently executes locally with API authentication context; full remote execution is on the roadmap.
- For end-to-end deployments, prefer the `publish` path integrated with the platform pipelines.

## Testing

### Unit Testing

```python
# tests/test_module.py
import pytest
from hla_compass.testing import ModuleTester, MockContext

def test_module_execution():
    tester = ModuleTester()
    
    input_data = {
        'sequence': 'MLLSVPLLL',
        'threshold': 0.5
    }
    
    result = tester.test_local(
        'backend/main.py',
        input_data
    )
    
    assert result['status'] == 'success'
    assert len(result['results']) > 0
```

### Integration Testing

```python
# Test with mock API data
def test_with_mock_data():
    context = MockContext.create(
        api_data={
            'peptides': [
                {'id': '1', 'sequence': 'MLLSVPLLL'},
                {'id': '2', 'sequence': 'SIINFEKL'}
            ]
        }
    )
    
    result = tester.test_local(
        'backend/main.py',
        {'min_length': 8},
        context
    )
    
    assert len(result['results']) == 2
```

Need to exercise the devkit container or real S3/MinIO buckets? Opt in explicitly so tests don’t accidentally touch production services:

```python
tester.configure_local_devkit(use_real_storage=True)
context = MockContext.create(use_devkit=True, use_real_storage=True)
```

### Performance Testing

```python
# Benchmark module performance
def test_performance():
    tester = ModuleTester()
    
    results = tester.benchmark(
        'backend/main.py',
        input_data,
        iterations=100
    )
    
    assert results['average_time'] < 0.1  # 100ms
```

## Advanced Features

### Module Types

#### Lambda Modules (Quick Analysis)
- Execution time: <15 minutes
- Memory: 128MB - 10GB
- Best for: Simple calculations, data filtering

#### Fargate Modules (Long Running)
- Execution time: <8 hours
- Memory: 512MB - 30GB
- Best for: Complex pipelines, batch processing

#### SageMaker Modules (ML Inference)
- GPU support available
- Pre-trained model hosting
- Best for: Machine learning predictions

### UI Module Bundles

UI-capable modules ship a React bundle the platform loads dynamically. Keep the generated webpack configuration’s `ModuleUI` UMD export intact so the host application can mount your UI. When running `hla-compass dev`, pass `--verbose` to stream webpack output if the UI fails to load (helpful for diagnosing “ModuleUI UMD not found” and other build issues).

### Custom Validation

```python
class MyModule(Module):
    def validate_inputs(self, input_data):
        # Call parent validation
        validated = super().validate_inputs(input_data)
        
        # Custom validation
        sequence = validated.get('sequence', '')
        if not all(aa in 'ACDEFGHIKLMNPQRSTVWY' for aa in sequence):
            raise ValidationError("Invalid amino acids in sequence")
        
        return validated
```

## Best Practices

1. **Input Validation**: Always validate inputs thoroughly
2. **Error Handling**: Use try-except blocks and provide clear error messages
3. **Logging**: Use appropriate logging levels (debug, info, warning, error)
4. **Performance**: Process data in batches when possible
5. **Memory**: Stream large results instead of loading all into memory
6. **Security**: Never hardcode credentials or sensitive data
7. **Testing**: Write comprehensive tests for all functionality
8. **Documentation**: Document inputs, outputs, and algorithms clearly

## Environment Variables

- `HLA_COMPASS_ENV`: Default environment (dev, staging, prod) - takes precedence over HLA_ENV
- `HLA_ENV`: Alternative environment variable (used if HLA_COMPASS_ENV not set)
- `HLA_COMPASS_CONFIG_DIR`: Configuration directory (default: ~/.hla-compass)
- `HLA_COMPASS_LOG_LEVEL`: Logging level (DEBUG, INFO, WARNING, ERROR)
- `HLA_CORRELATION_ID`: Optional global correlation ID propagated as `X-Correlation-Id`

## Troubleshooting

### Common Issues

1. **Import Errors**: Ensure all dependencies are in requirements.txt
2. **Timeout Errors**: Increase timeout in manifest.json or optimize code
3. **Memory Errors**: Process data in smaller chunks or increase memory
4. **Authentication Errors**: Run `hla-compass auth login` to refresh tokens

### Debug Mode

Enable debug logging:
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

## Request Headers & Tracing

The SDK standardizes outbound request headers and retry behavior.

- Accept: Always `application/json` (session default)
- User-Agent: `hla-compass-sdk/<version> python/<version> os/<platform>`
- Content-Type: `application/json` for JSON requests (per-call)
- X-Request-Id: Unique UUID per HTTP attempt (changes on retry)
- X-Correlation-Id: Optional global correlation ID for cross-service tracing
  - Set environment variable `HLA_CORRELATION_ID` to propagate a stable value
  - Added to both SDK auth requests and APIClient requests
- Idempotency-Key: Added to POST requests to make retries safe
  - A stable UUID is generated per SDK call and reused across retry attempts
  - Server-side should de-duplicate requests using this key

### Retry Policy

- GET/HEAD/OPTIONS: Retries enabled (connection/read) with exponential backoff via `urllib3.Retry`
- 401 Unauthorized: One automatic token refresh (`Auth.refresh_token`) and retry
- 429 Too Many Requests: Honors `Retry-After` (falls back to exponential backoff)
- 5xx Server Errors: Exponential backoff; for POST, retries only when an `Idempotency-Key` is present (the SDK adds one per call)
- Timeouts/Network errors: Limited retries with backoff for transient connection issues

Tip: For end-to-end traceability, set `HLA_CORRELATION_ID` in your environment and pass it across your systems.

## Deployment Note: SimpleDeployer (Deprecated / Dev-only)

The legacy `hla_compass.deployer.SimpleDeployer` utility is disabled by default and intended only for local development experiments. It bypasses platform authentication and infrastructure pipelines.

- Disabled by default; to enable explicitly set `HLA_ENABLE_SIMPLE_DEPLOYER=1`.
- Recommended path: use `hla-compass build` + `hla-compass publish/deploy` and the platform’s Serverless/Terraform pipelines.

## Support

- **Complete Guide**: [Developer Guide](https://docs.alithea.bio/guides/module-developer)
- **Examples**: [Module Templates](https://github.com/AlitheaBio/HLA-Compass-platform/tree/main/sdk/module-ui-template)
- **Issues**: [GitHub Issues](https://github.com/AlitheaBio/HLA-Compass-platform/issues)
- **Quick Start**: [Quick Start](https://docs.alithea.bio/guides/module-developer#quick-start)

## License

Copyright © 2024 Alithea Bio. All rights reserved.


## Advanced Topics

### Data Access (SQL & Storage)

Modules access data via the `DataClient`, which provides scoped access to specific data catalogs.

```python
from hla_compass.module import Module

class AnalysisModule(Module):
    def execute(self, input_data, context):
        # 1. Execute SQL (Secure, Row-Level Scoped)
        # Queries are executed against the schema for the configured catalog (default: immunopeptidomics)
        query = "SELECT sequence, mass FROM peptides WHERE length > %s LIMIT 10"
        result = self.data.sql.query(query, params=[input_data.get("min_length", 8)])
        
        # 2. Access Storage (Direct S3)
        # Reads directly from the organization's S3 bucket prefix
        # Supports 'polars' or 'pandas' engines
        df = self.data.storage.read_parquet("runs/latest/results.parquet", engine="polars")
        
        return {
            "peptides": result["data"],
            "stats": df.describe().to_dict()
        }
```

### Handling Large Datasets (Streaming Iterators)
When working with large result sets via the APIClient (outside module context), you can still use the client directly:

Example: Iterating peptides in pages of 1,000 and stopping after 100k results
```python
from hla_compass.client import APIClient

client = APIClient()

count = 0
for peptide in client.iter_peptides(
    {"hla_allele": "HLA-A*02:01"},
    page_size=1000,
    max_results=100_000,
):
    # process peptide incrementally
    count += 1

print("processed", count)
```

Inside a Module, you can access the underlying API client from the data helper to stream safely during execute:
```python
from hla_compass.module import Module

class StreamModule(Module):
    def execute(self, input_data, context):
        assert self.peptides is not None and self.peptides.api is not None

        top = []
        for pep in self.peptides.api.iter_peptides({"sequence": input_data.get("sequence", "%")}, page_size=2000):
            # Example: collect just a small sample to keep memory bounded
            if len(top) < 5000:
                top.append({"sequence": pep["sequence"], "mass": pep.get("mass")})
            else:
                break
        return top
```

Tips:
- Prefer iter_peptides/iter_proteins/iter_samples for bulk processing.
- Use max_results to enforce hard caps in long-running jobs.
- For DataFrame needs, accumulate to CSV via Storage.save_csv in chunks instead of building a giant list in memory.

### Telemetry Callbacks for APIClient
Capture per-request metrics (status, duration, retries, idempotency key presence) with a callback to ease troubleshooting.

```python
import logging
from hla_compass.client import APIClient

log = logging.getLogger(__name__)

def on_api(data):
    # data contains: method, url, status_code, duration_seconds, attempt, max_retries,
    #                rate_limit_wait, params (redacted in logs), idempotency_key, request_id, error
    log.info("API %s %s %.3fs", data["status_code"], data["url"], data["duration_seconds"]) 

client = APIClient()
client.set_telemetry_callback(on_api)
client.get_peptide("pep_123")
print(client.get_rate_limit_state())  # {'max_requests': ..., 'time_window': ..., 'last_wait_seconds': ...}
```

Environment knobs:
- HLA_RATE_LIMIT_MAX_REQUESTS, HLA_RATE_LIMIT_TIME_WINDOW to tune rate limits
- HLA_CORRELATION_ID to propagate a stable X-Correlation-Id across requests

### Auto-generated API Reference (pdoc)
- Browse: [SDK Documentation](https://docs.alithea.bio/sdk/python)
- Build locally:

```bash
bash scripts/docs-generate-python-sdk.sh
open docs/sdk/python/index.html  # macOS, or use your browser to open the file
```

The script sets PYTHONPATH to include sdk/python and generates docs for the hla_compass package using pdoc.
