Metadata-Version: 2.4
Name: smart-omop
Version: 2.0.2
Summary: OHDSI OMOP CDM data fetching and cohort management for healthcare AI
Author-email: Ankur Lohachab <ankur.lohachab@maastrichtuniversity.nl>
Maintainer-email: Ankur Lohachab <ankur.lohachab@maastrichtuniversity.nl>
License: MIT
Project-URL: Homepage, https://github.com/ankurlohachab/smart-omop
Project-URL: Documentation, https://github.com/ankurlohachab/smart-omop#readme
Project-URL: Repository, https://github.com/ankurlohachab/smart-omop
Project-URL: Issues, https://github.com/ankurlohachab/smart-omop/issues
Keywords: omop,ohdsi,healthcare,clinical-data,cohort-definition,webapi,cdm,observational-health
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Healthcare Industry
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.28.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: mypy>=0.990; extra == "dev"
Requires-Dist: ruff>=0.0.290; extra == "dev"
Provides-Extra: viz
Requires-Dist: plotly>=5.0.0; extra == "viz"
Requires-Dist: matplotlib>=3.5.0; extra == "viz"
Provides-Extra: medsynth
Requires-Dist: medsynth>=0.1.0; extra == "medsynth"
Provides-Extra: all
Requires-Dist: plotly>=5.0.0; extra == "all"
Requires-Dist: matplotlib>=3.5.0; extra == "all"
Requires-Dist: medsynth>=0.1.0; extra == "all"
Dynamic: license-file

# smart-omop

Python client for OHDSI OMOP Common Data Model cohort management via WebAPI.

## Installation

```bash
pip install smart-omop
```

Optional dependencies:

```bash
pip install "smart-omop[viz]"         # Plotly and Matplotlib visualizations
pip install "smart-omop[medsynth]"    # MedSynth CSV integration
pip install "smart-omop[all]"         # All features
```

## Quick Start

```python
from smart_omop import OMOPClient

client = OMOPClient("http://your-server:8080/WebAPI")

# List data sources
sources = client.get_sources()

# Generate cohort
client.generate_cohort(cohort_id=1, source_key="MY_CDM")

# Get results
results = client.get_cohort_results(cohort_id=1, source_key="MY_CDM")
print(f"Persons: {results['personCount']}, Status: {results['status']}")
```
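
The `source_key` passed to `generate_cohort()` has to match a source that your WebAPI instance knows about. The following is a minimal sketch for looking one up. It assumes `get_sources()` returns a list of dictionaries with `sourceKey` and `sourceName` fields, as WebAPI's source listing does; that return shape is an assumption, not something documented above. `find_source_key` is a hypothetical helper and the sample data is invented for illustration.

```python
from typing import Optional


def find_source_key(sources: list[dict], name_fragment: str) -> Optional[str]:
    """Return the first sourceKey whose sourceName contains name_fragment (case-insensitive)."""
    needle = name_fragment.lower()
    for source in sources:
        if needle in source.get("sourceName", "").lower():
            return source.get("sourceKey")
    return None


# Invented sample shaped like the assumed get_sources() result
sample_sources = [
    {"sourceKey": "VOCAB", "sourceName": "Vocabulary only"},
    {"sourceKey": "MY_CDM", "sourceName": "My CDM database"},
]

print(find_source_key(sample_sources, "cdm"))
```

With a live client, `sample_sources` would be replaced by the value returned from `client.get_sources()`, provided it has the assumed shape.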

## Creating Cohorts

### Basic Cohort Builder

```python
from smart_omop import CohortBuilder, Gender, OMOPClient

builder = CohortBuilder("COPD Patients", "COPD diagnosis cohort")
builder.with_condition("COPD", [255573])  # COPD concept ID
builder.with_age_range(min_age=40)
builder.with_gender(Gender.FEMALE)

cohort_def = builder.build()

with OMOPClient("http://your-server:8080/WebAPI") as client:
    created = client.create_cohort(cohort_def.to_dict())
    print(f"Created cohort ID: {created['id']}")
```

### Multi-Criteria Cohort Builder

```python
from smart_omop import CohortBuilderFull, Gender, AgeOperator, OMOPClient

builder = CohortBuilderFull("COPD with Hypertension", "Multi-condition cohort")

# Add concept sets
copd = builder.add_concept_set("COPD")
copd.add_concept(255573, "Chronic obstructive lung disease", include_descendants=True)

htn = builder.add_concept_set("Hypertension")
htn.add_concept(316866, "Hypertensive disorder", include_descendants=True)

# Set primary criterion
builder.add_primary_condition(concept_set_id=0)

# Add inclusion rule for demographics
rule = builder.add_inclusion_rule("Age and Gender")
rule.add_age_criterion(AgeOperator.GTE, 60)
rule.add_age_criterion(AgeOperator.LTE, 85)
rule.add_gender_criterion(Gender.FEMALE)

# Build the definition, then create it on the server
cohort_def = builder.build()

with OMOPClient("http://your-server:8080/WebAPI") as client:
    created = client.create_cohort(cohort_def.to_dict())
```

Supported criteria types: `ConditionOccurrence`, `ProcedureOccurrence`, `DrugExposure`, `Measurement`, `Observation`, `VisitOccurrence`, `DeviceExposure`, `Death`.

## Heracles Characterization

> Examples below use the demo cohort ID `167` and source key `KAGGLECOPD`. Replace both with your own values from your WebAPI/Atlas instance.

### Running Analysis

```python
from smart_omop import HeraclesJobManager, OMOPClient

with OMOPClient("http://your-server:8080/WebAPI") as client:
    manager = HeraclesJobManager(client)

    # Create job
    job = manager.create_job(
        cohort_ids=[167],
        source_key="KAGGLECOPD",
        job_name="COPD_analysis",
        analysis_ids=[1, 2, 3, 4, 5, 400, 401, 402, 403, 404],
        small_cell_count=5
    )

    # Submit (no polling - check status separately)
    result = manager.submit_job(job)
    print(f"Job submitted: {result.get('executionId')}")
```
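
Because the submission above does not poll, the job status has to be checked separately. One way to do that is a small polling loop with a timeout, sketched below. `wait_for_status` is a hypothetical helper, not part of smart-omop, and the status strings (`COMPLETED`, `FAILED`, and so on) are assumptions; check what your WebAPI instance actually reports. The status check is passed in as a callable so the sketch runs without a server.

```python
import time
from typing import Callable


def wait_for_status(
    check_status: Callable[[], str],
    done_states: frozenset = frozenset({"COMPLETED", "FAILED"}),
    timeout: float = 600.0,
    interval: float = 5.0,
) -> str:
    """Call check_status() until it returns a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = check_status()
        if status in done_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still {status!r} after {timeout} seconds")
        time.sleep(interval)


# Stand-in for a real check such as: lambda: manager.get_job_status(execution_id)
# (the return value of get_job_status is not documented here, so adapt as needed)
fake_statuses = iter(["STARTED", "RUNNING", "COMPLETED"])
final = wait_for_status(lambda: next(fake_statuses), interval=0.01)
print(final)
```

Using `time.monotonic()` for the deadline keeps the timeout unaffected by system clock adjustments.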

### Fetching Reports

After the Heracles analysis completes, fetch the characterization reports (`client` is an open `OMOPClient`, as in the previous example):

```python
# Get person demographics
person_report = client.get_heracles_person_report(167, "KAGGLECOPD", refresh=True)

# Get condition occurrences
condition_report = client.get_heracles_condition_report(167, "KAGGLECOPD", refresh=True)

# Get drug exposures
drug_report = client.get_heracles_drug_report(167, "KAGGLECOPD", refresh=True)

# Get procedures
procedure_report = client.get_heracles_procedure_report(167, "KAGGLECOPD", refresh=True)

# Get measurements
measurement_report = client.get_heracles_measurement_report(167, "KAGGLECOPD", refresh=True)

# Get dashboard summary
dashboard = client.get_heracles_dashboard_report(167, "KAGGLECOPD", refresh=True)
```

Available report types:
- `get_heracles_person_report()` - Demographics and year of birth
- `get_heracles_condition_report()` - Condition occurrences
- `get_heracles_drug_report()` - Drug exposures
- `get_heracles_procedure_report()` - Procedures
- `get_heracles_measurement_report()` - Measurements
- `get_heracles_observation_report()` - Observations
- `get_heracles_death_report()` - Death records
- `get_heracles_dashboard_report()` - Summary statistics

### Example: Person Report Data

```python
person_report = client.get_heracles_person_report(167, "KAGGLECOPD", refresh=True)

# Gender distribution
for gender in person_report['gender']:
    print(f"{gender['conceptName']}: {gender['countValue']} persons")
# Output:
# MALE: 61 persons
# FEMALE: 34 persons

# Year of birth distribution
birth_years = person_report['yearOfBirth']
print(f"Birth year entries: {len(birth_years)}")
# Output: Birth year entries: 33

# Birth year statistics
stats = person_report['yearOfBirthStats'][0]
print(f"Year range: {stats['minValue']} to {stats['maxValue']}")
# Output: Year range: 1933 to 1977
```
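
The raw counts in the person report can be turned into cohort shares with a few lines of plain Python. This is a minimal sketch that assumes the `gender` entries have the `conceptName` and `countValue` fields used above; `gender_shares` is a hypothetical helper, and the sample data reuses the counts from the example output.

```python
def gender_shares(person_report: dict) -> dict:
    """Map each gender concept name to its share of the cohort (0.0 to 1.0)."""
    rows = person_report.get("gender", [])
    total = sum(row["countValue"] for row in rows)
    if total == 0:
        return {}
    return {row["conceptName"]: row["countValue"] / total for row in rows}


# Sample shaped like the report above, using the counts from the example output
sample_report = {
    "gender": [
        {"conceptName": "MALE", "countValue": 61},
        {"conceptName": "FEMALE", "countValue": 34},
    ]
}

for name, share in gender_shares(sample_report).items():
    print(f"{name}: {share:.1%}")
# Output:
# MALE: 64.2%
# FEMALE: 35.8%
```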

### Example: Condition Report Data

```python
condition_report = client.get_heracles_condition_report(167, "KAGGLECOPD", refresh=True)

# Top conditions by prevalence
for condition in condition_report[:5]:
    concept_id = condition['conceptId']
    name = condition['conceptPath'].split('||')[-1]
    num_persons = condition['numPersons']
    percent = condition['percentPersons']

    print(f"{name} ({concept_id}): {num_persons} persons ({percent:.1%})")

# Output:
# Chronic obstructive pulmonary disease (255573): 95 persons (100.0%)
# Moderate chronic obstructive pulmonary disease (4193588): 39 persons (41.1%)
# Severe chronic obstructive pulmonary disease (4209097): 27 persons (28.4%)
# Mild chronic obstructive pulmonary disease (4196712): 21 persons (22.1%)
```
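
For use in a spreadsheet or another tool, the condition rows can be flattened to CSV. The sketch below assumes each row carries the `conceptId`, `conceptPath`, `numPersons`, and `percentPersons` fields used above, with `percentPersons` stored as a fraction. `conditions_to_csv` is a hypothetical helper, and the `conceptPath` prefix in the sample row is invented.

```python
import csv
import io


def conditions_to_csv(condition_report: list) -> str:
    """Flatten condition rows to CSV text, keeping only the leaf name of conceptPath."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["concept_id", "name", "num_persons", "percent_persons"])
    for row in condition_report:
        leaf_name = row["conceptPath"].split("||")[-1]
        writer.writerow(
            [row["conceptId"], leaf_name, row["numPersons"], row["percentPersons"]]
        )
    return buffer.getvalue()


# Sample row; the conceptPath prefix is invented, other values follow the example output
sample = [
    {
        "conceptId": 255573,
        "conceptPath": "Respiratory disorder||Chronic obstructive pulmonary disease",
        "numPersons": 95,
        "percentPersons": 1.0,
    }
]
print(conditions_to_csv(sample))
```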

## Visualizations

> Visualization examples use the demo cohort ID `167` and source key `KAGGLECOPD`; swap these for your own values.

Create visualizations from Heracles reports:

```python
from smart_omop import CohortVisualizer
import json

# Load report data
with open('cohort167_KAGGLECOPD_person.json') as f:
    person_data = json.load(f)

with open('cohort167_KAGGLECOPD_condition.json') as f:
    condition_data = json.load(f)

visualizer = CohortVisualizer(output_dir="./visualizations")

# Age distribution from Heracles person report
age_by_gender = visualizer.create_age_distribution(person_data)

# Gender distribution
gender_chart = visualizer.create_gender_distribution(person_data)

# Condition prevalence treemap
condition_treemap = visualizer.create_condition_prevalence(condition_data)

# Dashboard with multiple charts
dashboard = visualizer.create_dashboard_from_reports({
    'person': person_data,
    'condition': condition_data
})
```

These methods output interactive HTML files using Plotly, which is installed by the `viz` extra.

## CLI Usage

### Creating and Generating Cohorts

```bash
# Create cohort
smart-omop --base-url http://your-server:8080/WebAPI create-cohort \
  --name "COPD Patients" \
  --concept-ids 255573 \
  --age-gte 40 \
  --output cohort.json

# Generate cohort
smart-omop --base-url http://your-server:8080/WebAPI generate \
  --cohort-id 167 \
  --source-key KAGGLECOPD

# Check results
smart-omop --base-url http://your-server:8080/WebAPI results \
  --cohort-id 167 \
  --source-key KAGGLECOPD
```

### Running Heracles and Fetching Reports

```bash
# Run Heracles analysis
smart-omop --base-url http://your-server:8080/WebAPI heracles \
  --cohort-id 167 \
  --source-key KAGGLECOPD \
  --job-name "COPD_analysis" \
  --analysis-ids "1,2,3,4,5,400,401,402,403,404"

# Get individual reports (wait ~60 seconds after Heracles starts)
# Person demographics
smart-omop --base-url http://your-server:8080/WebAPI get-report \
  --cohort-id 167 \
  --source-key KAGGLECOPD \
  --type person \
  --output person.json \
  --refresh

# Dashboard
smart-omop --base-url http://your-server:8080/WebAPI get-report \
  --cohort-id 167 \
  --source-key KAGGLECOPD \
  --type dashboard \
  --refresh

# Condition occurrences
smart-omop --base-url http://your-server:8080/WebAPI get-report \
  --cohort-id 167 \
  --source-key KAGGLECOPD \
  --type condition \
  --output condition.json \
  --refresh

# Available types: dashboard, person, condition, drug, procedure,
#                  measurement, observation, death, components_summary

# Or export all reports at once
smart-omop --base-url http://your-server:8080/WebAPI export-reports \
  --cohort-id 167 \
  --source-key KAGGLECOPD \
  --output-dir ./reports \
  --refresh
```

Notes:
- `167` and `KAGGLECOPD` are example values from the bundled COPD demo. Replace them with your own cohort ID and `source_key` returned by your WebAPI/Atlas instance.
- The `--refresh` flag forces a fresh pull from WebAPI; drop it if you are reusing cached reports.

## Analysis IDs

Common Heracles analysis categories:

| Category | IDs | Description |
|----------|-----|-------------|
| Demographics | 1-5, 7-9 | Person count, age, gender, race |
| Conditions | 400-413 | Condition occurrences and prevalence |
| Drugs | 700-713 | Drug exposures and durations |
| Procedures | 600-613 | Procedure occurrences |
| Measurements | 1800-1831 | Measurement values and distributions |
| Observations | 800-813 | Observation records |
| Visits | 200-213 | Visit occurrences and types |

Example analysis selection:

```python
# Demographics and conditions
analysis_ids = [1, 2, 3, 4, 5, 400, 401, 402, 403, 404]

# Demographics, conditions, and measurements
analysis_ids = [1, 2, 3, 4, 5, 400, 401, 402, 1800, 1801, 1802, 1803]
```

Or use predefined sets:

```python
from smart_omop import DEMO_ANALYSES, CONDITION_ANALYSES, MEASUREMENT_ANALYSES

analysis_ids = DEMO_ANALYSES + CONDITION_ANALYSES + MEASUREMENT_ANALYSES
```
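
When a selection does not match one of the predefined sets, the ID ranges from the table above can be expanded programmatically. This is a minimal sketch; `expand_id_ranges` is a hypothetical helper, not part of smart-omop, and it assumes every ID inside a listed range is one you actually want to request.

```python
def expand_id_ranges(spec: str) -> list[int]:
    """Expand a spec such as "1-5, 7-9" into a sorted list of unique integer IDs."""
    ids = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(piece) for piece in part.split("-", 1))
            ids.update(range(start, end + 1))  # ranges are inclusive on both ends
        else:
            ids.add(int(part))
    return sorted(ids)


demographics = expand_id_ranges("1-5, 7-9")
conditions = expand_id_ranges("400-413")
analysis_ids = demographics + conditions
print(demographics)  # [1, 2, 3, 4, 5, 7, 8, 9]
```

The resulting `analysis_ids` list has the same form as the hand-written lists shown earlier in this section.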

## MedSynth Integration

MedSynth generates synthetic OMOP CDM data for privacy-preserving research.

Installation:
```bash
pip install medsynth
```

Generate synthetic data:
```bash
medsynth --generate-omop --num-subjects 100 --output-dir ./omop_data/
```

Load and analyze:
```python
from smart_omop import MedSynthOMOPSource

source = MedSynthOMOPSource("./omop_data")

# Filter by condition
persons = source.filter_by_condition([255573])  # COPD

# Apply demographics
filtered = source.filter_by_age_gender(
    persons,
    min_age=60,
    gender_concept_ids=[8532]  # Female
)

# Create summary
summary = source.create_cohort_summary(
    concept_ids=[255573],
    min_age=60,
    gender_concept_ids=[8532]
)

print(f"Matching persons: {summary['person_count']}")
```

For more information: https://github.com/ankurlohachab/medsynth

## API Reference

### OMOPClient

Core methods:
- `get_sources()` - List available data sources
- `get_cohort(cohort_id)` - Retrieve cohort definition
- `create_cohort(definition)` - Create new cohort
- `generate_cohort(cohort_id, source_key)` - Generate cohort
- `get_generation_status(cohort_id, source_key)` - Check generation status
- `get_cohort_results(cohort_id, source_key)` - Get cohort summary
- `get_heracles_report(cohort_id, source_key, report_type, refresh)` - Get specific report
- `get_heracles_person_report(cohort_id, source_key, refresh)` - Get demographics
- `get_heracles_condition_report(cohort_id, source_key, refresh)` - Get conditions
- `get_heracles_drug_report(cohort_id, source_key, refresh)` - Get drug exposures
- `get_heracles_procedure_report(cohort_id, source_key, refresh)` - Get procedures
- `get_heracles_measurement_report(cohort_id, source_key, refresh)` - Get measurements
- `get_heracles_dashboard_report(cohort_id, source_key, refresh)` - Get dashboard

### HeraclesJobManager

Methods:
- `create_job(cohort_ids, source_key, job_name, analysis_ids, small_cell_count)` - Create job
- `submit_job(job_config, poll, timeout)` - Submit job
- `get_job_status(execution_id)` - Check job status

### CohortBuilder

Methods:
- `with_condition(name, concept_ids)` - Add condition criterion
- `with_age_range(min_age, max_age)` - Set age requirements
- `with_gender(gender)` - Set gender requirement
- `build()` - Generate cohort definition

### CohortVisualizer

Methods:
- `create_age_distribution(person_report)` - Age distribution from Heracles
- `create_gender_distribution(person_report)` - Gender breakdown
- `create_condition_prevalence(condition_report)` - Condition treemap
- `create_dashboard_from_reports(reports)` - Multi-panel dashboard

## Configuration

Custom timeout and retries:

```python
client = OMOPClient(
    "http://your-server:8080/WebAPI",
    timeout=60,
    max_retries=5,
    verify_ssl=True
)
```

The base URL can also be kept in an environment variable:
```bash
export OMOP_BASE_URL="http://your-server:8080/WebAPI"
```
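
How smart-omop itself consumes `OMOP_BASE_URL` is not described here, so the sketch below reads the variable explicitly and falls back to a placeholder. `resolve_base_url` and `DEFAULT_BASE_URL` are hypothetical names, and stripping the trailing slash is a convenience chosen for this example rather than a documented requirement.

```python
import os

DEFAULT_BASE_URL = "http://your-server:8080/WebAPI"


def resolve_base_url(explicit: str = "") -> str:
    """Prefer an explicit URL, then OMOP_BASE_URL, then the placeholder default."""
    url = explicit or os.environ.get("OMOP_BASE_URL", "") or DEFAULT_BASE_URL
    return url.rstrip("/")


# Set here only so the sketch is self-contained; normally the shell exports it
os.environ["OMOP_BASE_URL"] = "http://your-server:8080/WebAPI/"
print(resolve_base_url())
```

The resolved value can then be passed as the first argument to `OMOPClient(...)`.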

## Testing

Run tests:
```bash
pytest tests/ -v
```

Example test output:
```
tests/test_client.py::test_client_initialization PASSED
tests/test_cohort.py::test_simple_cohort_builder PASSED
tests/test_heracles.py::test_heracles_job_config PASSED
tests/test_error_handling.py::test_invalid_cohort_id PASSED

29 passed in 6.91s
```

Tested against OHDSI WebAPI 2.14.0.

## Requirements

- Python 3.9+
- OHDSI WebAPI instance (v2.7+) or MedSynth CSV data

## Examples

See `examples/` directory:
- `example_quickstart.py` - Basic operations
- `example_simple_cohort.py` - Cohort building
- `example_heracles.py` - Characterization
- `example_heracles_reports.py` - Report fetching
- `example_visualizations.py` - Visualizations
- `example_medsynth.py` - Synthetic data

## Development

```bash
git clone https://github.com/ankurlohachab/smart-omop.git
cd smart-omop

pip install -e ".[dev]"

pytest
mypy src/smart_omop
black src/smart_omop
```

## Author

Ankur Lohachab, Department of Advanced Computing Sciences, Maastricht University

## License

MIT License - see LICENSE file.

## Citation

```bibtex
@software{lohachab2025smartomop,
  author = {Lohachab, Ankur},
  title = {smart-omop: OHDSI OMOP CDM Client for Python},
  year = {2025},
  url = {https://github.com/ankurlohachab/smart-omop}
}
```

## Support

- Issues: https://github.com/ankurlohachab/smart-omop/issues
- Email: ankur.lohachab@maastrichtuniversity.nl
