Metadata-Version: 2.4
Name: smart-omop
Version: 2.0.0
Summary: OHDSI OMOP CDM data fetching and cohort management for healthcare AI
Author-email: Ankur Lohachab <ankur.lohachab@maastrichtuniversity.nl>
Maintainer-email: Ankur Lohachab <ankur.lohachab@maastrichtuniversity.nl>
License: MIT
Project-URL: Homepage, https://github.com/ankurlohachab/smart-omop
Project-URL: Documentation, https://github.com/ankurlohachab/smart-omop#readme
Project-URL: Repository, https://github.com/ankurlohachab/smart-omop
Project-URL: Issues, https://github.com/ankurlohachab/smart-omop/issues
Keywords: omop,ohdsi,healthcare,clinical-data,cohort-definition,webapi,cdm,observational-health
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Healthcare Industry
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.28.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: mypy>=0.990; extra == "dev"
Requires-Dist: ruff>=0.0.290; extra == "dev"
Provides-Extra: viz
Requires-Dist: plotly>=5.0.0; extra == "viz"
Requires-Dist: matplotlib>=3.5.0; extra == "viz"
Provides-Extra: medsynth
Requires-Dist: medsynth>=0.1.0; extra == "medsynth"
Provides-Extra: all
Requires-Dist: plotly>=5.0.0; extra == "all"
Requires-Dist: matplotlib>=3.5.0; extra == "all"
Requires-Dist: medsynth>=0.1.0; extra == "all"
Dynamic: license-file

# smart-omop

Python client for OHDSI OMOP Common Data Model cohort management via WebAPI.

## Features

- Cohort definition creation with CIRCE expression syntax
- Cohort generation and results retrieval
- Heracles characterization with configurable analysis sets
- Concept set management and resolution
- MedSynth synthetic data integration (CSV-based)
- Interactive visualizations (Plotly/matplotlib)
- CLI and Python API

## Installation

```bash
pip install smart-omop
```

Optional dependencies:

```bash
pip install smart-omop[viz]         # Plotly visualizations
pip install smart-omop[medsynth]    # MedSynth integration
pip install smart-omop[all]         # All features
```

## Examples

The `examples/` directory contains standalone example scripts demonstrating key features:

- `example_quickstart.py` - Basic client operations
- `example_simple_cohort.py` - Simple cohort building
- `example_circe_syntax.py` - Full CIRCE syntax
- `example_heracles.py` - Heracles characterization
- `example_medsynth.py` - MedSynth CSV data integration
- `example_visualizations.py` - Interactive visualizations

See `examples/README.md` for details on running examples.

## Quick Start

```python
from smart_omop import OMOPClient

client = OMOPClient("http://your-webapi:8080/WebAPI")

# List data sources
sources = client.get_sources()

# Fetch cohort definition
cohort = client.get_cohort(cohort_id=1)

# Generate cohort
client.generate_cohort(cohort_id=1, source_key="MY_CDM")

# Get results
results = client.get_cohort_results(cohort_id=1, source_key="MY_CDM")
print(f"Persons: {results['personCount']}, Status: {results['status']}")
```

## Cohort Building

### Simple Cohort

```python
from smart_omop import OMOPClient, CohortBuilder, Gender

builder = CohortBuilder("COPD Patients", "COPD diagnosis cohort")
builder.with_condition("COPD", [255573, 40481087])
builder.with_age_range(min_age=40)
builder.with_gender(Gender.FEMALE)

cohort_def = builder.build()

with OMOPClient("http://your-webapi:8080/WebAPI") as client:
    created = client.create_cohort(cohort_def.to_dict())
```

### Full CIRCE Syntax

```python
from smart_omop import CohortBuilderFull, Gender, AgeOperator

builder = CohortBuilderFull("Complex Cohort", "Multiple criteria")

# Concept sets
copd = builder.add_concept_set("COPD")
copd.add_concept(255573, "Chronic obstructive lung disease", include_descendants=True)

htn = builder.add_concept_set("Hypertension")
htn.add_concept(316866, "Hypertensive disorder", include_descendants=True)

# Primary criterion
builder.add_primary_condition(concept_set_id=0)

# Observation window
builder.set_observation_window(prior_days=365, post_days=0)

# Inclusion rules
rule = builder.add_inclusion_rule("Demographics")
rule.add_age_criterion(AgeOperator.GTE, 60)
rule.add_age_criterion(AgeOperator.LTE, 85)
rule.add_gender_criterion(Gender.FEMALE)

cohort_def = builder.build()
```

Supported primary criteria types: `ConditionOccurrence`, `ProcedureOccurrence`, `DrugExposure`, `Measurement`, `Observation`, `VisitOccurrence`, `DeviceExposure`, `Death`.
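
For orientation, the sketch below assembles a minimal CIRCE-style expression as a plain dictionary. The key names (`ConceptSets`, `PrimaryCriteria`, `CriteriaList`, `CodesetId`, `ObservationWindow`) follow the CIRCE JSON layout as exported by ATLAS. The helpers `make_concept_item` and `make_expression` are hypothetical and not part of smart-omop, and the output of `builder.build()` may differ in detail.

```python
import json


def make_concept_item(concept_id, name, include_descendants=True):
    """One concept-set item in the CIRCE JSON layout (hypothetical helper)."""
    return {
        "concept": {"CONCEPT_ID": concept_id, "CONCEPT_NAME": name},
        "includeDescendants": include_descendants,
    }


def make_expression(concept_sets, criterion_type="ConditionOccurrence",
                    prior_days=365, post_days=0):
    """Assemble a minimal CIRCE-style expression (hypothetical helper).

    concept_sets is a list of (name, items) tuples; ids are assigned by
    position, and concept set 0 is used as the primary criterion.
    """
    return {
        "ConceptSets": [
            {"id": i, "name": name, "expression": {"items": items}}
            for i, (name, items) in enumerate(concept_sets)
        ],
        "PrimaryCriteria": {
            # criterion_type is one of the primary criteria types listed above
            "CriteriaList": [{criterion_type: {"CodesetId": 0}}],
            "ObservationWindow": {"PriorDays": prior_days, "PostDays": post_days},
            "PrimaryCriteriaLimit": {"Type": "First"},
        },
        "InclusionRules": [],
    }


expression = make_expression([
    ("COPD", [make_concept_item(255573, "Chronic obstructive lung disease")]),
    ("Hypertension", [make_concept_item(316866, "Hypertensive disorder")]),
])
print(json.dumps(expression, indent=2))
```

The `concept_set_id=0` argument in the builder example corresponds to the positional id of the first concept set here.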

## Heracles Characterization

```python
from smart_omop import OMOPClient, HeraclesJobManager, HeraclesAnalysisBuilder

with OMOPClient("http://your-webapi:8080/WebAPI") as client:
    mgr = HeraclesJobManager(client)

    # Build analysis set
    analyses = HeraclesAnalysisBuilder()
    analyses.add_demographics()
    analyses.add_conditions()
    analyses.add_drugs()

    # Create job
    job = mgr.create_job(
        cohort_ids=[1],
        source_key="MY_CDM",
        job_name="COPD_Characterization",
        analysis_ids=analyses.build(),
        small_cell_count=5
    )

    # Submit
    result = mgr.submit_job(job, poll=True, timeout=1800)
```

Job configuration format:
```json
{
  "jobName": "COPD_Characterization",
  "sourceKey": "MY_CDM",
  "smallCellCount": 5,
  "cohortDefinitionIds": [1],
  "analysisIds": [1, 2, 3, 400, 401, ...],
  "runHeraclesHeel": false,
  "cohortPeriodOnly": false
}
```
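
The same payload can be assembled by hand, which may help when inspecting or logging a job before submission. This is a minimal sketch using only the standard library. `build_job_config` is a hypothetical helper, not the library's `create_job`, and the analysis IDs passed in are placeholders.

```python
import json


def build_job_config(job_name, source_key, cohort_ids, analysis_ids,
                     small_cell_count=5):
    """Return a dict in the job configuration format shown above."""
    if not cohort_ids:
        raise ValueError("cohort_ids must not be empty")
    return {
        "jobName": job_name,
        "sourceKey": source_key,
        "smallCellCount": small_cell_count,
        "cohortDefinitionIds": list(cohort_ids),
        # De-duplicate and sort so repeated IDs are sent only once
        "analysisIds": sorted(set(analysis_ids)),
        "runHeraclesHeel": False,
        "cohortPeriodOnly": False,
    }


# Placeholder analysis IDs, for illustration only
config = build_job_config("COPD_Characterization", "MY_CDM", [1], [3, 1, 2, 1])
print(json.dumps(config, indent=2))
```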

Analysis categories: `DEMO_ANALYSES`, `CONDITION_ANALYSES`, `DRUG_ANALYSES`, `PROCEDURE_ANALYSES`, `MEASUREMENT_ANALYSES`, `VISIT_ANALYSES`, `OBSERVATION_ANALYSES`.

## Data Sources

### WebAPI Instance

```python
from smart_omop import OMOPClient, fetch_cohort_data

# Any OHDSI WebAPI instance
data = fetch_cohort_data(
    "http://your-webapi:8080/WebAPI",
    cohort_id=1,
    source_key="MY_CDM",
    include_results=True
)
```

### MedSynth CSV Data

MedSynth is a medical synthetic data generator that creates privacy-preserving OMOP CDM datasets. It generates CT scans and OMOP-formatted CSV files using statistical methods.

Installation:
```bash
pip install medsynth
```

Generate synthetic OMOP data:
```bash
medsynth --generate-omop --num-subjects 100 --output-dir ./omop_data/
```

Load and filter MedSynth-generated data:
```python
from smart_omop import MedSynthOMOPSource, Gender

source = MedSynthOMOPSource("/path/to/medsynth/output")

# Filter by condition
persons = source.filter_by_condition([255573])

# Apply demographics
filtered = source.filter_by_age_gender(
    persons,
    min_age=60,
    gender_concept_ids=[Gender.FEMALE.value]
)

# Create summary
summary = source.create_cohort_summary(
    concept_ids=[255573],
    min_age=60,
    gender_concept_ids=[Gender.FEMALE.value]
)
```

For more information: https://github.com/ankurlohachab/medsynth

Supported OMOP tables: `person`, `condition_occurrence`, `drug_exposure`, `procedure_occurrence`, `measurement`, `observation`, `visit_occurrence`, `death`.
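
To illustrate the idea behind condition filtering on CSV data, the sketch below selects persons from a `condition_occurrence` table using only the standard library. It assumes one CSV file per table named `<table>.csv`, with the standard OMOP CDM columns `person_id` and `condition_concept_id`. The function `persons_with_condition` is hypothetical and is not how `MedSynthOMOPSource` is necessarily implemented.

```python
import csv
import tempfile
from pathlib import Path


def persons_with_condition(data_dir, concept_ids):
    """Return the set of person_id values with any of the given condition concepts."""
    wanted = {int(c) for c in concept_ids}
    path = Path(data_dir) / "condition_occurrence.csv"  # assumed file name
    with path.open(newline="") as handle:
        return {
            int(row["person_id"])
            for row in csv.DictReader(handle)
            if int(row["condition_concept_id"]) in wanted
        }


# Tiny illustrative table, not real MedSynth output
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "condition_occurrence.csv").write_text(
        "person_id,condition_concept_id\n"
        "1,255573\n"
        "2,316866\n"
        "3,255573\n"
    )
    matched = persons_with_condition(tmp, [255573])

print(sorted(matched))  # [1, 3]
```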

## Visualizations

```python
from smart_omop import CohortVisualizer

visualizer = CohortVisualizer(output_dir="./viz")

# Age distribution
age_data = {
    'male': [45, 52, 61, 67, 72, ...],
    'female': [48, 55, 59, 64, 70, ...]
}
age_path = visualizer.create_age_pyramid(age_data)

# Condition prevalence
condition_counts = {'255573': 100, '316866': 45}
condition_names = {'255573': 'COPD', '316866': 'Hypertension'}
treemap_path = visualizer.create_condition_treemap(condition_counts, condition_names)

# Dashboard
dashboard_path = visualizer.create_dashboard(cohort_data)
```

Outputs are interactive HTML files generated with Plotly. If Plotly is unavailable, the visualizer falls back to matplotlib.
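
An age pyramid groups raw ages into bands per gender. The sketch below shows one way to do that grouping with the standard library. The ten-year band width and the helper `age_band_counts` are assumptions for illustration, and the sample ages are only the values visible in the example above. How `create_age_pyramid` bins its input internally is not documented here.

```python
from collections import Counter


def age_band_counts(ages, band_width=10):
    """Count ages per band, keyed by the band's lower bound."""
    return Counter((age // band_width) * band_width for age in ages)


# Sample values only
age_data = {
    "male": [45, 52, 61, 67, 72],
    "female": [48, 55, 59, 64, 70],
}

for gender, ages in age_data.items():
    counts = age_band_counts(ages)
    for lower in sorted(counts):
        print(f"{gender} {lower}-{lower + 9}: {counts[lower]}")
```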

## CLI

```bash
# Create cohort
smart-omop --base-url http://your-webapi:8080/WebAPI create-cohort \
  --name "COPD Cohort" \
  --concept-ids 255573,40481087 \
  --age-gte 40 \
  --gender female

# Generate cohort
smart-omop --base-url http://your-webapi:8080/WebAPI generate \
  --cohort-id 1 \
  --source-key MY_CDM

# Fetch results
smart-omop --base-url http://your-webapi:8080/WebAPI results \
  --cohort-id 1 \
  --source-key MY_CDM \
  --output results.json
```

## Configuration

Environment variables:

```bash
export OMOP_BASE_URL="http://your-webapi:8080/WebAPI"
```
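
Whether the client picks up `OMOP_BASE_URL` automatically is not stated here, so a script can read it explicitly and pass the result to `OMOPClient`. A minimal sketch follows. `resolve_base_url` and `DEFAULT_BASE_URL` are hypothetical names, not part of smart-omop.

```python
import os

# Placeholder fallback, matching the URL used throughout this README
DEFAULT_BASE_URL = "http://your-webapi:8080/WebAPI"


def resolve_base_url(explicit=None):
    """Prefer an explicit argument, then OMOP_BASE_URL, then the default."""
    url = explicit or os.environ.get("OMOP_BASE_URL") or DEFAULT_BASE_URL
    # Strip trailing slashes so endpoint paths can be appended consistently
    return url.rstrip("/")


base_url = resolve_base_url()
print(base_url)
```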

Custom timeout and retries:

```python
from smart_omop import OMOPClient

client = OMOPClient(
    "http://your-webapi:8080/WebAPI",
    timeout=60,
    max_retries=5,
    verify_ssl=True
)
```

## API Reference

### OMOPClient

Core client for WebAPI interactions.

Methods:
- `get_sources()` - List available data sources
- `get_cohort(cohort_id)` - Fetch cohort definition
- `create_cohort(definition)` - Create new cohort
- `generate_cohort(cohort_id, source_key)` - Generate cohort on source
- `get_generation_status(cohort_id, source_key)` - Check generation status
- `get_cohort_results(cohort_id, source_key)` - Fetch cohort summary
- `get_heracles_analyses(cohort_id, source_key)` - Fetch Heracles analyses
- `run_heracles(cohort_id, source_key)` - Run characterization
- `get_concept_set(concept_set_id)` - Fetch concept set
- `resolve_concept_set(expression, source_key)` - Resolve to concept IDs

### CohortBuilder

Fluent interface for cohort definitions.

Methods:
- `with_condition(name, concept_ids)` - Add condition criterion
- `with_age_range(min_age, max_age)` - Set age requirements
- `with_gender(gender)` - Set gender requirement
- `with_observation_window(prior_days, post_days)` - Set observation window
- `build()` - Generate cohort definition


### CohortBuilderFull

Full CIRCE expression syntax support.

Methods:
- `add_concept_set(name)` - Create concept set
- `add_primary_condition(concept_set_id)` - Add condition criterion
- `add_primary_procedure(concept_set_id)` - Add procedure criterion
- `add_primary_drug(concept_set_id)` - Add drug criterion
- `add_primary_measurement(concept_set_id)` - Add measurement criterion
- `set_observation_window(prior_days, post_days)` - Set observation window
- `set_primary_criteria_limit(limit_type)` - Set limit type (All, First)
- `add_inclusion_rule(name, description)` - Add inclusion rule
- `build()` - Generate cohort definition

### HeraclesJobManager

Heracles job management.

Methods:
- `create_job(cohort_ids, source_key, ...)` - Create job configuration
- `submit_job(job_config, poll, timeout)` - Submit and optionally poll
- `get_job_status(execution_id)` - Get job status

### MedSynthOMOPSource

CSV-based OMOP data source.

Methods:
- `load_table(table_name)` - Load OMOP table from CSV
- `get_person_count()` - Get total persons
- `get_condition_counts()` - Get condition counts by concept ID
- `filter_by_condition(concept_ids)` - Filter persons by condition
- `filter_by_age_gender(person_ids, min_age, max_age, gender_concept_ids)` - Apply demographics
- `create_cohort_summary(concept_ids, min_age, max_age, gender_concept_ids)` - Create summary

### CohortVisualizer

Visualization generator.

Methods:
- `create_age_pyramid(age_data, save_path)` - Age distribution by gender
- `create_condition_treemap(condition_counts, condition_names, save_path)` - Condition prevalence
- `create_temporal_pattern(dates, save_path)` - Cohort entry over time
- `create_dashboard(cohort_data, save_path)` - Comprehensive dashboard

### High-Level Functions

- `fetch_cohort_data(base_url, cohort_id, source_key)` - Complete cohort data
- `fetch_concept_sets(base_url, concept_set_ids)` - Multiple concept sets
- `create_and_generate_cohort(base_url, cohort_name, concept_ids, source_key, age_min)` - Create and generate
- `poll_generation_status(base_url, cohort_id, source_key, max_wait)` - Poll until complete
- `create_simple_cohort(name, description, concept_ids, include_descendants, age_min, age_max, genders)` - Simple cohort
- `create_standard_job(cohort_ids, source_key, job_name, ...)` - Standard Heracles job
- `load_from_medsynth_directory(data_directory, concept_ids, min_age, max_age, gender_concept_ids)` - Load from CSV
- `create_cohort_visualizations(cohort_data, output_dir)` - All visualizations
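
The polling pattern behind a function like `poll_generation_status` can be sketched generically as below. This is not the library's implementation. The status names `COMPLETE` and `FAILED`, the `interval` parameter, and the helper `poll_until_done` are assumptions made for illustration.

```python
import time


def poll_until_done(get_status, max_wait=600.0, interval=5.0, sleep=time.sleep):
    """Call get_status() until it reports a terminal state or max_wait elapses."""
    terminal = {"COMPLETE", "FAILED"}  # assumed status names
    deadline = time.monotonic() + max_wait
    while True:
        status = get_status()
        if status in terminal:
            return status
        # Give up if another full interval would overrun the deadline
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"still {status!r} after {max_wait} seconds")
        sleep(interval)


# Stand-in for a real status call against WebAPI
fake_statuses = iter(["PENDING", "RUNNING", "COMPLETE"])
final = poll_until_done(lambda: next(fake_statuses), max_wait=1.0, interval=0.01)
print(final)  # COMPLETE
```

In real use, `get_status` would wrap a client call such as `get_generation_status(cohort_id, source_key)` and extract the status field from its response, whose exact shape is not documented in this README.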

## Testing

Run test suite:

```bash
pytest tests/ -v
```

Test results:

```
tests/test_client.py::test_client_initialization ......... PASSED
tests/test_client.py::test_context_manager ............... PASSED
tests/test_client.py::test_get_sources ................... PASSED
tests/test_client.py::test_get_cohort .................... PASSED
tests/test_client.py::test_get_generation_status ......... PASSED
tests/test_client.py::test_get_cohort_results ............ PASSED
tests/test_cohort.py::test_simple_cohort_builder ......... PASSED
tests/test_cohort.py::test_full_cohort_builder ........... PASSED
tests/test_cohort.py::test_create_simple_cohort .......... PASSED
tests/test_cohort.py::test_multiple_concept_sets ......... PASSED
tests/test_cohort.py::test_age_operators ................. PASSED
tests/test_error_handling.py::test_invalid_cohort_id ..... PASSED
tests/test_error_handling.py::test_nonexistent_cohort .... PASSED
tests/test_error_handling.py::test_invalid_source_key .... PASSED
tests/test_error_handling.py::test_nonexistent_source .... PASSED
tests/test_error_handling.py::test_cohort_not_generated .. PASSED
tests/test_error_handling.py::test_invalid_cohort_definition PASSED
tests/test_error_handling.py::test_empty_concept_set ..... PASSED
tests/test_error_handling.py::test_no_primary_criteria ... PASSED
tests/test_error_handling.py::test_invalid_concept_id .... PASSED
tests/test_error_handling.py::test_medsynth_invalid_directory PASSED
tests/test_error_handling.py::test_medsynth_invalid_table . PASSED
tests/test_error_handling.py::test_medsynth_missing_table_file PASSED
tests/test_heracles.py::test_heracles_job_config ......... PASSED
tests/test_heracles.py::test_analysis_builder ............ PASSED
tests/test_heracles.py::test_analysis_categories ......... PASSED
tests/test_heracles.py::test_create_standard_job ......... PASSED
tests/test_heracles.py::test_custom_analyses ............. PASSED
tests/test_heracles.py::test_comprehensive_analyses ...... PASSED

29 passed in 6.91s
```

Tested against OHDSI WebAPI 2.14.0 with KAGGLECOPD and SYNPUF1K data sources.

## Requirements

- Python 3.9+
- OHDSI WebAPI instance (v2.7+) or MedSynth CSV data
- Network access to WebAPI endpoint (if using WebAPI)

## Development

```bash
git clone https://github.com/ankurlohachab/smart-omop.git
cd smart-omop

pip install -e ".[dev]"

pytest
mypy src/smart_omop
black src/smart_omop
```

## Author

Ankur Lohachab, Department of Advanced Computing Sciences, Maastricht University

## License

MIT License - see LICENSE file.

## Citation

```bibtex
@software{lohachab2025smartomop,
  author = {Lohachab, Ankur},
  title = {smart-omop: OHDSI OMOP CDM Data Fetching for Healthcare AI},
  year = {2025},
  url = {https://github.com/ankurlohachab/smart-omop}
}
```

## Support

- Issues: https://github.com/ankurlohachab/smart-omop/issues
- Email: ankur.lohachab@maastrichtuniversity.nl
