Metadata-Version: 2.4
Name: casa-pclean
Version: 0.2.3
Summary: Parallel CLEAN imaging using Dask and CASA tools
Author-email: MiCASA <rx.astro@gmail.com>
License-Expression: GPL-3.0-or-later
Project-URL: Homepage, https://github.com/r-xue/pclean
Project-URL: Repository, https://github.com/r-xue/pclean
Project-URL: Issues, https://github.com/r-xue/pclean/issues
Project-URL: Documentation, https://pclean.readthedocs.io/
Project-URL: Changelog, https://github.com/r-xue/pclean/releases
Keywords: astronomy,pipeline,ALMA,VLA,radio-astronomy,CASA,imaging
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Astronomy
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Operating System :: OS Independent
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: dask
Requires-Dist: distributed
Requires-Dist: numpy
Requires-Dist: pydantic>=2.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: scipy
Provides-Extra: casa
Requires-Dist: casatools>=6.5; extra == "casa"
Requires-Dist: casatasks>=6.5; extra == "casa"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Provides-Extra: slurm
Requires-Dist: dask-jobqueue>=0.8; extra == "slurm"
Provides-Extra: docs
Requires-Dist: sphinx>=7.0; extra == "docs"
Requires-Dist: furo; extra == "docs"
Requires-Dist: myst-parser>=2.0; extra == "docs"
Requires-Dist: sphinx-copybutton; extra == "docs"
Requires-Dist: sphinxcontrib-mermaid; extra == "docs"
Dynamic: license-file

# pclean — Parallel CLEAN Imaging with Dask

[![tests](https://img.shields.io/github/actions/workflow/status/r-xue/pclean/test.yml?branch=main&logo=github&label=tests)](https://github.com/r-xue/pclean/actions/workflows/test.yml)
[![codecov](https://img.shields.io/codecov/c/github/r-xue/pclean?logo=codecov)](https://codecov.io/gh/r-xue/pclean)
[![docs](https://img.shields.io/readthedocs/pclean?logo=readthedocs&label=docs)](https://pclean.readthedocs.io/)

`pclean` is a modular, Dask-accelerated radio-interferometric imaging package
that wraps CASA's synthesis imaging C++ tools (`casatools`) to provide
transparent parallelism for **cube** (channel-distributed) and **continuum**
(row-distributed) imaging workflows.

## Features

| Feature | Description |
|---------|-------------|
| **Cube parallelism** | Channels are distributed across Dask workers; each worker runs a complete imaging and deconvolution cycle on its sub-cube. |
| **Continuum parallelism** | Visibility rows are partitioned across Dask workers for major-cycle gridding; minor cycles run on the gathered, normalized image. |
| **tclean-compatible API** | Drop-in `pclean()` function accepting the same parameters as CASA `tclean`. |
| **Hierarchical config** | YAML-based configuration built on Pydantic v2, with presets, layered merging, and CASA bridge methods. |
| **CLI support** | Run imaging from the command line via `python -m pclean`. |
| **SLURM clusters** | Native Dask-Jobqueue integration for HPC batch scheduling. |
| **Modular internals** | Every building block — imager, deconvolver, normalizer, partitioner, cluster manager — is independently importable. |
| **ADIOS2 support** | Convert MeasurementSet columns to `Adios2StMan` for I/O benchmarking. Requires the `casatools` openmpi variant from conda-forge. |

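The layered merging behind the hierarchical config can be pictured with plain dictionaries. The sketch below is only an illustration: `merge_layers`, the section names `image` and `cluster`, and the sample values are assumptions made for this example, not `pclean`'s actual configuration classes.

```python
from copy import deepcopy


def _merge_into(base: dict, override: dict) -> None:
    """Recursively copy `override` into `base`; nested dicts are merged key by key."""
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            _merge_into(base[key], value)
        else:
            base[key] = deepcopy(value)


def merge_layers(*layers: dict) -> dict:
    """Merge config layers left to right; later layers win on conflicts."""
    merged: dict = {}
    for layer in layers:
        _merge_into(merged, layer)
    return merged


# Hypothetical layers: built-in defaults, a preset, and user overrides.
defaults = {
    "image": {"imsize": [512, 512], "cell": "1arcsec"},
    "cluster": {"nworkers": None, "threads_per_worker": 1},
}
preset = {"image": {"cell": "0.5arcsec"}}
user = {"cluster": {"nworkers": 8}}

merged = merge_layers(defaults, preset, user)
print(merged["image"]["cell"])        # 0.5arcsec
print(merged["cluster"]["nworkers"])  # 8
```

Each layer only needs to state what it changes; untouched keys such as `imsize` fall through from the earlier layers, and the input dictionaries are left unmodified.
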
## Quick start

```python
from pclean import pclean

# Parallel cube imaging (channels distributed across workers)
pclean(
    vis='my.ms',
    imagename='cube_out',
    specmode='cube',
    imsize=[512, 512],
    cell='1arcsec',
    niter=1000,
    deconvolver='hogbom',
    parallel=True,
    nworkers=8,
    cube_chunksize=1,       # one sub-cube per channel (max parallelism)
)

# Parallel continuum imaging (visibility rows chunked)
pclean(
    vis='my.ms',
    imagename='cont_out',
    specmode='mfs',
    imsize=[2048, 2048],
    cell='0.5arcsec',
    niter=5000,
    deconvolver='mtmfs',
    nterms=2,
    parallel=True,
    nworkers=4,
)
```

### Command-line interface

```bash
python -m pclean --vis my.ms --imagename out --specmode cube \
    --imsize 512 512 --cell 1arcsec --niter 1000 \
    --parallel --nworkers 8
```

### Additional parameters

Beyond the standard `tclean` parameters, `pclean` accepts:

| Parameter | Default | Description |
|-----------|---------|-------------|
| `parallel` | `False` | Enable Dask-distributed parallelism. |
| `nworkers` | `None` | Number of Dask workers. `None` defaults to the available CPU count. |
| `scheduler_address` | `None` | Address of an existing Dask scheduler; when set, no local cluster is created. |
| `threads_per_worker` | `1` | Threads per Dask worker. Kept at 1 because CASA tools are not thread-safe. |
| `memory_limit` | `'0'` | Per-worker memory cap. `'0'` disables Dask memory management, preventing CASA C++ allocations from being paused or killed. |
| `local_directory` | `None` | Scratch directory for Dask spill-to-disk. |
| `cube_chunksize` | `-1` | Channels per sub-cube task. `-1` assigns one sub-cube per worker; `1` assigns one per channel. |
| `keep_subcubes` | `False` | Retain intermediate sub-cube images after concatenation. |
| `keep_partimages` | `False` | Retain partial images after continuum gather. |
| `concat_mode` | `'auto'` | Concatenation strategy: `'auto'` (derive from `keep_subcubes`), `'paged'` (physical copy), `'virtual'` (reference catalog), `'movevirtual'` (rename into output). |

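To make the `cube_chunksize` semantics concrete, here is a minimal sketch of how channels could be grouped into sub-cubes. `partition_channels` is a hypothetical helper written for this README, not the code in `utils/partition.py`; in particular, spreading leftover channels over the first sub-cubes is an assumption.

```python
def partition_channels(
    nchan: int, nworkers: int, cube_chunksize: int = -1
) -> list[tuple[int, int]]:
    """Return (start_channel, n_channels) pairs covering channels 0..nchan-1."""
    if nchan <= 0 or nworkers <= 0:
        raise ValueError("nchan and nworkers must be positive")
    if cube_chunksize == -1:
        # One sub-cube per worker (never more sub-cubes than channels).
        nparts = min(nworkers, nchan)
    elif cube_chunksize >= 1:
        # At most `cube_chunksize` channels per sub-cube (ceiling division).
        nparts = -(-nchan // cube_chunksize)
    else:
        raise ValueError("cube_chunksize must be -1 or a positive integer")

    base, extra = divmod(nchan, nparts)
    ranges = []
    start = 0
    for i in range(nparts):
        width = base + (1 if i < extra else 0)  # spread the remainder evenly
        ranges.append((start, width))
        start += width
    return ranges


print(partition_channels(10, 4))                    # [(0, 3), (3, 3), (6, 2), (8, 2)]
print(partition_channels(4, 8, cube_chunksize=1))   # [(0, 1), (1, 1), (2, 1), (3, 1)]
```

Smaller chunks produce more, shorter tasks, which helps load balancing when channels differ in cost; `-1` keeps the task count equal to the worker count.
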
## Architecture

```
pclean/
├── src/pclean/
│   ├── __init__.py                # Package init, exposes pclean()
│   ├── __main__.py                # CLI entry point (python -m pclean)
│   ├── pclean.py                  # Top-level tclean-like interface
│   ├── params.py                  # Parameter container & validation
│   ├── imaging/
│   │   ├── serial_imager.py       # Single-process imager (base engine)
│   │   ├── deconvolver.py         # Deconvolution wrapper
│   │   └── normalizer.py          # Image normalization (gather/scatter)
│   ├── parallel/
│   │   ├── cluster.py             # Dask cluster lifecycle management
│   │   ├── cube_parallel.py       # Channel-parallel cube imaging
│   │   ├── continuum_parallel.py  # Row-parallel continuum imaging
│   │   └── worker_tasks.py        # Serializable functions for workers
│   └── utils/
│       ├── partition.py           # Data / image partitioning helpers
│       ├── image_concat.py        # Sub-cube image concatenation
│       ├── memory_estimate.py     # Worker RAM estimation heuristics
│       ├── check_adios2.py        # Adios2StMan availability check
│       └── convert_adios2.py      # MS → ADIOS2 conversion utility
```

## Documentation

Full documentation is hosted at
**[pclean.readthedocs.io](https://pclean.readthedocs.io/)**.

## Requirements

* Python ≥ 3.10
* `dask` + `distributed`
* `numpy` and `scipy`
* `pydantic` ≥ 2.0
* `pyyaml` ≥ 6.0
* `casatools` and `casatasks` ≥ 6.5 (installed through the optional `casa` extra)

### Pixi environments

The project uses [pixi](https://pixi.sh/) for reproducible environment
management.  Four environments are defined in `pyproject.toml`:

| Environment | Features | Description |
|-------------|----------|-------------|
| `default` | `casa` | Runtime with `casatools`/`casatasks` from PyPI. |
| `default-forge` | `casa-forge` | Runtime with `casatools`/`casatasks` from conda-forge (includes the openmpi variant required for `Adios2StMan`). |
| `dev` | `casa`, `dev` | Runtime plus pytest, pytest-cov, and ruff. |
| `test` | `dev` | Linting and testing only (no `casatools`). |

Common tasks are exposed as pixi scripts:

```bash
pixi run -e dev test          # pytest -v
pixi run -e dev test-cov      # pytest with coverage
pixi run -e dev lint          # ruff check
pixi run -e dev fmt           # ruff format
```

## References and acknowledgements

`pclean` builds on the imaging and calibration infrastructure developed by
the CASA team at NRAO / ESO / NAOJ.  The scientific algorithms — gridding,
deconvolution, self-calibration — are the product of decades of CASA
development; `pclean` is purely a **computing-engineering** effort that
re-orchestrates those mature tools with a modern distributed runtime.

If this package contributes to published research, please cite the CASA
software:

> CASA Team, Bean, B., Bhatnagar, S., et al. 2022,
> "CASA, the Common Astronomy Software Applications for Radio Astronomy,"
> *PASP*, 134, 114501.
> [doi:10.1088/1538-3873/ac9642](https://doi.org/10.1088/1538-3873/ac9642)

> McMullin, J. P., Waters, B., Schiebel, D., Young, W., & Golap, K. 2007,
> "CASA Architecture and Applications,"
> *ASP Conf. Ser.*, 376, 127.
> [ads:2007ASPC..376..127M](https://ui.adsabs.harvard.edu/abs/2007ASPC..376..127M)

### Relation to CASA's built-in parallel imaging

`pclean`'s parallel design closely follows the Python orchestration layer that
CASA's `tclean` task already provides through the
`casatasks.private.imagerhelpers` module:

| CASA Python class | pclean equivalent | Role |
| --- | --- | --- |
| `PySynthesisImager` | `SerialImager` | serial imaging loop (init → PSF → major/minor → restore) |
| `PyParallelCubeSynthesisImager` | `ParallelCubeImager` | each worker runs an independent `SerialImager` on a frequency sub-cube |
| `PyParallelContSynthesisImager` | `ParallelContinuumImager` | row-partitioned gridding across workers; minor cycles run serially on the coordinator |
| `PyParallelImagerHelper` | `DaskClusterManager` | cluster lifecycle, job dispatch, and result collection |

The structural decomposition is the same: partition → image → normalize →
deconvolve → iterate, with the same split between embarrassingly-parallel cube
channels and gather/scatter continuum cycles.  Both code-bases use polymorphic
dispatch — `task_tclean.py` picks between `PySynthesisImager`,
`PyParallelCubeSynthesisImager`, or `PyParallelContSynthesisImager` based on
`specmode` and MPI availability; `pclean` makes the same choice based on its
own `parallel` and `is_cube` flags.
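
The gather/scatter continuum cycle can be illustrated with a toy example. This is a sketch under strong simplifications: "gridding" is reduced to accumulating weighted values into a 1-D pixel array, normalization is a per-pixel division by the summed weights, and every function name here is hypothetical. It only shows why partial images from row partitions can be summed and then normalized on the coordinator.

```python
def partition_rows(nrows: int, nparts: int) -> list[slice]:
    """Split rows 0..nrows-1 into `nparts` contiguous, near-equal slices."""
    base, extra = divmod(nrows, nparts)
    slices, start = [], 0
    for i in range(nparts):
        stop = start + base + (1 if i < extra else 0)
        slices.append(slice(start, stop))
        start = stop
    return slices


def grid_rows(rows, npix: int):
    """Toy major-cycle step on one worker: accumulate weighted values and weights."""
    image = [0.0] * npix
    weight = [0.0] * npix
    for pixel, w, value in rows:
        image[pixel] += w * value
        weight[pixel] += w
    return image, weight


def gather_and_normalize(partials, npix: int) -> list[float]:
    """Coordinator step: sum the partial images, then divide by the summed weights."""
    total_image = [0.0] * npix
    total_weight = [0.0] * npix
    for image, weight in partials:
        for i in range(npix):
            total_image[i] += image[i]
            total_weight[i] += weight[i]
    return [img / w if w > 0 else 0.0 for img, w in zip(total_image, total_weight)]


# Each row is (pixel index, weight, value); the values are made up for the example.
rows = [(0, 1.0, 2.0), (1, 2.0, 3.0), (0, 3.0, 4.0), (2, 1.0, 5.0), (1, 2.0, 1.0)]
npix = 3

partials = [grid_rows(rows[s], npix) for s in partition_rows(len(rows), 2)]
image = gather_and_normalize(partials, npix)
print(image)  # [3.5, 2.0, 5.0]
```

Because accumulation is a plain sum, the result is the same as gridding all rows in one process; the minor cycle would then run once on the gathered, normalized image.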

The key difference is the **parallelism transport**.  CASA's
`PyParallelImagerHelper` sends Python code strings to MPI workers via
`casampi.MPIInterface`, requiring `mpicasa` and a
shared filesystem.  `pclean` replaces this with
[Dask Distributed](https://distributed.dask.org/) futures and actors,
eliminating the MPI dependency in exchange for Dask scheduling overhead.
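
As a rough picture of future-based dispatch, the sketch below submits a callable and its arguments and collects the results from the returned futures. It uses the standard library's `concurrent.futures` as a stand-in so that it runs without a cluster; `distributed.Client.submit` follows the same submit-then-collect pattern. `image_subcube` and `run_subcubes` are hypothetical names, and the thread pool is used only for this stand-in (CASA tools are not thread-safe, which is why `threads_per_worker` defaults to 1).

```python
from concurrent.futures import ThreadPoolExecutor


def image_subcube(start: int, nchan: int) -> dict:
    """Placeholder worker task; a real task would run an imaging cycle here."""
    return {"start": start, "nchan": nchan, "status": "done"}


def run_subcubes(ranges: list[tuple[int, int]], max_workers: int) -> list[dict]:
    """Submit one task per sub-cube and return the results in submission order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each submit() ships a function reference plus arguments, not a code string.
        futures = [pool.submit(image_subcube, start, nchan) for start, nchan in ranges]
        return [future.result() for future in futures]


results = run_subcubes([(0, 2), (2, 2)], max_workers=2)
print([r["start"] for r in results])  # [0, 2]
```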

See also [CASA Memo 13](https://casadocs.readthedocs.io/en/latest/notebooks/memo-series.html)
(Sekhar, Rau & Xue 2024), whose benchmarks of per-channel cube imaging
distributed via SLURM job arrays motivated this work
([benchmarking scripts](https://github.com/Kitchi/cube_parallelization_benchmarking)).

## License

Copyright 2026 the `pclean` authors.

GPL-3.0-or-later — see [LICENSE](https://github.com/r-xue/pclean/blob/main/LICENSE) for details.

## Disclaimer

This project is an independent, personal effort developed on the authors' own
time.  It is not affiliated with, endorsed by, or conducted as part of any
employer's projects or responsibilities.

## AI Disclosure

This project was developed with the assistance of AI coding agents
(GitHub Copilot, Claude).  The AI contributed to code generation, debugging,
and documentation under human direction and review.
