Metadata-Version: 2.4
Name: euclidkit
Version: 0.2.1
Summary: Euclid Archival Data Analysis Package
Home-page: https://github.com/rudolffu/euclidkit
Author: Yuming Fu
Author-email: Yuming Fu <fuympku@outlook.com>
Maintainer: Yuming Fu
Maintainer-email: Yuming Fu <fuympku@outlook.com>
License: BSD 3-Clause License
        
        Copyright (c) 2026, Yuming Fu
        All rights reserved.
        
        Redistribution and use in source and binary forms, with or without
        modification, are permitted provided that the following conditions are met:
        
        1. Redistributions of source code must retain the above copyright notice, this
           list of conditions and the following disclaimer.
        
        2. Redistributions in binary form must reproduce the above copyright notice,
           this list of conditions and the following disclaimer in the documentation
           and/or other materials provided with the distribution.
        
        3. Neither the name of the copyright holder nor the names of its
           contributors may be used to endorse or promote products derived from
           this software without specific prior written permission.
        
        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
        AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
        IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
        DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
        FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
        DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
        SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
        CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
        OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
        OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
        
Project-URL: Homepage, https://github.com/rudolffu/euclidkit
Project-URL: Documentation, https://euclidkit.readthedocs.io
Project-URL: Repository, https://github.com/rudolffu/euclidkit
Project-URL: Bug Tracker, https://github.com/rudolffu/euclidkit/issues
Keywords: astronomy,euclid,archival-data,catalog-analysis,spectroscopy,photometry
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Astronomy
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.19.0
Requires-Dist: scipy>=1.6.0
Requires-Dist: matplotlib>=3.3.0
Requires-Dist: astropy>=4.0.0
Requires-Dist: mocpy>=0.13.0
Requires-Dist: astroquery>=0.4.0
Requires-Dist: pandas>=1.2.0
Requires-Dist: photutils>=1.0.0
Requires-Dist: sep>=1.2.0
Requires-Dist: scikit-image>=0.18.0
Requires-Dist: tqdm>=4.50.0
Requires-Dist: pyyaml>=5.4.0
Requires-Dist: click>=8.0.0
Requires-Dist: requests>=2.25.0
Provides-Extra: desi
Requires-Dist: sparcl-client>=1.0.0; extra == "desi"
Provides-Extra: visualization
Requires-Dist: jdaviz>=2.0.0; extra == "visualization"
Requires-Dist: bokeh>=2.0.0; extra == "visualization"
Requires-Dist: plotly>=5.0.0; extra == "visualization"
Provides-Extra: dev
Requires-Dist: pytest>=6.0.0; extra == "dev"
Requires-Dist: pytest-cov>=2.10.0; extra == "dev"
Requires-Dist: black>=21.0.0; extra == "dev"
Requires-Dist: isort>=5.0.0; extra == "dev"
Requires-Dist: flake8>=3.8.0; extra == "dev"
Requires-Dist: mypy>=0.800; extra == "dev"
Requires-Dist: pre-commit>=2.10.0; extra == "dev"
Requires-Dist: sphinx>=4.0.0; extra == "dev"
Requires-Dist: sphinx-rtd-theme>=0.5.0; extra == "dev"
Requires-Dist: jupyter>=1.0.0; extra == "dev"
Requires-Dist: ipython>=7.20.0; extra == "dev"
Provides-Extra: complete
Requires-Dist: euclidkit[desi,dev,visualization]; extra == "complete"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: maintainer
Dynamic: requires-python

# euclidkit

[![PyPI version](https://img.shields.io/pypi/v/euclidkit.svg)](https://pypi.org/project/euclidkit/)
[![Read the Docs](https://img.shields.io/readthedocs/euclidkit?label=docs)](https://euclidkit.readthedocs.io/en/latest/index.html)

A comprehensive Python package for Euclid archival data analysis, designed for use within the ESA Datalabs environment.

## Overview

`euclidkit` facilitates advanced data exploration and visualization for Euclid Q1/(I)DR1 archival releases, including:

- **Data Access**: Query and crossmatch sources with the Euclid MER catalogue
- **Spectroscopic Analysis**: Access, download, and combine NISP spectra of archival sources
- **Unified Workflow**: Streamlined tools for researchers working with Euclid spectroscopic data

The package is designed for efficient archive querying and Euclid spectrum compilation workflows.

## Installation

### Requirements

- Python 3.9+
- Access to ESA Datalabs environment (for data volumes)
- COSMOS credentials for Euclid archive access

### Basic Installation

```bash
pip install euclidkit
```

### Development Installation

```bash
git clone https://github.com/rudolffu/euclidkit.git
cd euclidkit
pip install -e .
```

## Quick Start

### Setup Credentials

Store credentials in a private file under your home directory and restrict permissions:
```bash
mkdir -p ~/.euclidkit
touch ~/.euclidkit/.cred.txt
chmod 600 ~/.euclidkit/.cred.txt
```

Edit `~/.euclidkit/.cred.txt` manually with your preferred editor (do not put credentials in shell history).

Use two lines:
1. COSMOS username
2. COSMOS password
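
As an illustration of the two-line layout described above, the following sketch reads the file and checks that it is private. `read_credentials` is a hypothetical helper written for this example, not part of the euclidkit API, and the permission check assumes a POSIX filesystem.

```python
import stat
from pathlib import Path


def read_credentials(path="~/.euclidkit/.cred.txt"):
    """Return (username, password) from a two-line credentials file.

    Hypothetical helper for illustration; not provided by euclidkit.
    """
    cred_path = Path(path).expanduser()

    # Refuse files that group or other users can access (expects chmod 600).
    mode = stat.S_IMODE(cred_path.stat().st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{cred_path} is accessible by other users; run chmod 600")

    lines = cred_path.read_text().splitlines()
    if len(lines) < 2 or not lines[0].strip() or not lines[1].strip():
        raise ValueError("expected the username on line 1 and the password on line 2")

    # Line 1 is the COSMOS username, line 2 the COSMOS password.
    return lines[0].strip(), lines[1]
```

A check like this can be useful before a long batch job, so that a malformed or world-readable file is caught early rather than at login time.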

### Configuration

Create and edit the user config file:
```bash
euclidkit init-config --output ~/.euclidkit/euclidkit_config.yaml --template basic
```

Then edit `~/.euclidkit/euclidkit_config.yaml` and set the credential path:

```yaml
data:
  credentials_file: /home/<user>/.euclidkit/.cred.txt
```

### Basic Usage

```python
# Note: the Python import path is currently still `euclidkit`.
from euclidkit.core.data_access import EuclidArchive

# Initialize archive connection
archive = EuclidArchive(environment='PDR')
archive.login()

# Crossmatch your sources with Euclid MER catalogue
results = archive.crossmatch_sources(
    user_table="my_sources.csv",
    radius=1.0,  # arcseconds
    output_file="crossmatch_results.fits"
)

# Query for available spectra
spectra_table = archive.query_spectra_sources(
    crossmatch_table=results,
    output_file="spectra_sources.fits"
)

# Combine spectra into a single FITS file
combined_file = archive.combine_spectra_to_fits(
    spectra_table=spectra_table,
    output_file="my_combined_spectra.fits"
)
```

## Command Line Interface

### Crossmatching Sources

```bash
# Crossmatch user table with Euclid MER catalogue
euclidkit crossmatch \
    --input my_sources.csv \
    --output crossmatch_results.fits \
    --radius 1.0 \
    --verbose

# Submit the entire table as a single async TAP job (no batching). For very large
# tables, euclidkit splits the input into async chunks of --async-chunk-size rows.
euclidkit crossmatch \
    --input my_sources.csv \
    --output crossmatch_results.fits \
    --full-async \
    --async-chunk-size 500000

# When using the IDR environment the command defaults to the WIDE field and
# writes results to wide_<filename>. Use --idr-field DEEP to query the deep stack:
euclidkit crossmatch \
    --input my_sources.csv \
    --output crossmatch_results.fits \
    --environment IDR \
    --idr-field DEEP

# Crossmatch an already-uploaded archive user table (no local upload needed)
euclidkit crossmatch \
    --user-table-name my_table \
    --output crossmatch_results.fits \
    --match-mode object-id \
    --environment IDR \
    --idr-field WIDE
```

`--max-sources` vs `--async-chunk-size`:

- `--max-sources`: limits how many rows from the input table are processed in total.
- `--async-chunk-size`: controls rows per async TAP job when `--full-async` is enabled.
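
The interaction between the two options can be sketched in plain Python. This is only a model of the semantics described above (a total row cap followed by per-job chunking, with `--full-async` assumed to be enabled); `plan_async_jobs` is a hypothetical function and does not reflect how euclidkit implements it internally.

```python
def plan_async_jobs(n_rows, max_sources=None, async_chunk_size=None):
    """Return (start, stop) row ranges, one per async TAP job.

    Hypothetical model of --max-sources and --async-chunk-size.
    """
    if async_chunk_size is not None and async_chunk_size <= 0:
        raise ValueError("async_chunk_size must be positive")

    # --max-sources caps the total number of input rows that are processed.
    total = n_rows if max_sources is None else min(n_rows, max_sources)
    if total <= 0:
        return []

    # Without a chunk size (or with one that covers everything), use one job.
    if async_chunk_size is None or async_chunk_size >= total:
        return [(0, total)]

    # Otherwise split the capped row range into consecutive chunks.
    return [
        (start, min(start + async_chunk_size, total))
        for start in range(0, total, async_chunk_size)
    ]


# 1.2 million input rows, capped at 1 million, 500k rows per job -> two jobs.
print(plan_async_jobs(1_200_000, max_sources=1_000_000, async_chunk_size=500_000))
# [(0, 500000), (500000, 1000000)]
```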

### Uploading Tables

```bash
# Upload a FITS table to your Euclid TAP workspace
euclidkit upload-table \
    --input my_sources.fits \
    --table-name my_workspace_table \
    --description "Sources awaiting deep crossmatch" \
    --overwrite

# Upload CSV data as-is (format inferred automatically)
euclidkit upload-table \
    --input trimmed_sources.csv \
    --table-name trimmed_sources
```

### Querying Spectra

```bash
# Query spectra from crossmatch results
euclidkit query-spectra \
    --crossmatch crossmatch_results.fits \
    --output spectra_sources.fits \
    --environment IDR \
    --idr-field WIDE \
    --verbose

# Query spectra by object IDs and auto-combine
euclidkit query-spectra \
    --object-ids 123456,789012,345678 \
    --output spectra_sources.fits \
    --combine-output my_spectra.fits \
    --max-spectra 100 \
    --verbose
```

### Building Cutana Input

```bash
# Build Cutana CSV from a source table with object_id or ra/dec columns
euclidkit query-cutana \
    --sources my_sources.fits \
    --output cutana_input.csv \
    --instrument VIS \
    --cutout-size arcsec \
    --cutout-size-value 15

# NISP example with explicit filters
euclidkit query-cutana \
    --sources my_sources.fits \
    --output cutana_input_nisp.csv \
    --instrument NISP \
    --nisp-filters NIR_Y,NIR_H \
    --environment IDR \
    --idr-field DEEP \
    --cutout-size arcsec \
    --cutout-size-value 15
```
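
The commands above expect a source table with `object_id` or `ra`/`dec` columns. A minimal sketch for producing a coordinate-based CSV is shown below. It assumes lowercase `ra` and `dec` column names and coordinates in decimal degrees; `write_source_table` is a hypothetical helper, so check the package documentation for the exact column requirements.

```python
import csv


def write_source_table(path, sources):
    """Write (ra, dec) pairs to a CSV file with 'ra' and 'dec' columns.

    Assumes decimal degrees; hypothetical helper, not part of euclidkit.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["ra", "dec"])
        for ra, dec in sources:
            # Reject coordinates outside the valid sky ranges early.
            if not (0.0 <= ra < 360.0 and -90.0 <= dec <= 90.0):
                raise ValueError(f"invalid coordinates: ra={ra}, dec={dec}")
            writer.writerow([f"{ra:.6f}", f"{dec:.6f}"])
```

For example, `write_source_table("my_sources.csv", [(150.1, 2.2), (53.1, -27.8)])` writes a two-row table that could then be passed to `--sources` or `--input`.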

### Compiling Spectra

```bash
# Compile individual spectra into chunked FITS files
euclidkit compile-spectra \
    --spectra-table spectra_sources.fits \
    --output-dir ./output \
    --prefix compiled_spectra \
    --max-extensions 1000 \
    --verbose

# IDR DEEP canonical mode: choose arm(s) using XML LambdaRange
euclidkit compile-spectra \
    --spectra-table spectra_sources.fits \
    --output-dir ./output \
    --prefix compiled_deep \
    --environment IDR \
    --idr-field DEEP \
    -L BOTH

# Datalink mode: compile BOTH arms into separate _rgs / _bgs outputs
euclidkit compile-spectra \
    --spectra-table spectra_sources.fits \
    --output-dir ./output \
    --prefix compiled_dl \
    --use-datalink \
    --environment IDR \
    --schema sedm \
    -L BOTH
```

Note: for canonical compilation from local Datalabs FITS volumes, `--workers 2` is often not faster due to shared-storage I/O contention. Prefer `--workers 1` unless benchmarking on your setup shows a clear gain.

Note: `-L/--lambda-range` is the unified arm selector. In datalink mode, `RGS`/`BGS` map to corresponding retrieval types, and `BOTH` runs two passes and writes separate `_rgs` and `_bgs` files. `--retrieval-type` is kept for backward compatibility.

## Key Features

### Data Archive Integration

- **Multiple Environments**: Support for PDR, IDR, OTF, and REG archive environments
- **Efficient Queries**: Batch processing with TAP table uploads for large datasets
- **Crossmatching**: Position-based matching with configurable search radius
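
To make the idea of position-based matching within a search radius concrete, here is a small brute-force sketch in pure Python. It is a conceptual illustration only: the real crossmatch runs as a query against the Euclid archive, and the function names and the choice to keep only the nearest counterpart are assumptions made for this example.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (inputs in degrees), haversine form."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec**2 + math.cos(dec1) * math.cos(dec2) * sin_dra**2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0


def crossmatch(sources, catalogue, radius_arcsec=1.0):
    """Return (source_index, catalogue_index, separation_arcsec) tuples.

    Keeps only the nearest catalogue entry within the radius for each source.
    """
    matches = []
    for i, (ra, dec) in enumerate(sources):
        best = None
        for j, (cat_ra, cat_dec) in enumerate(catalogue):
            sep = angular_separation_arcsec(ra, dec, cat_ra, cat_dec)
            if sep <= radius_arcsec and (best is None or sep < best[1]):
                best = (j, sep)
        if best is not None:
            matches.append((i, best[0], best[1]))
    return matches
```

This scales as the product of the two table sizes, so it is only suitable for small tests. For local matching of larger tables, astropy's `SkyCoord.match_to_catalog_sky` is a more practical choice.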

### Spectroscopic Tools

- **Spectrum Access**: Direct access to Euclid data volumes on ESA Datalabs
- **FITS Compilation**: Combine individual spectra into multi-extension FITS files
- **Metadata Preservation**: Maintain source IDs, coordinates, and provenance information

### Analysis Pipeline

- **Quality Control**: Spectrum validation and quality assessment

## Data Environment

### ESA Datalabs Integration

This package is optimized for the ESA Datalabs environment with direct access to:

- **Euclid Q1 Data**: `/data/euclid_q1/` (35 TB volume)

## API Reference

### Core Classes

#### `EuclidArchive`

Main interface to the Euclid science archive.

```python
from euclidkit.core.data_access import EuclidArchive

archive = EuclidArchive(environment='PDR')
archive.login(credentials_file='~/.euclidkit/.cred.txt')

# Crossmatch sources
results = archive.crossmatch_sources(
    user_table="sources.csv",
    radius=1.0,
    output_file="results.fits"
)

# Query spectra
spectra = archive.query_spectra_sources(
    crossmatch_table=results,
    output_file="spectra.fits"
)

# Get individual spectrum
spectrum_hdu = archive.get_individual_spectrum(
    datalabs_path="/data/euclid_q1/path",
    file_name="spectrum_file.fits",
    hdu_index=42
)

# Combine spectra
combined = archive.combine_spectra_to_fits(
    spectra_table=spectra,
    output_file="combined.fits",
    max_spectra=1000
)
```

#### `SpectrumCompiler`

Advanced spectrum compilation with chunking support.

```python
from euclidkit.core.spectra import SpectrumCompiler

compiler = SpectrumCompiler(max_extensions=1000)

# Compile into chunked files
output_files = compiler.compile_spectra(
    spectra_table=spectra_table,
    output_dir="./output",
    output_prefix="compiled_spectra"
)

# Create single FITS file
single_file = compiler.compile_single_fits(
    spectra_table=spectra_table,
    output_file="all_spectra.fits"
)

# Generate metadata table
metadata = compiler.create_metadata_table(
    spectra_table=spectra_table,
    output_files=output_files,
    output_dir="./output"
)
```

### Workflow Examples

#### Complete Spectroscopic Analysis Pipeline

```python
from euclidkit.core.data_access import EuclidArchive
from euclidkit.core.spectra import SpectrumCompiler
import pandas as pd

# 1. Initialize archive
archive = EuclidArchive(environment='PDR')
archive.login()

# 2. Load your QSO candidates
qso_candidates = pd.read_csv('qso_candidates.csv')

# 3. Crossmatch with Euclid MER catalogue
crossmatches = archive.crossmatch_sources(
    user_table=qso_candidates,
    radius=2.0,  # 2 arcsecond radius
    output_file='qso_crossmatches.fits'
)

# 4. Find available spectra
spectra_sources = archive.query_spectra_sources(
    crossmatch_table=crossmatches,
    output_file='qso_spectra_sources.fits'
)

print(f"Found {len(spectra_sources)} spectra for {len(crossmatches)} crossmatches")

# 5. Create combined FITS file (for small samples)
if len(spectra_sources) <= 1000:
    combined_spectra = archive.combine_spectra_to_fits(
        spectra_table=spectra_sources,
        output_file='qso_combined_spectra.fits'
    )
    print(f"Combined spectra saved to: {combined_spectra}")

# 6. Or use chunked compilation for large samples
else:
    compiler = SpectrumCompiler(max_extensions=2000)
    output_files = compiler.compile_spectra(
        spectra_table=spectra_sources,
        output_dir='./spectra_chunks',
        output_prefix='qso_spectra'
    )
    print(f"Created {len(output_files)} chunked files")

archive.logout()
```

## Diagnostics

Check your installation and environment:

```bash
# Check all components
euclidkit diagnostics

# Check specific components
euclidkit diagnostics --check-deps --check-data
```
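
Independently of the `diagnostics` command, a quick manual check of whether key dependencies are importable can be sketched with the standard library. This is not what `euclidkit diagnostics` does internally; `check_dependencies` and the `IMPORT_NAMES` mapping are hypothetical names for this example, and the mapping lists only a few of the declared requirements.

```python
import importlib.util

# Distribution name -> import name (they differ for some packages).
IMPORT_NAMES = {
    "numpy": "numpy",
    "astropy": "astropy",
    "astroquery": "astroquery",
    "pyyaml": "yaml",
    "scikit-image": "skimage",
}


def check_dependencies(import_names):
    """Return {distribution: True/False} depending on whether the module is importable."""
    return {
        dist: importlib.util.find_spec(module) is not None
        for dist, module in import_names.items()
    }


if __name__ == "__main__":
    for dist, available in check_dependencies(IMPORT_NAMES).items():
        print(f"{dist}: {'ok' if available else 'MISSING'}")
```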

## Archive Environments

Use `--environment` (CLI) or `environment=...` (Python API) to select the
archive backend:

- **PDR**: Public Data Release archive.
- **IDR**: Internal Data Release archive (consortium access).
- **OTF**: On-the-fly archive environment.
- **REG**: Regression/testing archive environment.

For **IDR**, you can also select the field with `--idr-field`:

- **WIDE**: Uses the IDR WIDE MER catalogue.
- **DEEP**: Uses the IDR DEEP MER catalogue.

Examples:

```bash
# IDR WIDE (default IDR field)
euclidkit crossmatch \
  --input my_sources.fits \
  --output xmatch_wide.fits \
  --environment IDR \
  --idr-field WIDE

# IDR DEEP
euclidkit crossmatch \
  --input my_sources.fits \
  --output xmatch_deep.fits \
  --environment IDR \
  --idr-field DEEP
```
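
The selection rules above can be summarised in a short validation sketch. `resolve_environment` is a hypothetical function for illustration; the WIDE default for IDR follows the text above, while rejecting a field for non-IDR environments is an assumption of this example rather than documented euclidkit behaviour.

```python
VALID_ENVIRONMENTS = ("PDR", "IDR", "OTF", "REG")
VALID_IDR_FIELDS = ("WIDE", "DEEP")


def resolve_environment(environment, idr_field=None):
    """Return a normalised (environment, idr_field) pair.

    Hypothetical helper; idr_field is None for non-IDR environments.
    """
    env = environment.upper()
    if env not in VALID_ENVIRONMENTS:
        raise ValueError(f"unknown environment {environment!r}")

    if env != "IDR":
        # Assumption: a field selection is meaningful only for IDR.
        if idr_field is not None:
            raise ValueError("idr_field applies only to the IDR environment")
        return env, None

    # IDR defaults to the WIDE field when none is given.
    field = "WIDE" if idr_field is None else idr_field.upper()
    if field not in VALID_IDR_FIELDS:
        raise ValueError(f"unknown IDR field {idr_field!r}")
    return env, field
```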

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## Documentation

For detailed documentation and examples, visit:
- [euclidkit docs](https://euclidkit.readthedocs.io/en/latest/index.html)
- [Package Documentation](https://github.com/rudolffu/euclidkit/docs)
- [Euclid Science Archive](https://s2e2.cosmos.esa.int/www/euclid_iscience/Public_User_Guide.html)
- [astroquery.esa.euclid](https://astroquery.readthedocs.io/en/latest/esa/euclid/euclid.html)

## Support

- **Issues**: [GitHub Issues](https://github.com/rudolffu/euclidkit/issues)
- **Discussions**: [GitHub Discussions](https://github.com/rudolffu/euclidkit/discussions)
- **Email**: fuympku@outlook.com

## Author

**Yuming Fu** ([@rudolffu](https://github.com/rudolffu))
- Email: fuympku@outlook.com
- GitHub: https://github.com/rudolffu/euclidkit

## License

This project is licensed under the BSD 3-Clause License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- ESA Euclid Mission and Euclid Consortium
- ESA Datalabs and Euclid Data Space infrastructure team
- Astropy and astroquery communities

## Changelog

### Latest Changes

- **Spectroscopic Pipeline**: Complete pipeline for accessing and combining Euclid spectra
- **CLI Integration**: Added `--combine-output` option to `query-spectra` command
- **TAP Upload**: Improved query performance using TAP table uploads
- **FITS Compilation**: Efficient multi-extension FITS file creation
- **Error Handling**: Robust handling of long filenames and missing data

See [CHANGELOG.md](CHANGELOG.md) for detailed version history.
