Metadata-Version: 2.4
Name: sgnl-cpu-interp
Version: 0.1.0
Summary: Fast polyphase resampling with multi-architecture SIMD support
Author-email: Chad Hanna <crh184@psu.edu>
License-Expression: MIT
Project-URL: Homepage, https://git.ligo.org/greg/fast-resample-cpu
Project-URL: Documentation, https://greg.docs.ligo.org/fast-resample-cpu
Project-URL: Issues, https://git.ligo.org/greg/fast-resample-cpu/issues
Keywords: signal processing,interpolation,upsampling,polyphase,resampling,SIMD
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: C
Classifier: Topic :: Scientific/Engineering
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.19.0
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Provides-Extra: lint
Requires-Dist: black; extra == "lint"
Requires-Dist: flake8; extra == "lint"
Requires-Dist: flake8-bandit; extra == "lint"
Requires-Dist: flake8-black; extra == "lint"
Requires-Dist: flake8-bugbear; extra == "lint"
Requires-Dist: flake8-isort; extra == "lint"
Requires-Dist: flake8-pyproject; extra == "lint"
Requires-Dist: isort; extra == "lint"
Requires-Dist: mypy; extra == "lint"
Requires-Dist: mypy-extensions; extra == "lint"
Requires-Dist: typing_extensions; extra == "lint"
Provides-Extra: dev
Requires-Dist: sgnl-cpu-interp[lint]; extra == "dev"
Requires-Dist: sgnl-cpu-interp[test]; extra == "dev"
Dynamic: license-file

# sgnl-cpu-interp

Fast polyphase resampling for multichannel data with multi-architecture SIMD support.

## Features

- **Multi-Architecture SIMD**: Automatic runtime CPU detection with optimized kernels for:
  - x86_64: AVX-512, AVX2+FMA, AVX, SSE4.1, SSE2
  - ARM64: NEON (Apple Silicon, AWS Graviton, etc.)
  - Fallback: Optimized scalar implementation
- **No External Dependencies**: Requires only NumPy (the GSL and FFTW dependencies have been removed)
- **High Performance**: ~25x faster than the previous GSL-based implementation on a 1024-channel benchmark (see [Performance](#performance))
- **Multichannel**: Optimized for processing many channels simultaneously (tested with 1024+ channels)
- **Quality**: Lanczos-windowed sinc interpolation for high-quality upsampling
- **Two Memory Layouts**: Supports both (time, channels) and (channels, time) layouts
- **Simple API**: Easy-to-use NumPy-based interface

## Installation

### From PyPI (when available)

```bash
pip install sgnl-cpu-interp
```

### From source

```bash
git clone https://git.ligo.org/greg/fast-resample-cpu.git
cd fast-resample-cpu
pip install .
```

No external dependencies are required beyond NumPy. The build system automatically detects your CPU architecture and compiles the appropriate SIMD kernels.

## Quick Start

```python
import numpy as np
from sgnl_cpu_interp import upsample, get_simd_info

# Check which SIMD implementation is being used
print(get_simd_info())
# {'implementation': 'NEON', 'available': ['NEON', 'Scalar'], 'cpu_features': 'NEON+FMA'}

# Upsample a 50 Hz sine wave from 128 Hz to 2048 Hz
fs_in = 128
fs_out = 2048
factor = fs_out // fs_in  # 16x upsampling

# Generate test signal
t = np.arange(0, 0.5, 1/fs_in)
signal = np.sin(2 * np.pi * 50 * t).astype(np.float32)

# Upsample
upsampled = upsample(signal, factor=factor, half_length=8)
print(f"Input: {len(signal)} samples at {fs_in} Hz")
print(f"Output: {len(upsampled)} samples at {fs_out} Hz")
```

## Usage Examples

### Single channel upsampling
```python
import numpy as np
from sgnl_cpu_interp import upsample

# 1D signal (single channel)
signal = np.random.randn(1024).astype(np.float32)
upsampled = upsample(signal, factor=2)
```

### Multichannel upsampling
```python
import numpy as np
from sgnl_cpu_interp import upsample

# 2D array: (n_samples, n_channels)
n_samples, n_channels = 1024, 128
data = np.random.randn(n_samples, n_channels).astype(np.float32)

# Upsample by factor of 2 (default half_length=8, so kernel_len = 17)
upsampled = upsample(data, factor=2)
print(upsampled.shape)  # (2016, 128) - note: loses 2*half_length input samples

# Upsample by factor of 4 with longer kernel for better quality (kernel_len = 33)
upsampled = upsample(data, factor=4, half_length=16)
print(upsampled.shape)  # (3968, 128) = ((1024 - 33 + 1) * 4, 128)
```

### Transposed layout
```python
import numpy as np
from sgnl_cpu_interp import upsample_transposed

# Transposed layout: (n_channels, n_samples)
data = np.random.randn(128, 1024).astype(np.float32)
upsampled = upsample_transposed(data, factor=2)
print(upsampled.shape)  # (128, 2016)
```

## API Reference

### `upsample(data, factor=2, half_length=8)`

Upsample multichannel data using polyphase filtering (standard layout).

**Parameters:**
- `data` (ndarray): Input array of shape `(n_samples,)` for single channel or `(n_samples, n_channels)` for multichannel. Will be converted to float32 if necessary.
- `factor` (int, optional): Upsampling factor (default: 2). Must be >= 2.
- `half_length` (int, optional): Half-length of the sinc kernel (default: 8). Larger values provide better quality but are slower. Total kernel length = `2 * half_length + 1`.

**Returns:**
- `output` (ndarray): Upsampled array of shape `((n_samples - kernel_len + 1) * factor,)` or `((n_samples - kernel_len + 1) * factor, n_channels)` where `kernel_len = 2 * half_length + 1`.
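The shape rule above can be checked before calling the extension. The following is a minimal sketch in plain Python; `expected_output_shape` is a hypothetical helper, not part of the package, and it only restates the documented formula and constraints (`factor >= 2`, at least `kernel_len` input samples).

```python
def expected_output_shape(shape, factor=2, half_length=8):
    """Predict the output shape of upsample() from the documented formula.

    Hypothetical helper for illustration; not part of sgnl_cpu_interp.
    `shape` is (n_samples,) or (n_samples, n_channels).
    """
    if factor < 2:
        raise ValueError("factor must be >= 2")
    if len(shape) not in (1, 2):
        raise ValueError("expected a 1D or 2D shape")
    kernel_len = 2 * half_length + 1
    n_samples = shape[0]
    if n_samples < kernel_len:
        raise ValueError(
            f"need at least {kernel_len} input samples, got {n_samples}"
        )
    # Only fully overlapping kernel positions produce output samples.
    n_out = (n_samples - kernel_len + 1) * factor
    return (n_out,) + tuple(shape[1:])


print(expected_output_shape((1024, 128)))                            # (2016, 128)
print(expected_output_shape((1024, 128), factor=4, half_length=16))  # (3968, 128)
print(expected_output_shape((64,), factor=16))                       # (768,)
```

Comparing this prediction against `upsample(...).shape` is a cheap sanity check when sizing downstream buffers.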

### `upsample_transposed(data, factor=2, half_length=8)`

Upsample multichannel data using polyphase filtering (transposed layout).

Same as `upsample()` but expects input in `(n_channels, n_samples)` layout. Use this when your data is already in channels-first format to avoid transpose overhead.

**Parameters:**
- `data` (ndarray): Input array of shape `(n_channels, n_samples)`. Must be 2D and float32.
- `factor` (int, optional): Upsampling factor (default: 2). Must be >= 2.
- `half_length` (int, optional): Half-length of the sinc kernel (default: 8).

**Returns:**
- `output` (ndarray): Upsampled array of shape `(n_channels, (n_samples - kernel_len + 1) * factor)`.

### `get_simd_info()`

Get information about the current SIMD implementation.

**Returns:**
- `dict` with keys:
  - `implementation`: Name of current implementation (e.g., 'AVX2+FMA', 'NEON', 'Scalar')
  - `available`: List of all available implementations for this CPU
  - `cpu_features`: Detected CPU SIMD features

### `set_implementation(name)`

Manually select a SIMD implementation. Useful for testing and benchmarking.

**Parameters:**
- `name` (str): Implementation name from `get_simd_info()['available']`

Can also be set via the `SGNL_CPU_IMPL` environment variable:
```bash
SGNL_CPU_IMPL=Scalar python my_script.py
```
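For benchmarking, the selection API can be wrapped in a small timing loop. The sketch below is an illustration only: `time_implementations` is a hypothetical helper, and it assumes that `set_implementation` accepts the string reported under `get_simd_info()['implementation']` so the original choice can be restored afterwards. The three package functions are passed in as arguments so the helper itself has no import-time dependency on the extension.

```python
import time


def time_implementations(upsample, get_simd_info, set_implementation,
                         data, repeats=20, **kwargs):
    """Return {implementation name: best wall-clock time in seconds}.

    Hypothetical helper; the callables are expected to be the functions
    of the same names exported by sgnl_cpu_interp.
    """
    original = get_simd_info()["implementation"]
    timings = {}
    try:
        for name in get_simd_info()["available"]:
            set_implementation(name)
            upsample(data, **kwargs)  # warm-up call, not timed
            best = float("inf")
            for _ in range(repeats):
                start = time.perf_counter()
                upsample(data, **kwargs)
                best = min(best, time.perf_counter() - start)
            timings[name] = best
    finally:
        # Restore whatever was active before benchmarking.
        set_implementation(original)
    return timings


# Intended usage (requires the package to be installed):
#   import numpy as np
#   from sgnl_cpu_interp import upsample, get_simd_info, set_implementation
#   data = np.random.randn(1024, 1024).astype(np.float32)
#   for name, seconds in time_implementations(
#           upsample, get_simd_info, set_implementation, data, factor=2).items():
#       print(f"{name}: {seconds * 1e3:.3f} ms")
```

Taking the minimum over repeats rather than the mean reduces the influence of scheduler noise on sub-millisecond measurements.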

## Important Notes

- **Edge loss**: Only fully overlapping kernel positions produce output, so the convolution loses `kernel_len - 1` input samples in total, `half_length` at each end. For `half_length=8`, you lose 16 input samples.
- **Time alignment**: The output has a delay of `(kernel_len - 1) / 2 = half_length` samples at the input sample rate.
- **Minimum length**: Input must have at least `kernel_len` samples.
- **Kernel**: Uses a Lanczos-windowed sinc kernel: `h(x) = sinc(x/factor) * sinc(x/kernel_length)`
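The edge-loss and delay notes together determine where each output sample sits in time. The sketch below builds the output time axis under the assumption that the first output sample coincides with input sample `half_length`, which is how the stated delay is read here. `output_times` is a hypothetical helper, not a package function.

```python
import numpy as np


def output_times(n_samples, fs_in, factor=2, half_length=8):
    """Time stamps (seconds) of the upsampled output, relative to input sample 0.

    Hypothetical helper; assumes output sample 0 aligns with input
    sample `half_length`, i.e. a delay of (kernel_len - 1) / 2 input samples.
    """
    kernel_len = 2 * half_length + 1
    n_out = (n_samples - kernel_len + 1) * factor
    if n_out <= 0:
        raise ValueError(f"need at least {kernel_len} input samples")
    t0 = half_length / fs_in  # delay at the input sample rate
    return t0 + np.arange(n_out) / (fs_in * factor)


# Quick Start numbers: 64 samples at 128 Hz, upsampled 16x to 2048 Hz
t_out = output_times(64, fs_in=128, factor=16, half_length=8)
print(len(t_out))  # 768
print(t_out[0])    # 0.0625
```

Plotting the upsampled signal against `t_out` rather than against `np.arange(n_out) / fs_out` keeps it aligned with the original input samples.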

## Performance

Benchmark on 1024 channels, 1024 samples:

| Platform | Implementation | Time | Notes |
|----------|---------------|------|-------|
| Apple Silicon (M-series) | NEON | ~0.4 ms | Auto-selected |
| Apple Silicon | Scalar | ~0.4 ms | Compiler auto-vectorizes well |
| x86_64 (Haswell+) | AVX2+FMA | ~0.3 ms | Expected |
| x86_64 (older) | SSE2 | ~0.8 ms | Baseline x86_64 |

Comparison with the previous GSL-based implementation:

| Implementation | Time | Speedup |
|----------------|------|---------|
| GSL BLAS (old) | 9.8 ms | 1.0x |
| This package | 0.4 ms | **~25x** |

## Architecture

The package automatically detects CPU features at module load time and selects the best available implementation:

```
┌─────────────────────────────────────────────────────────┐
│                    Python API                           │
│         upsample() / upsample_transposed()              │
└─────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────┐
│                  Runtime Dispatch                        │
│         cpu_detect() → select best implementation        │
└─────────────────────────────────────────────────────────┘
                           │
          ┌────────────────┼────────────────┐
          ▼                ▼                ▼
    ┌──────────┐    ┌──────────┐    ┌──────────┐
    │ AVX-512  │    │   NEON   │    │  Scalar  │
    │  AVX2    │    │  (ARM)   │    │(fallback)│
    │   AVX    │    └──────────┘    └──────────┘
    │  SSE4.1  │
    │  SSE2    │
    │  (x86)   │
    └──────────┘
```
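The dispatch step in the diagram can be pictured as a priority lookup with an optional override. The following Python sketch is illustrative only: the real selection happens in C, the preference order is an assumption, and apart from `'AVX2+FMA'`, `'NEON'` and `'Scalar'` the implementation name strings are guesses based on the feature list in this README. `choose_implementation` is a hypothetical function.

```python
import os

# Assumed preference order, fastest first. Only 'AVX2+FMA', 'NEON' and
# 'Scalar' are confirmed name strings; the others are illustrative.
PREFERENCE = ("AVX-512", "AVX2+FMA", "AVX", "SSE4.1", "SSE2", "NEON", "Scalar")


def choose_implementation(available, override=None):
    """Pick an implementation name from `available`.

    `override` mimics the SGNL_CPU_IMPL environment variable: when set,
    it wins, but it must name an implementation the CPU supports.
    """
    if override:
        if override not in available:
            raise ValueError(f"{override!r} is not available on this CPU")
        return override
    for name in PREFERENCE:
        if name in available:
            return name
    raise ValueError("no known implementation available")


available = ["NEON", "Scalar"]
print(choose_implementation(available))                    # NEON
print(choose_implementation(available, override="Scalar")) # Scalar
print(choose_implementation(available, os.environ.get("SGNL_CPU_IMPL")))
```

Rejecting an unavailable override, instead of silently falling back, makes benchmarking mistakes visible early; whether the package behaves this way is not stated here.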

## Development

### Building from source

```bash
# Install in development mode
pip install -e .

# Run tests
pytest tests/ -v
```

### Project structure

```
sgnl-cpu-interp/
├── src/
│   ├── cpu_detect.c      # Runtime CPU feature detection
│   ├── dispatch.c        # Function pointer dispatch table
│   ├── resample_ext_simd.c  # Python extension wrapper
│   └── kernels/
│       ├── convolve_scalar.c   # Baseline implementation
│       ├── convolve_sse2.c     # x86 SSE2
│       ├── convolve_sse4.c     # x86 SSE4.1
│       ├── convolve_avx.c      # x86 AVX
│       ├── convolve_avx2.c     # x86 AVX2+FMA
│       ├── convolve_avx512.c   # x86 AVX-512
│       └── convolve_neon.c     # ARM NEON
├── sgnl_cpu_interp.py    # Python API
├── setup.py              # Build configuration
└── tests/
    └── test_simd.py      # Test suite
```

### Adding a new SIMD implementation

1. Create `src/kernels/convolve_<name>.c` implementing `convolve_<name>()` and `convolve_transposed_<name>()`
2. Add the implementation to the dispatch table in `src/dispatch.c`
3. Add build flags to `setup.py` in `SIMD_FLAGS_UNIX` / `SIMD_FLAGS_MSVC`
4. Add CPU feature detection if needed in `src/cpu_detect.c`

## Algorithm

This implementation uses polyphase filtering for efficient upsampling:

1. **Kernel generation**: Creates a Lanczos-windowed sinc kernel and splits it into `factor` polyphase components
2. **SIMD convolution**: Vectorized dot product across channels (standard layout) or time samples (transposed layout)
3. **Phase-blocked upsampling**: Processes all output samples with the same phase together to maximize kernel data reuse in cache
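
The three steps above can be sketched in pure NumPy. This is a slow reference illustration, not the package's implementation: the function names are hypothetical, and the window width (`half_length + 1`) and the per-phase normalisation to unit DC gain are assumptions that may differ from the C kernels.

```python
import numpy as np


def lanczos_polyphase_kernels(factor, half_length):
    """Return an array of shape (factor, kernel_len): one sub-filter per phase."""
    kernel_len = 2 * half_length + 1
    taps = np.arange(kernel_len) - half_length  # input offsets -h .. +h
    kernels = np.empty((factor, kernel_len), dtype=np.float32)
    for phase in range(factor):
        # Distance, in input samples, from the output position to each tap.
        x = taps - phase / factor
        window = np.sinc(x / (half_length + 1))  # assumed Lanczos window width
        k = np.sinc(x) * window
        kernels[phase] = k / k.sum()  # assumed: unit DC gain per phase
    return kernels


def upsample_reference(data, factor=2, half_length=8):
    """Polyphase upsampling of (n_samples,) or (n_samples, n_channels) data."""
    data = np.asarray(data, dtype=np.float32)
    squeeze = data.ndim == 1
    if squeeze:
        data = data[:, None]
    kernel_len = 2 * half_length + 1
    n_valid = data.shape[0] - kernel_len + 1
    if n_valid < 1:
        raise ValueError(f"need at least {kernel_len} input samples")
    kernels = lanczos_polyphase_kernels(factor, half_length)
    out = np.empty((n_valid * factor, data.shape[1]), dtype=np.float32)
    # Phase-blocked: all outputs sharing a phase reuse the same sub-filter.
    for phase in range(factor):
        acc = np.zeros((n_valid, data.shape[1]), dtype=np.float32)
        for j in range(kernel_len):
            # Each tap is a multiply-add vectorised across time and channels.
            acc += kernels[phase, j] * data[j:j + n_valid]
        out[phase::factor] = acc
    return out[:, 0] if squeeze else out


data = np.ones((64, 3), dtype=np.float32)
out = upsample_reference(data, factor=4, half_length=8)
print(out.shape)  # (192, 3)
```

Because phase 0 lands exactly on an input sample, its sub-filter reduces to a unit impulse, so every `factor`-th output reproduces the corresponding input value. That property makes a convenient check when validating a new kernel against a reference like this one.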

The approach is specifically optimized for:
- Many channels (100+)
- Small to moderate upsampling factors (2-16x)
- Short to medium input lengths (100s to 1000s of samples)

## License

MIT License - see LICENSE file for details.

## Contributing

Contributions welcome! Please open an issue or merge request at https://git.ligo.org/greg/fast-resample-cpu.

## Citation

If you use this in research, please cite:
```
@software{sgnl_cpu_interp,
  title = {sgnl-cpu-interp: Fast polyphase resampling with multi-architecture SIMD},
  url = {https://git.ligo.org/greg/fast-resample-cpu},
  year = {2025}
}
```
