# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

> The *Unreleased* section lists changes that have not been released yet but will ship in the next version.

## [0.5.0] - 2026-03-27

### Added

- CUDA kernel implementations for biquad, SOS cascade, and delay line filters using a parallel prefix scan algorithm
- JIT-compiled C++/CUDA native extension (`torchfx._ops`) with automatic fallback to pure-PyTorch
- CPU-only C++ extension support: native kernels now compile and load without the CUDA toolkit, providing a ~2400x speedup for stateful IIR filtering on CPU
- `LogFilterBank` for logarithmically-spaced frequency band decomposition
- CUDA kernel tests and fallback behavior tests
- FFT-based 1D convolution (`fft_conv1d`) adapted from [Julius](https://github.com/adefossez/julius) (MIT License) for fast FIR filtering using the overlap-save method
- `conv_mode` parameter on `FIR` and `DesignableFIR` filters: `"fft"` (default), `"direct"`, or `"auto"`
- Benchmark suite for FFT vs direct convolution across kernel sizes (64–1024) and signal durations
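
For context on the overlap-save method named above, here is a minimal NumPy sketch of block-wise FFT convolution. It is not the `fft_conv1d` implementation; the function name `overlap_save_fir` and the `block_size` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def overlap_save_fir(signal, kernel, block_size=256):
    """Hypothetical helper: full linear convolution via overlap-save.

    Returns the same values as np.convolve(signal, kernel).
    """
    signal = np.asarray(signal, dtype=np.float64)
    kernel = np.asarray(kernel, dtype=np.float64)
    m = len(kernel)
    fft_size = block_size + m - 1
    out_len = len(signal) + m - 1

    # Prepend m - 1 zeros (history for the first block) and pad the tail so
    # the input splits into whole blocks covering every output sample.
    n_blocks = -(-out_len // block_size)  # ceiling division
    padded = np.zeros(m - 1 + n_blocks * block_size)
    padded[m - 1 : m - 1 + len(signal)] = signal

    kernel_fft = np.fft.rfft(kernel, fft_size)
    out = np.empty(n_blocks * block_size)
    for i in range(n_blocks):
        start = i * block_size
        # Consecutive segments overlap by m - 1 samples.
        segment = padded[start : start + fft_size]
        circular = np.fft.irfft(np.fft.rfft(segment) * kernel_fft, fft_size)
        # The first m - 1 samples are corrupted by circular wrap-around;
        # keep only the remaining block_size valid samples.
        out[start : start + block_size] = circular[m - 1 :]
    return out[:out_len]
```

Each block costs one forward and one inverse FFT of length `block_size + len(kernel) - 1`, which is where the advantage over direct convolution comes from as kernels get longer.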

### Changed

- Benchmarks migrated from standalone scripts (`benchmark/`) to pytest-benchmark suite (`benchmarks/`) with unified structure, numba CUDA baselines for fair GPU comparison, and `--benchmark-disable` by default
- Stateful biquad and IIR SOS fallback paths now use a vectorized `lfilter`-based zero-state/zero-input decomposition instead of sample-by-sample Python loops (~100–500x faster when the C++ extension is unavailable)
- IIR SOS matrix (`_compute_sos`) is now computed eagerly after `compute_coefficients()` instead of lazily in the forward path
- State tensor device transfers in stateful filter paths are now guarded to avoid redundant `.to(device)` calls
- Removed synchronous CUDA calls from native kernels for improved GPU throughput
- Added `setuptools` as a runtime dependency (required by `torch.utils.cpp_extension` for JIT compilation)
- FIR filters now default to FFT convolution, up to 10x faster for kernel sizes ≥ 64
- CUDA parallel scan now uses the work-efficient Blelloch scan instead of the Hillis-Steele algorithm, reducing total work from O(N log N) to O(N) and shared memory usage from 48 KB to 24 KB per block
- Sequential CUDA biquad kernel now batches 128 channels per thread block instead of 1, improving GPU occupancy for short signals
- Eliminated GPU→CPU synchronization in CUDA biquad kernel by passing all biquad coefficients (`b0`, `b1`, `b2`, `a1`, `a2`) as scalar arguments instead of extracting them from device tensors
- Pre-computed SOS convolution kernels and cached constant tensors (`b_delta`) in Python fallback paths to avoid per-call tensor allocation
- C++ CPU extension now compiles with `-O3 -ffast-math -march=native` and OpenMP parallelization, running ~2x faster than SciPy for multi-channel IIR filtering
- `TORCHFX_NO_CUDA` environment variable added to force CPU-only extension compilation
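
The Blelloch scan mentioned above can be sketched sequentially in Python. This illustrates the up-sweep/down-sweep structure on a plain prefix sum and is not the CUDA kernel itself; the function name is hypothetical, and the operator and memory layout used by the native kernels are not shown here.

```python
def blelloch_exclusive_scan(values):
    """Exclusive prefix sum using Blelloch's up-sweep/down-sweep.

    Runs sequentially here; on a GPU each inner `for` loop would be a single
    parallel step over independent tree nodes.
    """
    n = len(values)
    size = 1
    while size < n:
        size *= 2
    tree = list(values) + [0] * (size - n)  # pad to a power of two

    # Up-sweep (reduce): accumulate partial sums in place.
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):
            tree[i] += tree[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefixes back down the tree.
    tree[size - 1] = 0
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = tree[i - stride]
            tree[i - stride] = tree[i]
            tree[i] += left
        stride //= 2

    return tree[:n]
```

Both sweeps visit each tree node once, so the number of additions grows linearly with the input length, whereas Hillis-Steele performs on the order of N additions in each of its log2 N steps.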

### Fixed

- Native C++ extension was unreachable on CPU-only machines due to `torch.cuda.is_available()` gate in `_ops.py`
- Segfault in CUDA biquad kernel caused by dereferencing a device pointer on the host when reading the b-coefficients; fixed by passing `b0`/`b1`/`b2` as CPU scalars from the SOS copy
- Various `mypy` and `ruff` errors

## [0.4.0] - 2026-02-15

### Added

- Input validation layer with custom exception hierarchy (`torchfx.validation` module)
  - Base exception `TorchFXError` for catching all library errors
  - Specific exceptions: `InvalidParameterError`, `InvalidSampleRateError`, `InvalidRangeError`, `InvalidShapeError`, `InvalidTypeError`, `AudioProcessingError`, `CoefficientComputationError`, `FilterInstabilityError`
  - Validator functions for sample rates, parameter ranges, tensor shapes, types, and audio-specific validation (cutoff frequency, filter order, Q factor)
- Logging infrastructure (`torchfx.logging` module)
  - Structured logging following Python best practices (NullHandler by default)
  - Convenience functions: `enable_logging()`, `enable_debug_logging()`, `disable_logging()`, `get_logger()`
  - Performance profiling: `log_performance()` context manager and `LogPerformance` decorator
  - Hierarchical logger support for fine-grained control
- Realtime processing module (`torchfx.realtime`) with streaming processors and audio backends
- Biquad filter implementations (LPF, HPF, BPF, BPF peak, notch, all-pass) with stateful and stateless processing paths
- CLI application (Epic 3 — Phase 1) with Typer framework
  - `torchfx process` command: single-file, batch (glob + progress bar), and Unix pipe processing
  - `torchfx info` command: display audio file metadata in a Rich table
  - `torchfx play` command: play audio through speakers with optional effects (requires `sounddevice`)
  - `torchfx record` command: record from microphone with duration/sample-rate/channels control
  - Effect-chain parser: `--effect "name:param1=val1,param2=val2"` syntax with 30+ registered effects/filters
  - TOML configuration file support for defining reusable effect chains
  - GPU acceleration via `--device cuda` global option
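
To make the `--effect "name:param1=val1,param2=val2"` syntax concrete, the following is a rough sketch of how such a string could be split into a name and a parameter dictionary. It is not the CLI's actual parser: `parse_effect_spec`, `_coerce`, and the effect names used in the usage note are hypothetical, and the real parser may handle types and errors differently.

```python
def _coerce(raw):
    """Convert to int or float when possible; otherwise keep the string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_effect_spec(spec):
    """Hypothetical helper: split 'name:key=val,key=val' into (name, params)."""
    name, _, param_text = spec.partition(":")
    name = name.strip()
    if not name:
        raise ValueError(f"missing effect name in {spec!r}")

    params = {}
    # Skip empty items so a bare name or a trailing comma is accepted.
    for item in filter(None, (p.strip() for p in param_text.split(","))):
        key, sep, raw = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {item!r}")
        params[key.strip()] = _coerce(raw.strip())
    return name, params
```

Under these assumptions, `parse_effect_spec("lowpass:cutoff=1000,q=0.707")` yields the name `"lowpass"` and the parameters `{"cutoff": 1000, "q": 0.707}`, while a bare name such as `"reverb"` yields an empty parameter dictionary.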

## [0.3.0] - 2026-01-12

### Added

- method `Wave.save` to save audio files with custom format, encoding and bits per sample
- LoShelving filter implementation based on Audio EQ Cookbook
- ParametricEQ
- Elliptic filters (HiElliptic, LoElliptic)
- deprecation logic to improve code maintainability and backward compatibility
- migration guide to help users transition between versions
- style guidelines for contributors to maintain code quality and consistency
- documentation blog section for project updates and announcements
- contribution guidelines to facilitate community involvement

### Changed

- type hint `BitRate` to include 8 bits per sample option
- uniform `q` naming across all filters (changed from `Q` to `q`); this is a breaking change
- documentation theme to pydata-sphinx-theme for better readability and navigation

### Fixed

- `Wave.merge` had a bug due to incorrect tensor concatenation along the channel dimension

## [0.2.1] - 2025-12-13

### Added

- `Delay` effect with a BPM synchronization option and several delay strategies, by @itsuzef
- new examples in the `examples/` folder:
    - `delay.py` showcasing the new Delay effect
- citation file `CITATION.cff` for easy referencing of the library in academic works
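
As background for the BPM synchronization option of the `Delay` effect, the arithmetic behind tempo-synced delay times can be sketched as below. This is a generic illustration, not the library's code: the function name `bpm_to_delay_samples` and its `beats` parameter are hypothetical, and it assumes one beat equals a quarter note.

```python
def bpm_to_delay_samples(bpm, sample_rate, beats=1.0):
    """Hypothetical helper: delay length in samples for a tempo-synced delay.

    `beats` is the delay length in beats (1.0 = one beat, 0.5 = half a beat).
    """
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    seconds_per_beat = 60.0 / bpm
    # Round to the nearest whole sample.
    return round(seconds_per_beat * beats * sample_rate)
```

For example, at 120 BPM one beat lasts 0.5 s, which corresponds to 22050 samples at a 44.1 kHz sample rate.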

### Changed

- the documentation to include the new Delay effect and update existing examples
- the GitHub workflow to run checks in parallel jobs for faster feedback

### Fixed

- pre-commit configuration to properly run `mypy`, `docformatter` and `black`
- many type hints across the codebase

## [0.2.0] - 2025-09-04

### Added

- CustomNormalizationStrategy class to allow custom normalization functions
- ability to pass a callable as strategy to the Normalize effect
- `ParallelFilterCombination` to combine a set of filters in parallel
- `torch.no_grad` decorators where possible to improve performance

### Changed

- `effects` module renamed to `effect` for consistency with the `filter` module name

## [0.1.2] - 2025-06-30

### Added

- third-party acknowledgments section in README and LICENSE files
- effects tests
- reverb effect

## [0.1.1] - 2025-06-16

### Added

- filters:
    - LinkwitzRiley
    - HiLinkwitzRiley
    - LoLinkwitzRiley
- effects:
    - Normalization
    - Gain
- merge method for Wave class

### Fixed

- Shelving and Peaking filters now work as expected; they were previously missing some instance variables

## [0.1.0] - 2025-04-20

### Added

- Sphinx documentation

### Changed

- old parameters in the benchmark script

## [0.1.0rc] - 2025-04-14

### Added

- documentation
- filters
- wave class
- torch support
