0.16.0:
- general:
  - the GIL is now released in many more functions

-fft:
  - very long 1D transforms now have a lower memory overhead and should be
    faster

-sht:
  - a_lm rotation is now much more accurate, but slightly slower
  - the improved apherical harmonic analysis capabilities are now documented

-misc:
  - two new convenience functions vdot() and l2error() were added


0.15.0:
- general:
  - the code is now compiled with the "-fvisibility=hidden" flag, which reduces
    the size of the resulting binary.
  - demo codes were adjusted to use the new SHT interface.

- fft:
  - added some functions to reduce the amount of unnecessary memory
    allocations and data copying.

- sht:
  - it is no longer necessary to pre-allocate an array for the output of the
    `sht.experimental.*2d*` functions. If not provided, the functions will
    create the array automatically now, which requires passing of new `ntheta`
    and `nphi` parameters in come cases.
  - the `sht.experimental.*2d*` functions now take an optional `mmax` parameter
    which can be used to limit the maximum azimuthal moment of a transform.
    If not supplied, the code assumes that `mmax` is equal to `lmax`.
  - added some unit tests for the new SHT interface.
  - reduced memory overhead forsome of the `sht.experimental.*2d*` functions.

- misc:
  - added functionality (originally from Planck Level-S) to simulate time
    streams of detector noise.


0.14.0:
- general:
  - ducc0.__version__ is now also defined under Windows

- sht:
  - further performance improvements
  - added functions for manipulation of "2D maps", i.e. maps consisting of
    (ntheta*nphi) pixels with equidistant pixels in phi, and rings distributed
    along theta according to one of the CC, DH, F1, F2, GL, MW, MWflip schemes.

- totalconvolve:
  - bug fix in the adjoint convolution: results were inadvertently conjugated


0.13.0:
- general:
  - more comprehensive references in README.md

- sht:
  - bug fixes
  - tweaks to the experimental interface for extracting moments up to lmax
    from maps with only lmax+1 or lmax+2 equidistant rings. 


0.12.0:
- general:
  - update installation instructions in README.md

- sht:
  - expose functionality for computing gradient maps from spherical harmonic
    coefficients


0.11.0:
- general:
  - beginning of Doxygen documentation for the C++ part
  - fixes to the #include statements in header files; now every header can be
    included in isolation.
  - some CI streamlining


0.10.0:
- general:
  - HTML documentation generation using Sphinx
    Up-to-date documentation for the ducc0 branch is available at
    https://mtr.pages.mpcdf.de/ducc/.
  - more and improved docstrings
  - SIMD datatypes are now much more compatible with C++ upcoming SIMD types.
    The code can be compiled with the types from <experimental/simd> if
    available, with very small manual changes.
  - reshuffling and renaming of files

- fft:
  - 1D transforms have been rewritten using a much more flexible class hierarchy
    which allows more optimizations. For example 1D FFTs can now be partially
    multi-threaded and the Bluestein algorithm can be used as a single pass
    instead of just replacing a whole transform.

- sht:
  - design of a new SHT interface. Parts of this interface are made visible
    from Python, in the "sht.experimental" submodule. The "sharpjob_d"-based
    interface will be kept for compatibility purposes until ducc1 is released.
  - experimental support for spherical harmonic analysis that only requires
    lmax+1 or lmax+2 equidistant rings for exact analysis up to lmax.
  - misc.rotate_alm was moved to the sht submodule.

- totalconvolver:
  - interface change to synchronize it better with the upcoming SHT interface.
    Basically, if an array has a "number of components" axis, this is now
    always in first place.
    Strictly speaking this is an interface-breaking change, but to the best of
    my knowledge the interface in question has not been used in other projects
    yet.


0.9.0:
- general:
  - improved and faster computation of Gauss-Legendre nodes and weights
    using Ignace Bogaert's implementation (https://doi.org/10.1137/140954969,
    https://sourceforge.net/projects/fastgausslegendrequadrature/)
  - Intel OneAPI compilers are now supported
  - new accepted value "none-debug" for DUCC0_OPTIMIZATION

- wgridder:
  - fixed a bug which could cause memory accesses beyond the end of an array

- fft:
  - slightly improved buffer re-use

- misc:
  - substantially faster a_lm rotation code based on the Mikael Slevinsky's
    FastTransforms package (https://github.com/MikaelSlevinsky/FastTransforms)


0.8.0:
- general:
  - compiles and runs on MacOS 11
  - choice of various optimization and debugging levels by setting
    the DUCC0_OPTIMIZATION variable before compilation.
    Valid choices are
    "none":
      no optimization or debugging, fast compilation
    "portable":
      Optimizations which are portable to all CPUs of a given family
    "portable-debug":
      same as above, with debugging information
    "native":
      Optimizations which are specific to the host CPU, non-portable library
    "native-debug":
      same as above, with debugging information
    Default is "native".

- wgridder:
  - more careful treatment of u,v,w-coordinates and phase angles, leading to
    better achievable accuracies for single-precision runs
  - performance improvements by making the computed interval in "n-1" symmetric
    around 0. This reduces the number of required w planes significantly.
    Speedups are bigger for large FOVs and when FFT is dominating.
  - allow working with dirty images that are shifted with respect to the phase
    center. This can be used for faceting and incorporating DDEs.
  - new optional flag "double_precision_accumulation" for gridding routines,
    which causes accumulation onto the uv grid to be done in double precision,
    regardless of input and output precision. This can be helpful to avoid
    accumulation errors in special circumstances.

- pointingprovider:
  - improved performance via vectorized trigonometric functions


0.7.0:
- general:
  - compilation with MSVC on Windows is now possible

- wgridder:
  - performance (especially scaling) improvements
  - oversampling factors up to 2.5 supported
  - new, more flexible interface in submodule `wgridder.experimental`
    (subject to further changes!)

- totalconvolver:
  - now performs non-equidistant FFT interpolation also in psi direction,
    making it much faster for large kmax.
  - new low-level interface which allows flexible re-distribution of work
    over MPI tasks (responsibility of the caller)


0.6.0:
- general:
  - multi-threading improvements contributed by Peter Bell

- wgridder:
  - new, smaller internal data structure


0.5.0:
- wgridder:
  - internally used grid size is now chosen automatically, and the parameters
    "nu" and "nv" are ignored; they will be removed in ducc1.


0.3.0:
- general:
  - The package should now be installable from PyPI via pip even on MacOS.
    However, MacOS >= 10.14 is required.

- wgridder:
  - very substantial performance and scaling improvements


0.2.0:
- wgridder:
  - kernels are now evaluated via polynomial approximation, allowing much
    more freedom in the choice of kernel function
  - switch to 2-parameter ES kernels for better accuracy
  - unnecessary FFT calculations are skipped

- totalconvolve:
  - improved accuracy by making use of the new wgridder kernels
  - *INTERFACE CHANGE* removed method "epsilon_guess()"

- pointingprovider:
  new, experimental module for computing detector pointings from a time stream
  of satellite pointings. To be used by litebird_sim initially.
