Metadata-Version: 2.1
Name: ltsfit
Version: 6.0.2
Summary: LtsFit: Least Trimmed Squares Fitting
Home-page: https://purl.org/cappellari/software
Author: Michele Cappellari
Author-email: michele.cappellari@physics.ox.ac.uk
License: Other/Proprietary License
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Description-Content-Type: text/x-rst

The LtsFit Package
==================

**Robust Least Squares Regression with Uncertainties and Scatter in Any Dimension**

.. image:: https://img.shields.io/pypi/v/ltsfit.svg
    :target: https://pypi.org/project/ltsfit/
.. image:: https://img.shields.io/badge/arXiv-1208.3522-orange.svg
    :target: https://arxiv.org/abs/1208.3522
.. image:: https://img.shields.io/badge/DOI-10.1093/mnras/stt562-green.svg
    :target: https://doi.org/10.1093/mnras/stt562

LtsFit is a Python package for **very robust** hyperplane fitting in N dimensions,
with uncertainties in all coordinates and intrinsic scatter. It implements the
method described in Section 3.2 of
`Cappellari et al. (2013a) <https://ui.adsabs.harvard.edu/abs/2013MNRAS.432.1709C>`_
and uses the Least Trimmed Squares (LTS) technique to iteratively clip outliers
`(Rousseeuw & van Driessen 2006) <http://doi.org/10.1007/s10618-005-0024-4>`_.

.. contents:: :depth: 2

Attribution
-----------

If you use this software for your research, please cite
`Cappellari et al. (2013a) <https://ui.adsabs.harvard.edu/abs/2013MNRAS.432.1709C>`_,
the paper that describes the implementation. The BibTeX entry for the paper is::

    @ARTICLE{Cappellari2013a,
        author = {{Cappellari}, M. and {Scott}, N. and {Alatalo}, K. and
            {Blitz}, L. and {Bois}, M. and {Bournaud}, F. and {Bureau}, M. and
            {Crocker}, A.~F. and {Davies}, R.~L. and {Davis}, T.~A. and {de Zeeuw},
            P.~T. and {Duc}, P.-A. and {Emsellem}, E. and {Khochfar}, S. and
            {Krajnovi{\'c}}, D. and {Kuntschner}, H. and {McDermid}, R.~M. and
            {Morganti}, R. and {Naab}, T. and {Oosterloo}, T. and {Sarzi}, M. and
            {Serra}, P. and {Weijmans}, A.-M. and {Young}, L.~M.},
        title = "{The ATLAS$^{3D}$ project - XV. Benchmark for early-type
            galaxies scaling relations from 260 dynamical models: mass-to-light
            ratio, dark matter, Fundamental Plane and Mass Plane}",
        journal = {MNRAS},
        eprint = {1208.3522},
        year = 2013,
        volume = 432,
        pages = {1709-1741},
        doi = {10.1093/mnras/stt562}
    }

Installation
------------

Install with::

    pip install ltsfit

Without write access to the global ``site-packages`` directory, use::

    pip install --user ltsfit

To upgrade the package to the latest version use::

    pip install --upgrade ltsfit

Documentation
-------------

See the examples in ``ltsfit/examples`` and the docstrings in the source files.
``pip`` copies them into the global folder
`site-packages <https://stackoverflow.com/a/46071447>`_.

###########################################################################

ltsfit
======

Purpose
-------

Fit a linear function of the form::

    y = a + b1*x1 + b2*x2 +...+ bm*xm,

to data with errors in all coordinates and intrinsic scatter, using a robust
method that clips outliers. The function can handle lines in 2-dim, planes in
3-dim, or hyperplanes in N-dim, where ``x1, x2,..., xm`` are the independent
variables and ``y`` is the dependent variable. The method was introduced in
Sec. 3.2 of `Cappellari et al. (2013a) <https://ui.adsabs.harvard.edu/abs/2013MNRAS.432.1709C>`_
and the treatment of outliers is based on the FAST-LTS technique by
`Rousseeuw & van Driessen (2006) <http://doi.org/10.1007/s10618-005-0024-4>`_.
See also `Rousseeuw (1987) <http://books.google.co.uk/books?id=woaH_73s-MwC&pg=PA15>`_.
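
The outlier-clipping idea can be illustrated with a minimal sketch of the
concentration step used by LTS-type methods: fit, keep the ``h`` points with
the smallest squared residuals, refit, and repeat until the kept subset stops
changing. This is not the package's implementation. It assumes a simple line
fit with ``np.polyfit``, ignores measurement uncertainties and intrinsic
scatter, and starts from a fit to all points rather than from many random
subsets. The function name ``lts_line_sketch`` is hypothetical.

.. code-block:: python

    import numpy as np

    def lts_line_sketch(x, y, frac=0.5, n_iter=50):
        """Illustrative trimmed least-squares line fit via concentration steps."""
        n = x.size
        h = int(np.ceil(frac * n))           # number of points kept at each step
        keep = np.arange(n)                  # simplification: start from all points
        for _ in range(n_iter):
            b, a = np.polyfit(x[keep], y[keep], 1)      # slope, intercept
            resid2 = (y - (a + b * x))**2               # residuals of ALL points
            new_keep = np.sort(np.argsort(resid2)[:h])  # h smallest residuals
            if np.array_equal(new_keep, keep):          # subset is stable
                break
            keep = new_keep
        return a, b, keep

    # Synthetic data: a line with small noise and ten gross outliers
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, 100)
    y = 1 + 2 * x + rng.normal(0, 0.1, 100)
    y[:10] += 5

    a, b, keep = lts_line_sketch(x, y)
    print(f"intercept={a:.2f}, slope={b:.2f}, kept {keep.size} of {x.size} points")

With ``frac=1`` every point is kept and the loop reduces to a single ordinary
least-squares fit.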

Calling Sequence
----------------

.. code-block:: python

    from ltsfit.ltsfit import ltsfit

    p = ltsfit(x, y, sigx, sigy, clip=2.6, corr=True, epsy=True,
               frac=None, label='Fitted', label_clip='Clipped',
               legend=True, pivot=None, plot=True, text=True)

    print(f"Best fitting parameters: {p.coef}")

The output values are stored as attributes of the ``p`` object.
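
The value ``clip=2.6`` in the call above is expressed in ``sigma`` units. As a
quick check of what such a threshold means for Gaussian errors, the fraction
of values within ``clip`` standard deviations can be computed with the error
function. The helper name ``gaussian_fraction_within`` is hypothetical and is
not part of the package.

.. code-block:: python

    from math import erf, sqrt

    def gaussian_fraction_within(clip):
        """Fraction of a Gaussian distribution within +/- clip standard deviations."""
        return erf(clip / sqrt(2))

    frac_kept = gaussian_fraction_within(2.6)
    print(f"clip=2.6 keeps {100 * frac_kept:.1f}% of Gaussian values")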

Input Parameters
----------------

x: array_like with shape (n, m)
    Array of ``n`` measurements of the ``m`` independent variables.

    EXAMPLE: To fit a line in 2-dim, one has a single vector ``x`` of
    length ``n`` with the independent variable and a corresponding vector of
    dependent variable ``y``.

    EXAMPLE: To fit a plane in 3-dim, one has two vectors of length ``n`` of
    independent variables ``(x1, x2)``. In this case,
    ``x = np.column_stack([x1, x2])``.

    EXAMPLE: To fit a hyperplane in 4-dim, one has three vectors of
    independent variables ``(x1, x2, x3)``. In this case,
    ``x = np.column_stack([x1, x2, x3])``.
y: array_like with shape (n,)
    Vector of measured values for each set of ``x`` variables.
sigx: array_like with shape (n, m)
    Array of ``1sigma`` uncertainties for each ``x`` coordinate for ``m``
    dimensions. This has the same shape as ``x``.
sigy: array_like with shape (n,)
    Vector of ``1sigma`` uncertainties for each ``y`` value.
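
As a sketch of how the four inputs fit together for a plane in 3-dim, the
following builds synthetic arrays with the shapes described above. The data,
coefficients and uncertainty values are arbitrary assumptions chosen only for
illustration.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(1)
    n = 50                                   # number of data points

    # Two independent variables (plane in 3-dim), so m = 2
    x1 = rng.normal(0, 1, n)
    x2 = rng.normal(0, 1, n)
    x = np.column_stack([x1, x2])            # shape (n, m)

    # Synthetic dependent variable with arbitrary coefficients
    y = 0.5 + 1.0 * x1 - 2.0 * x2 + rng.normal(0, 0.1, n)   # shape (n,)

    # Uncertainties have the same shape as the arrays they describe
    sigx = np.full_like(x, 0.05)             # shape (n, m)
    sigy = np.full_like(y, 0.1)              # shape (n,)

    # Basic consistency checks before fitting
    assert x.shape == sigx.shape == (n, 2)
    assert y.shape == sigy.shape == (n,)
    assert np.isfinite(x).all() and np.isfinite(y).all()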

Optional Keywords
-----------------

clip: float
    Clipping threshold in ``sigma`` units. Values deviating more than
    ``clip*sigma`` from the best fit are considered outliers and are
    excluded from the fit. Default is ``clip=2.6``, which would include
    99% of the values for a Gaussian distribution.
corr: bool
    If ``True``, the correlation coefficients are printed on the plot.
    Default is ``True``.
epsy: bool
    If ``True``, the intrinsic scatter is printed on the output plot.
    Default is ``True``.
frac: float
    Fraction of values to include in the LTS stage.
    Up to a fraction ``1 - frac`` of the values can be outliers.
    One must have ``0.5 <= frac <= 1``. Default is ``0.5``.

    NOTE: Set ``frac=1`` to turn off outlier detection.
pivot: array_like with shape (m,)
    If nonzero, then ``ltsfit`` fits the following line, plane or hyperplane::

        y = a + b1*(x1 - pivot[0]) + b2*(x2 - pivot[1]) + ...

    The ``pivot`` values are called ``x_0``, ``y_0`` in eq.(7) of `Cappellari et al. (2013a)`_.
    Use of this keyword is strongly recommended, and a suggested value is
    ``pivot = np.median(x, 0)``. This keyword has no effect on the best fit
    but is important to reduce the covariance and uncertainty in the
    intercept ``a``. However, the covariance depends only weakly on the
    precise value of the ``pivot``. For this reason, it is generally better
    to round the ``pivot`` values to nice numbers. Default is ``0``.
plot: bool
    If ``True``, a plot of the fit is produced. Default is ``True``.
text: bool
    If ``True``, the best fitting parameters are printed on the plot.
    Default is ``True``.
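
The effect of ``pivot`` on the intercept can be illustrated independently of
this package with an ordinary least-squares line fit. The sketch below uses
``np.polyfit`` (not ``ltsfit``) on synthetic data located far from ``x = 0``
and compares the intercept uncertainty and the slope-intercept correlation
with and without a pivot. The helper name ``fit_with_pivot`` and all data
values are assumptions made for this example.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.uniform(9, 11, 200)              # data far from x = 0
    y = 3 + 0.5 * x + rng.normal(0, 0.2, x.size)

    def fit_with_pivot(x, y, pivot):
        """Ordinary least-squares fit of y = a + b*(x - pivot) with covariance."""
        (b, a), cov = np.polyfit(x - pivot, y, 1, cov=True)
        a_err = np.sqrt(cov[1, 1])                           # intercept uncertainty
        corr_ab = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])  # slope-intercept correlation
        return a, b, a_err, corr_ab

    a0, b0, a0_err, corr0 = fit_with_pivot(x, y, pivot=0)
    a1, b1, a1_err, corr1 = fit_with_pivot(x, y, pivot=10)   # rounded median of x

    print(f"pivot=0:  a_err={a0_err:.3f}, corr(a, b)={corr0:.3f}")
    print(f"pivot=10: a_err={a1_err:.3f}, corr(a, b)={corr1:.3f}")

The slope is the same in both fits, while the intercept refers to a different
reference point and is much better constrained when the pivot lies near the
centre of the data.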

Output Parameters
-----------------

The output values are stored as attributes of the ``ltsfit`` class.

p.coef: array_like with shape (m+1,)
    Best fitting parameters ``[a, b1, b2,..., bm]``.
p.coef_err: array_like with shape (m+1,)
    ``1*sigma`` formal uncertainties ``[a_err, b1_err, b2_err,..., bm_err]``.
p.mask: array_like with shape (n,) and dtype bool
    Boolean vector indicating which data points were included in
    the fit (``True``) and which were clipped as outliers (``False``).
p.rms: float
    RMS deviation between the data and the fitted relation.
p.sig_int: float
    Intrinsic scatter in the ``y`` direction around the line/plane/hyperplane.
    ``sig_int`` is called ``epsilon_y`` in eq.(6) of `Cappellari et al. (2013a)`_.
p.sig_int_err: float
    ``1*sigma`` formal error on ``sig_int``.
p.xx: array_like with shape (n,)
    Values plotted along the x-axis. This is the linear combination of the
    ``x`` variables that represents the plane/hyperplane edge-on::

        xx = a + b1*(x1 - pivot[0]) + b2*(x2 - pivot[1]) + ...

    For line fitting, these are just the ``x`` values.
p.yy: array_like with shape (n,)
    The input ``y`` values plotted along the y-axis.
p.xerr: array_like with shape (n,)
    ``1*sigma`` uncertainties for ``p.xx`` in the x-axis of the plot.
p.yerr: array_like with shape (n,)
    ``1*sigma`` uncertainties for ``p.yy`` in the y-axis of the plot.
p.xline: array_like with shape (2,)
    ``x`` coordinates of the best fitting relation as shown on the plot.
p.yline: array_like with shape (2,)
    ``y`` coordinates of the best fitting relation as shown on the plot.
p.spearmanr: array_like with shape (2,)
    Spearman ``r`` coefficient and probability ``p`` between ``(p.xx, p.yy)``
    without clipping outliers.
p.pearsonr: array_like with shape (2,)
    Pearson ``r`` coefficient and probability ``p`` between ``(p.xx, p.yy)``
    without clipping outliers.
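
The edge-on projection ``xx`` and the use of a boolean mask can be sketched
with plain NumPy. The arrays ``coef``, ``pivot`` and ``mask`` below are
hypothetical stand-ins for ``p.coef``, the ``pivot`` keyword and ``p.mask``.
Whether the package computes ``p.rms`` in exactly this way is an assumption.

.. code-block:: python

    import numpy as np

    # Hypothetical fit results standing in for p.coef and the pivot keyword
    coef = np.array([0.5, 1.0, -2.0])        # [a, b1, b2]
    pivot = np.array([1.0, -1.0])

    # Synthetic data scattered around the plane defined by coef and pivot
    rng = np.random.default_rng(3)
    x = rng.normal(pivot, 1, (40, 2))
    y = coef[0] + (x - pivot) @ coef[1:] + rng.normal(0, 0.1, 40)

    # Hypothetical mask: pretend the first three points were clipped
    mask = np.ones(40, dtype=bool)
    mask[:3] = False

    # Edge-on projection: xx = a + b1*(x1 - pivot[0]) + b2*(x2 - pivot[1])
    xx = coef[0] + (x - pivot) @ coef[1:]

    # RMS of the residuals for the retained points only
    rms = np.sqrt(np.mean((y[mask] - xx[mask])**2))
    print(f"rms of {mask.sum()} retained points: {rms:.3f}")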

###########################################################################



License
=======

Other/Proprietary License

Copyright (c) 2012-2023 Michele Cappellari

This software is provided as is with no warranty. You may use it for
non-commercial purposes and modify it for personal or internal use, as long
as you include this copyright and disclaimer in all copies. You may not
redistribute the code.

###########################################################################

Changelog
=========

V6.0.1: MC, Oxford, 20 July 2023
  - New function ``ltsfit`` to fit hyperplanes in N-dim. This procedure
    generalizes and replaces both ``lts_linefit`` and ``lts_planefit``, which
    are now deprecated wrappers for ``ltsfit``. This change was suggested and
    motivated by Francesco D'Eugenio (cam.ac.uk), who shared his own 4-dim
    ``lts_hyperfit`` and his paper on a useful application.
  - ``ltsfit``: When fitting planes/hyperplanes, plot the dependent variable
    on the y-axis to be consistent with line fitting. Also plot a legend.
  - Updated all ``ltsfit_examples``.
  - Fixed inconsistency between the version number on PyPi and in the Changelog.

V5.0.20: MC, Oxford, 3 October 2022
  - Fixed program stop due to Matplotlib change.
    Thanks to Hitesh Lala (Heidelberg) for the report.
  - Extract documentation from docstrings and show it on PyPi.

V5.0.19: MC, Oxford, 22 January 2021
  - Fixed incorrect plot ranges due to a Matplotlib change.
    Thanks to Davide Bevacqua (unibo.it) for the report.

V5.0.18: MC, Oxford, 17 February 2020
  - Properly print significant trailing zeros in results.

V5.0.17: MC, Oxford, 22 January 2020
  - Formatted documentation as docstring.
  - Included p.rms output.
  - Published on PyPi. Increased major version number by mistake.

V2.0.16: MC, Oxford, 27 September 2018
  - Fixed clock DeprecationWarning in Python 3.7.

V2.0.15: MC, Oxford, 12 May 2018
  - Dropped Python 2.7 support.

V2.0.14: MC, Oxford, 13 April 2018
  - Fixed FutureWarning in Numpy 1.14.

V2.0.13: Michele Cappellari, Oxford, 26 July 2017
  - Increased upper limit of intrinsic scatter accounting for
    uncertainty of standard deviation with small samples.

V2.0.12: MC, Oxford, 5 September 2016
  - Fixed: store ab errors in p.ab_err as documented.
    Thanks to Alison Crocker for the correction.

V2.0.11: MC, Oxford, 4 July 2016
  - Added capsize=0 in plt.errorbar to prevent error bar caps
    from showing up in PDF.

V2.0.10: MC, Oxford, 23 January 2016
  - Check for non finite values in input.

V2.0.9: MC, Oxford, 8 January 2016
  - Use LimeGreen for outliers.

V2.0.8: MC, Oxford, 9 December 2015
  - Fixed potential program stop without outliers in Matplotlib 1.5.
  - Increased maximum intrinsic scatter for brentq, to avoid possible
    stops in extreme situations.

V2.0.7: MC, Oxford, 1 October 2015
  - Fixed potential program stop without outliers.

V2.0.6: MC, Oxford, 5 September 2015
  - Optionally pass a legend label.

V2.0.5: MC, Oxford, 6 July 2015
  - Fixed potential program stop without outliers.
    Thanks to Masato Onodera for a clear report of the problem.
  - Output boolean mask instead of good/bad indices.
  - Removed lts_linefit_example from this file.
  - Print verbose output during calculation.

V2.0.4: MC, Baltimore, 9 June 2015
  - Updated documentation.

V2.0.3: MC, Oxford, 10 December 2014
  - Uses np.std rather than biweight to estimate scatter upper limit.

V2.0.2: MC, 6 November 2014
  - Included _linefit function to avoid np.polyfit bug with weights.

V2.0.1: MC, Oxford, 23 October 2014
  - Fixed program stop with zero scatter.

V2.0.0: MC, Portsmouth, 23 June 2014
  - Converted from IDL into Python.

V1.0.6: MC, Baltimore, 8 June 2014
  - Check that all input vectors have the same size.

V1.0.5: MC, Oxford, 19 September 2013
  - Scale line spacing with character size in text output.

V1.0.4: MC, Turku, 10 July 2013
  - Fixed program stop affecting earlier versions of IDL.
    Thanks to Xue-Guang Zhang for reporting the problem
    and the solution.

V1.0.3: MC, Oxford, 13 March 2013
  - Added CLIP keyword.

V1.0.2: MC, Oxford, 1 August 2011
  - Added PIVOT keyword.

V1.0.1: MC, Oxford, 28 July 2011
  - Included _EXTRA and OVERPLOT keywords.

V1.0.0: Michele Cappellari, Oxford, 21 March 2011
  - Created and tested.
