Introduction
============

Preface
*******

This package was implemented by Massimo Pierini as part of a Bachelor's thesis :cite:p:`mythesis`.

It is the first implementation of the CUB class of models in Python and is mainly based on
the work of Domenico Piccolo and the ``CUB`` package in R :cite:p:`iannario2022package`,
maintained by Rosaria Simone.

Background
**********

The class of CUB (Combination of Uniform and Binomial) models, proposed by Professor Domenico Piccolo in
2003 :cite:p:`piccolo2003moments` within the context of rating and preference data analysis, hypothesizes that
the ordinal responses provided by raters are not simply the result of a reasoned choice, but rather the
complex combination of a multitude of factors, both internal and external.

Simplifying, two main components can be distinguished:
*feeling* and *uncertainty*.  

The primary component of feeling
is due to sufficient awareness and understanding of the topic based on
knowledge and experience.  
The secondary component of uncertainty is instead generated by an *intrinsic fuzziness*, due to 
various circumstances: limited knowledge, lack of interest, timing of the survey, method of 
administration, boredom, and so on. 

The simplest way to account for these two aspects is a mixture distribution,
with a shifted Binomial component for feeling and a discrete Uniform component for
uncertainty. This mixture defines the CUB family of models, which was subsequently
extended to account for further factors such as overdispersion of the
Binomial component, the effect of a shelter choice, and so on.
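
As a minimal sketch of the mixture described above, the following Python snippet
computes the probability of each rating :math:`r=1,\ldots,m` as a weighted sum of a
shifted Binomial and a discrete Uniform. It assumes the parameterization commonly
used in the CUB literature, with a mixing weight ``pi`` and a feeling parameter ``xi``.
The function name ``cub_pmf`` and the numeric values are illustrative assumptions
and are not part of this package's API.

.. code-block:: python

    from math import comb

    def cub_pmf(m, pi, xi):
        """Return [Pr(R=1), ..., Pr(R=m)] for a basic CUB mixture (illustrative)."""
        probs = []
        for r in range(1, m + 1):
            # Feeling: shifted Binomial on the support 1..m
            shifted_binomial = comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
            # Uncertainty: discrete Uniform on the support 1..m
            uniform = 1 / m
            probs.append(pi * shifted_binomial + (1 - pi) * uniform)
        return probs

    # Hypothetical values: 7 ordinal categories, low uncertainty, fairly high feeling
    p = cub_pmf(m=7, pi=0.8, xi=0.3)
    print([round(x, 4) for x in p])

Since both components are proper distributions on :math:`1,\ldots,m` and the weights
``pi`` and ``1 - pi`` sum to 1, the resulting probabilities also sum to 1.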

The most recent paper by :cite:alp:`piccolo2019class` is used as the reference for
terminology, theory and inferential issues.

Motivation
**********

The class of CUB models is currently implemented in statistical and econometric programming languages
such as R :cite:p:`iannario2022package`, Stata :cite:p:`cerulli2021stata`,
Gretl :cite:p:`simone2019cub`
and GAUSS :cite:p:`piccolo2006observed`. However, given the growing use
of the Python programming language in the statistical field as well :cite:p:`pittard2020essential`,
an implementation in
this environment could be useful to the scientific community.

Notes
*****

To simplify the notation, the complete matrix of the covariates will occasionally be
denoted by :math:`\pmb T` and the column vector of the model's parameters by :math:`\pmb\theta`.

Generally speaking, for models with covariates three different probability functions are available:

1. ``.pmfi()`` (probability distribution matrix)
    .. math::
        \Pr(R_i=r|\pmb\theta; \pmb T_i),
        \left\{
        \begin{array}{l}
        i=1,\ldots,n
        \\
        r=1,\ldots,m
        \end{array}
        \right.

    which is an :math:`n \times m` matrix whose :math:`i`-th row is the probability distribution
    of the :math:`i`-th subject, given the estimated parameters and the covariates. This is an
    auxiliary function for ``.draw()``. Notice that each row sums to 1, i.e.

    .. math::
        \sum_{r=1}^m \Pr(R_i=r|\pmb\theta; \pmb T_i) = 1,\; \forall i

2. ``.pmf()`` (average probability distribution)
    .. math::
        \frac{1}{n} \sum_{i=1}^n \Pr(R_i=r|\pmb\theta; \pmb T_i),\; r=1,\ldots,m

    which is a :math:`1 \times m` row vector of the average probabilities given the
    estimated parameters and the covariates. This is an auxiliary function
    of ``.plot_ordinal()`` and is used to compute the Dissimilarity index for models
    with covariates. Notice that it always sums to 1 because

    .. math::
        \begin{align*}
        \sum_{r=1}^m \frac{1}{n} \sum_{i=1}^n \Pr(R_i=r|\pmb\theta; \pmb T_i)
        &= \frac{1}{n} \sum_{i=1}^n \; \sum_{r=1}^m \Pr(R_i=r|\pmb\theta; \pmb T_i)
        \\&= \frac{1}{n} \sum_{i=1}^n 1 = \frac{1}{n} n = 1
        \end{align*}

3. ``.prob()`` (observed sample probability)
    .. math::
        \Pr(R_i=r_i|\pmb\theta;\pmb T_i),\; i=1,\ldots,n

    which is an :math:`n \times 1` column vector containing, for each :math:`i`-th subject,
    the probability of the observed response :math:`r_i` given the estimated parameters and the covariates.
    This has not been implemented for all models and can be an auxiliary function
    of ``.loglik()``. Notice that it usually does not sum to 1.
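
The relationship between the three functions can be illustrated with a short,
self-contained Python sketch. It is not the package's implementation: the helper names
(``pmfi_sketch``, ``pmf_sketch``, ``prob_sketch``) are hypothetical, and the per-subject
parameters ``pis`` and ``xis`` are assumed to have already been obtained from the
covariates :math:`\pmb T_i` and the parameters :math:`\pmb\theta`. The basic CUB mixture
is used only as an example model.

.. code-block:: python

    from math import comb, log

    def cub_row(m, pi, xi):
        """Distribution over r = 1..m for one subject (basic CUB mixture)."""
        return [
            pi * comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r) + (1 - pi) / m
            for r in range(1, m + 1)
        ]

    def pmfi_sketch(m, pis, xis):
        """n x m matrix: one probability distribution per subject."""
        return [cub_row(m, pi, xi) for pi, xi in zip(pis, xis)]

    def pmf_sketch(m, pis, xis):
        """1 x m vector: column-wise average of the n x m matrix."""
        matrix = pmfi_sketch(m, pis, xis)
        n = len(matrix)
        return [sum(row[r] for row in matrix) / n for r in range(m)]

    def prob_sketch(m, pis, xis, observed):
        """n x 1 vector: probability of each subject's observed response r_i."""
        matrix = pmfi_sketch(m, pis, xis)
        return [row[r_i - 1] for row, r_i in zip(matrix, observed)]

    # Hypothetical data: 3 subjects, 5 ordinal categories
    m = 5
    pis = [0.9, 0.6, 0.3]   # assumed per-subject mixing weights
    xis = [0.2, 0.5, 0.7]   # assumed per-subject feeling parameters
    observed = [4, 3, 1]    # assumed observed responses r_i

    matrix = pmfi_sketch(m, pis, xis)
    average = pmf_sketch(m, pis, xis)
    likelihoods = prob_sketch(m, pis, xis, observed)

    # The log-likelihood is the sum of the logs of the observed-response probabilities
    loglik = sum(log(p) for p in likelihoods)

In this sketch each row of ``matrix`` and the vector ``average`` sum to 1, while
``likelihoods`` generally does not, because it picks a single entry from each row.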

|
|
|

.. .. bibliography:: cub.bib
..     :cited:
