
Formulas and Mathematical Detail
================================

Notation
--------

Interest is in recovering the parameter vector from the model

.. math::

   \begin{aligned}
   y_{i} & =\beta^{\prime}x_{i}+\epsilon_{i}\end{aligned}

where :math:`\beta` is :math:`k` by 1, :math:`x_{i}` is a :math:`k` by 1
vector of observable variables and :math:`\epsilon_{i}` is a scalar
error. :math:`x_{i}` can be separated in two types of variables. The
:math:`k_{1}` by 1 set of variables :math:`x_{1i}` are exogenous
regressors in the sense that :math:`E\left[x_{1i}\epsilon_{i}\right]=0`.
The :math:`k_{2}` by 1 variables :math:`x_{2i}` are endogenous. A set of
:math:`p` instruments is available that satisfy the requirements for
validity where :math:`p\geq k_{2}`. The extended model can be written as

.. math::

   \begin{aligned}
   y_{i} & =\underset{\textrm{exogenous}}{\underbrace{\beta_{1}^{\prime}x_{1i}}}+\underset{\textrm{endogenous}}{\underbrace{\beta_{2}^{\prime}x_{2i}}}+\epsilon_{i}\\
   x_{2i} & =\underset{\textrm{exogenous}}{\underbrace{\gamma_{1}^{\prime}z_{1i}}}+\underset{\textrm{instruments}}{\underbrace{\gamma_{2}^{\prime}z_{2i}}}+u_{i}\end{aligned}

The model can be expressed compactly

.. math::

   \begin{aligned}
   Y & =X_{1}\beta_{1}+X_{2}\beta_{1}+\epsilon=X\beta+\epsilon\\
   X_{2} & =Z_{1}\gamma_{1}+Z_{2}\gamma_{2}+u=Z\gamma+u\end{aligned}

The vector of instruments :math:`z_{i}` is :math:`p` by 1. There are
:math:`n` observations for all variables. :math:`k_{c}=1` if the model
contains a constant (either explicit or implicit, i.e., including all
categories of a dummy variable). The constant, if included, is in
:math:`x_{1i}`. :math:`X` is the :math:`n` by :math:`k` matrix if
regressors where row :math:`i` of :math:`X` is :math:`x_{i}^{\prime}`.
:math:`X` can be partitioned into :math:`\left[X_{1}\;X_{2}\right]`.
:math:`Z` is the :math:`n` by :math:`p` matrix of instruments. The
vector :math:`y` is :math:`n` by 1. Projection matrices for :math:`X` is
defined :math:`P_{X}=X\left(X^{\prime}X\right)^{-1}X^{\prime}`. The
projection matrix for :math:`Z` is similarly defined only using
:math:`Z`. The annihilator matrix for :math:`X` is
:math:`M_{X}=I-P_{X}`.

Parameter Estimation
--------------------

Two-stage Least Squares (2SLS)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The 2SLS estimator is

.. math::

   \begin{aligned}
   \hat{\beta}_{2SLS} & =\left(X^{\prime}P_{Z}X\right)^{-1}\left(X^{\prime}P_{Z}y\right)\end{aligned}

Limited Information Maximum Likelihood and k-class Estimators
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The LIML or other k-class estimator is

.. math::

   \begin{aligned}
   \hat{\beta}_{\kappa} & =\left(X^{\prime}\left(I-\kappa M_{Z}\right)X\right)^{-1}\left(X^{\prime}\left(I-\kappa M_{Z}\right)y\right)\end{aligned}

where :math:`\kappa` is the parameter of the class. When
:math:`\kappa=1` the 2SLS estimator is recovered. When :math:`\kappa=0`,
the OLS estimator is recovered. The LIML estimator is recovered when
:math:`\kappa` set to

.. math:: \hat{\kappa}=\min\mathrm{eig\left[\left(W^{\prime}M_{Z}W\right)^{-\frac{1}{2}}\left(W^{\prime}M_{X_{1}}W\right)\left(W^{\prime}M_{Z}W\right)^{-\frac{1}{2}}\right]}

where :math:`W=\left[y\:X_{2}\right]` and :math:`\mathrm{eig}` returns
the eigenvalues.

Generalized Method of Moments (GMM)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The GMM estimator is defined as

.. math::

   \begin{aligned}
   \hat{\beta}_{GMM} & =\left(X^{\prime}ZWZ^{\prime}X\right)^{-1}\left(X^{\prime}ZWZ^{\prime}y\right)\end{aligned}

where :math:`W` is a positive definite weighting matrix.

Continuously Updating Generalized Method of Moments (GMM-CUE)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The continuously updating GMM estimator solves the minimization problem

.. math:: \min_{\beta}n\bar{g}\left(\beta\right)^{\prime}W\left(\beta\right)^{-1}\bar{g}\left(\beta\right)

where
:math:`\bar{g}\left(\beta\right)=n^{-1}\sum_{i=1}^{n}z_{i}\hat{\epsilon}_{i}`
and :math:`\hat{\epsilon}_{i}=y_{i}-x_{i}\beta`. Unlike standard GMM,
the weight matrix, :math:`W` directly depends on the model parameters
:math:`\beta` and so it is not possible to use a closed form estimator.

Basic Statistics
----------------

Let :math:`\hat{\epsilon}=y-X\hat{\beta}`. The residual sum of squares
(RSS) is :math:`\hat{\epsilon}^{\prime}\hat{\epsilon}`, the model sum of
squares (MSS) is :math:`\hat{\beta}^{\prime}X^{\prime}X\hat{\beta}` and
the total sum of squares (TSS) is
:math:`\left(y-k_{c}\bar{y}\right)^{\prime}\left(y-k_{c}\bar{y}\right)^{\prime}`\ where
:math:`\bar{y}` is the sample average of :math:`y`. The model
:math:`R^{2}`\ is defined

.. math::

   \begin{aligned}
   R^{2} & =1-\frac{\hat{\epsilon}^{\prime}\hat{\epsilon}}{\left(y-k_{c}\bar{y}\right)^{\prime}\left(y-k_{c}\bar{y}\right)^{\prime}}=1-\frac{RSS}{TSS}\end{aligned}

and the adjusted :math:`R^{2}` is defined

.. math::

   \begin{aligned}
   \bar{R}^{2} & =1-\left(1-R^{2}\right)\frac{N-k_{c}}{N-k}.\end{aligned}

The residual variance is
:math:`s^{2}=n^{-1}\hat{\epsilon}^{\prime}\hat{\epsilon}` unless the
debiased flag is used, in which case a small sample adjusted version is
estimated
:math:`s^{2}=\left(n-k\right)^{-1}\hat{\epsilon}^{\prime}\hat{\epsilon}`.
The model degree of freedom is :math:`k` and the residual degree of
freedom is :math:`n-k`.

The model F-statistic is defined

.. math::

   \begin{aligned}
   F & =\hat{\beta}_{-}^{\prime}\hat{V}_{-}^{-1}\dot{\hat{\beta}_{-}}\end{aligned}

where the notation :math:`\hat{\beta}_{-}` indicates that the constant
is excluded from :math:`\hat{\beta}` and :math:`\hat{V}_{-}` indicates
that the row and column corresponding to a constant are excluded. [1]_
The test statistic is distributed as :math:`\chi_{k-k_{c}}^{2}.` If the
debiased flag is set, then the test statistic :math:`F` is transformed
as :math:`F/\left(k-k_{c}\right)` and a :math:`F_{k-k_{c},n-k}`
distribution is used.

Parameter Covariance Estimation
-------------------------------

Two-stage LS, LIML and k-class estimators
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Four covariance estimators are available. The first is the standard
homoskedastic covariance, defined as

.. math::

   \begin{aligned}
   \hat{\Sigma}=n^{-1}s^{2}\left(\frac{X^{\prime}\left(I-\kappa M_{z}\right)X}{n}\right)^{-1} & =n^{-1}s^{2}\hat{A}.\end{aligned}

Note that this estimator can be expressed as

.. math::

   \begin{aligned}
   \hat{\Sigma}=n^{-1}\hat{A}^{-1}\left\{ s^{2}\hat{A}\right\} \hat{A}^{-1} & =n^{-1}\hat{A}^{-1}\hat{B}\hat{A}^{-1}.\end{aligned}

All estimators take this form and only differ in how the asymptotic
covariance of the scores, :math:`B`, is estimated. For the homoskedastic
covariance estimator, :math:`\hat{B}=s^{2}\hat{A}.` The score covariance
in the heteroskedasticity robust covariance estimator is

.. math::

   \begin{aligned}
   \hat{B} & =n^{-1}\sum_{i=1}^{n}\hat{\epsilon}_{i}^{2}\hat{x}_{i}\hat{x}_{i}^{\prime}=n^{-1}\sum_{i=1}^{n}\hat{\xi}_{i}\hat{\xi}_{i}^{\prime}.\end{aligned}

where :math:`\hat{x_{i}}` is row :math:`i` of :math:`\hat{X}=P_{Z}X` and
:math:`\hat{\xi}_{i}=\hat{\epsilon}_{i}\hat{x}_{i}`.

The kernel covariance estimator is robust to both heteroskedasticity and
autocorrelation and is defined as

.. math::

   \begin{aligned}
   \hat{B} & =\hat{\Gamma}_{0}+\sum_{i=1}^{n-1}K\left(\frac{i}{h}\right)\left(\hat{\Gamma}_{i}+\hat{\Gamma}_{i}^{\prime}\right)\\
   \hat{\Gamma_{j}} & =n^{-1}\sum_{i=j+1}^{n}\hat{\xi}_{i-j}\hat{\xi}_{i}^{\prime}\end{aligned}

where :math:`K\left(\frac{i}{h}\right)` is a kernel weighting function
where :math:`h` is the kernel bandwidth.

The one-way clustered covariance estimator is defined as

.. math::

   \begin{aligned}
   n^{-1}\sum_{j=1}^{g}\left(\sum_{i\in\mathcal{G}_{j}}\hat{\xi}_{i}\right)\left(\sum_{i\in\mathcal{G}_{j}}\hat{\xi}_{i}\right)^{\prime}\end{aligned}

where :math:`\sum_{i\in\mathcal{G}_{j}}\hat{\xi}_{i}` is the sum of the
scores for all members in group :math:`\mathcal{G}_{j}` and :math:`g` is
the number of groups.

If the debiased flag is used to perform a small-sample adjustment, all
estimators except the clustered covariance are rescaled by
:math:`\left(n-k\right)/n`. The clustered covariance is rescaled by
:math:`\left(\left(n-k\right)\left(n-1\right)/n^{2}\right)\left(\left(g-1\right)/g\right)`. [2]_

Standard Errors
~~~~~~~~~~~~~~~

Standard errors are defined as

.. math:: s.e.\left(\hat{\beta}_{j}\right)=\sqrt{e_{j}^{\prime}\hat{\Sigma}e_{j}}

where :math:`e_{j}` is a vector of 0s except in location :math:`j` which
is 1.

T-statistics
~~~~~~~~~~~~

T-statistics test the null :math:`H_{0}:\beta_{j}=0` against a 2-sided
alternative and are defined as

.. math:: z=\frac{\hat{\beta}_{j}}{s.e.\left(\hat{\beta}_{j}\right)}.

P-values
~~~~~~~~

P-values are computes using 2-sided tests,

.. math:: Pr\left(\left|z\right|>Z\right)=2-2\Phi\left(\left|z\right|\right)

If the covariance estimator was debiased, a Student’s t distribution
with :math:`n-k` degrees of freedom is used,

.. math::

   \begin{aligned}
   Pr\left(\left|z\right|>Z\right) & =2-2t_{n-k}\left(\left|z\right|\right)\end{aligned}

where :math:`t_{n-k}\left(\cdot\right)` is the CDF of a Student’s T
distribution.

Confidence Intervals
~~~~~~~~~~~~~~~~~~~~

Confidence intervals are constructed as

.. math:: CI_{i,1-\alpha}=\hat{\beta}_{i}\pm q_{\alpha/2}\times\hat{\sigma}_{\beta_{i}}

where :math:`q_{\alpha/2}` is the :math:`\alpha/2` quantile of a
standard Normal distribution or a Student’s t. The Student’s t is used
when a debiased covariance estimator is used.

Linear Hypothesis Tests
~~~~~~~~~~~~~~~~~~~~~~~

Linear hypothesis tests examine the validity of nulls of the form
:math:`H_{0}:R\beta-r=0` and are implemented using a Wald test statistic

.. math:: W=\left(R\hat{\beta}-r\right)^{\prime}\left[R^{\prime}\hat{\Sigma}R\right]^{-1}\left(R\hat{\beta}-r\right)\sim\chi_{q}^{2}

where :math:`q` is the :math:`rank\left(R\right)` which is usually the
number of rows in :math:`R` . If the debiased flag is used, then
:math:`W/q` is reported and critical and p-values are taken from a
:math:`F_{q,n-k}` distribution.

GMM Covariance estimators
~~~~~~~~~~~~~~~~~~~~~~~~~

GMM covariance depends on the weighting matrix used in estimation and
the assumed covariance of the scores. In most applications these are the
same and so the inefficient form,

.. math:: \hat{\Sigma}=n^{-1}\left(\left(\frac{X'Z}{n}\right)W\left(\frac{Z'X}{n}\right)\right)^{-1}\left(\left(\frac{X'Z}{n}\right)WSW\left(\frac{Z'X}{n}\right)\right)\left(\left(\frac{X'Z}{n}\right)W\left(\frac{Z'X}{n}\right)\right)^{-1}

will collapse to the simpler form

.. math:: \hat{\Sigma}=n^{-1}\left(\left(\frac{X'Z}{n}\right)W\left(\frac{Z'X}{n}\right)\right)^{-1}

when :math:`W=S^{-1}`. When an unadjusted (homoskedastic) covariance is
used,

.. math:: \hat{S}=\tilde{s}^{2}n^{-1}\sum_{j=1}^{n}z_{j}^{\prime}z_{j}

where
:math:`\tilde{s}^{2}=n^{-1}\sum_{i=1}^{n}\left(\epsilon_{i}-\bar{\epsilon}\right)^{2}`
subtracts the mean which may be non-zero if the model is overidentified.
Like previous covariance estimators, if the debiased flag is used,
:math:`n^{-1}` is replaced by :math:`\left(n-k\right)^{-1}`. When a
robust (heteroskedastic) covariance is used, the estimator of :math:`S`
is modified to

.. math:: \hat{S}=n^{-1}\sum_{i=1}^{n}\hat{\epsilon}_{i}^{2}z_{i}^{\prime}z_{i}.

If the debiased flag is used, :math:`n^{-1}` is replaced by
:math:`\left(n-k\right)^{-1}`.

Kernel covariance estimators of :math:`S` take the form

.. math::

   \begin{aligned}
   \hat{S} & =\hat{\Gamma}_{0}+\sum_{i=1}^{n-1}k\left(i/h\right)\left(\hat{\Gamma}_{i}+\hat{\Gamma}_{i}^{\prime}\right)\\
   \hat{\Gamma_{j}} & =n^{-1}\sum_{i=j+1}^{n}\hat{\epsilon}_{i-j}\hat{\epsilon}_{i}z_{i-j}^{\prime}z_{i}\end{aligned}

and :math:`k\left(\cdot\right)` is a kernel weighting function with
bandwidth :math:`h`. If the debiased flag is used, :math:`n^{-1}` is
replaced by :math:`\left(n-k\right)^{-1}`.

The one-way clustered covariance estimator is defined as

.. math:: \hat{S}=n^{-1}\sum_{j=1}^{g}\left(\sum_{i\in\mathcal{G}_{j}}\hat{\epsilon}_{i}z_{i}\right)^{\prime}\left(\sum_{i\in\mathcal{G}_{j}}\hat{\epsilon}_{i}z_{i}\right)

where :math:`\sum_{i\in\mathcal{G}_{j}}\hat{\epsilon}_{i}z_{i}` is the
sum of the moment conditional for all members in group
:math:`\mathcal{G}_{j}` and :math:`g` is the number of groups. If the
debiased flag is used, the :math:`n^{-1}` term is replaced by

.. math:: \left(n-k\right)^{-1}\frac{n-1}{n}\frac{g}{g-1}.

GMM Weight Estimators
~~~~~~~~~~~~~~~~~~~~~

The GMM optimal weight estimators are identical to the the estimators of
:math:`S` with two notable exceptions. First, they are never debiased
and so always use :math:`n^{-1}`. Second, if the center flag is true,
the demeaned moment conditions defined as
:math:`\tilde{g}_{i}=z_{i}\hat{\epsilon}_{i}-\overline{z\epsilon}` are
used in-place of :math:`g_{i}` in the robust, kernel and clustered
estimators. The unadjusted estimator is always centered, and so this
option has no effect.

Post-estimation
---------------

2SLS and LIML Post-estimation Results
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Sargan**

Sargan’s test of over-identifying restrictions examines whether the
variance of the IV residuals is similar to that of the OLS residuals.
The test statistic is computed

.. math:: s=n(1-\hat{\epsilon}^{\prime}M_{Z}\hat{\epsilon}/\hat{\epsilon}^{\prime}\hat{\epsilon})\sim\chi_{v}^{2}

where :math:`\hat{\epsilon}` are the IV residuals and :math:`M_{Z}` is
the annihilator matrix using all exogenous variables.\ :math:`\nu` is
the number of overidentifying restrictions, which is the number of
instruments minus the number of endogenous variables, :math:`p-k_{2}`.

**Basmann**

Basmann’s test is a small-sample corrected version of Sargan’s test of
over-identifying restrictions. It has the same distribution. Let
:math:`s` be Sargan’s test statistic, then Basmann’s test statistic is

.. math:: s(n-p)/(n-s)\sim\chi_{v}^{2}

**Wooldridge score**

Wooldridge’s score test of exogeneity examines the magnitude of the
correlation between the OLS residuals and a an appropriately constructed
set of residuals of the instruments. Define :math:`e=M_{X}Y` and
:math:`\nu=M_{X}M_{Z}X_{2}`. Then the test statistic is computed from
the regression

.. math:: 1=\gamma_{1}\hat{\epsilon}_{1}\hat{v}_{1,i}+\ldots+\gamma_{p}\hat{\epsilon}_{1}\hat{v}_{p,i}+\eta_{i}

as :math:`nR^{2}\sim\chi_{k_{2}}^{2}`.

**Wooldridge regression**

Wooldridge’s regression test of exogeneity is closely related to the
score test and is generally more powerful. It also inherits robustness
to heteroskedasticity and/or autocorrelation the comes from the choice
of covariance estimator used in the model. Define :math:`R=M_{Z}X_{2}`.
The test is implemented in a regression of

.. math:: Y=X\beta+R\gamma+\eta

as

.. math:: \hat{\gamma}^{\prime}\hat{\Sigma}_{\gamma}^{-1}\gamma^{\prime}\sim\chi_{k_{2}}^{2}

where :math:`\hat{\Sigma}_{\gamma}` is the block of the covariance
matrix corresponding to the :math:`\gamma` parameters.
:math:`\hat{\Sigma}` is estimated using the same covariance estimator as
the model fit.

**Wooldridge’s Test of Overidentifying restrictions**

Wooldridge’s test is a score test examining whether the component of the
instrument that is uncorrelated with both the included exogenous and the
fitted exogenous is uncorrelated with the IV residuals. Define
:math:`\tilde{Z}=M_{\left[X_{1}\:\hat{X}_{2}\right]}Z_{2,1:q}` where
:math:`\hat{X_{2}}` are the fitted values from the first stage
regression of the endogenous on all exogenous variables and
:math:`Z_{2,1:q}` contains any :math:`q` columns of :math:`Z_{2}`,
:math:`q=p-k_{2}` . The test statistic is computed using a regression of
1s on the test functions :math:`\hat{\epsilon}_{i}\tilde{z}_{i,j}` for
:math:`j=1,\ldots,q` which should have expected value 0.

.. math:: 1=\gamma_{1}\hat{\epsilon}_{i}\tilde{z}_{i,1}+\ldots+\gamma_{q}\hat{\epsilon}_{i}\tilde{z}_{i,q}

The test statistic is :math:`nR^{2}\sim\chi_{q}^{2}` from the
regression.

**Anderson-Rubin**

The Andersen-Rubin test of overidentification examines the magnitude of
the LIML :math:`\hat{\kappa}`\ which should be close to unity when the
model is not overidentified.

.. math:: n\ln(\hat{\kappa})\sim\chi_{q}^{2}

where :math:`q=p-k_{2}`.

**Basman’s F**

Basmann’s F test of overidentification also examines the magnitude of
the LIML :math:`\hat{\kappa}`. The test statistic is

.. math:: \hat{\kappa}(n-p)/q\sim F_{q,n-n_{instr}}

where :math:`q=p-k_{2}`.

**Durbin and Wu-Haussman**

Durbin’s and the Wu-Hausman tests of exogeneity test of exogeneity is
depends on the variance of the residuals when come endogenous variables
are treated as exogenous against the case where they are treated as
endogenous. Durbin’s test statistic is

.. math::

   \begin{aligned}
   \delta= & \hat{\epsilon}'_{e}P_{[z,w]}\hat{\epsilon}_{e}-\hat{\epsilon}'_{c}P_{z}\hat{\epsilon}_{c}\\
   D= & \delta/(\hat{\epsilon}'_{e}\hat{\epsilon}_{e})/n\sim\chi_{q}^{2}\end{aligned}

and the Wu-Hausman test statistic is

.. math::

   \begin{aligned}
   WH= & \frac{\delta/q}{(\hat{\epsilon}'_{e}\hat{\epsilon}_{e}-\delta)/v}\sim F_{q,\nu}\end{aligned}

where :math:`\hat{\epsilon}_{e}` treats the selected set of endogenous
variables as exogenous (efficient estimate) and
:math:`\hat{\epsilon}_{c}` is a consistent estimator if these variables
are endogenous.\ :math:`P_{\left[Z\,W\right]}` is the projection matrix
containing all exogenous variables including the instrument as well as
the variables being tested for endogeneity
:math:`\left(W\right)`.\ :math:`q` is the number of variables being
tested for exogeneity and :math:`\nu=n-k1-k2-q`.

GMM Post-estimation Results
~~~~~~~~~~~~~~~~~~~~~~~~~~~

**J-stat**

The J-statistic tests whether the model is over-identified, and is
defined

.. math:: n\bar{g}'W^{-1}\bar{g}\sim\chi_{q}^{2}

where :math:`\bar{g}=n^{-1}\sum\hat{\epsilon}_{i}z_{i}` and :math:`W` is
a consistent estimator of the variance of :math:`\sqrt{n}\bar{g}`. The
degree of freedom is :math:`q=p-k_{2}`.

**C-stat**

The C-statistic tests exogeneity by treating a the set of endogenous
variables as exogenous. In practice this means that are included in the
GMM moment conditions, and so a likelihood-ratio-like test statistic can
be computed as

.. math:: J_{e}-J_{c}\sim\chi_{m}^{2}

where :math:`J_{e}` is the J-statistic treating the tested variables as
exogenous and :math:`J_{c}` leaves then as endogenous. The optimal
weighting matrix is computed from the expanded model (efficient) and
used to estimate parameters in both models. This ensures that the test
statistic is positive.

First-stage Estimation Analysis
-------------------------------

**Partial R2 and Partial F-statistic**

The :math:`R^{2}` is reported after orthogonalizing the instruments from
included exogenous variables, so that the model estimated is

.. math:: x_{2i}=\gamma_{0}+\tilde{z}_{2i}\gamma+\eta_{i}

where :math:`\tilde{Z}_{2}=M_{X_{1}}\tilde{Z}`. The partial
:math:`F`-statistic is the F-statistic from this regression. It is
implemented as a standard :math:`F`-statistic when the data is assumed
to be homoskedastic with an :math:`F_{p_{2},n-p_{2}}` distribution. In
all other cases, a quadratic form is used with an asymptotic
:math:`\chi_{p_{2}}^{2}` distribution testing :math:`H_{0}:\gamma=0`.

**Shea’s R2**

Shea’s :math:`R^{2}` is defined as the ratio of the parameter variances
under OLS and 2SLS estimation standardized by the unexplained variance
under each,

.. math:: \frac{\frac{\hat{\sigma}_{OLS,\beta_{i}}^{2}}{1-R_{OLS}^{2}}}{\frac{\hat{\sigma}_{IV,\beta_{i}}^{2}}{1-R_{IV}^{2}}}=\frac{\hat{\sigma}_{OLS,\beta_{i}}^{2}}{\hat{\sigma}_{IV,\beta_{i}}^{2}}\times\frac{1-R_{IV}^{2}}{1-R_{OLS}^{2}}

If the estimator under 2SLS was as good as under OLS, both ratios would
be 1 and Shea’s :math:`R^{2}=1`. On the other hand, the worse the
:math:`IV` fit in terms of either :math:`R^{2}` or the parameter
variances, the lower the value reported by Shea’s :math:`R^{2}`.

Kernel Weights and Bandwidth Selection
--------------------------------------

**Kernel weights**

In all formulas, :math:`m` is the kernel bandwidth parameter.

-  Bartlett

   .. math:: w_{i}=1-\frac{i}{m+1},\,i<m

-  Parzen

   .. math::

      \begin{aligned}
      z_{i} & =\frac{i}{m+1},\,i<m\\
      w_{i} & =1-6z_{i}^{2}+6z_{i}^{3},z\leq0.5\\
      w_{i} & =2(1-z_{i})^{3},z>0.5\end{aligned}

-  Quadratic-Spectral

   .. math::

      \begin{aligned}
      z_{i} & =\frac{6\pi i}{5m}\\
      w_{0} & =1\\
      w_{i} & =3(\sin(z_{i})/z_{i}-\cos(z_{i}))/z_{i}^{2},\:i\geq1\end{aligned}

**Optimal BW selection**

TODO

Constant Detection
------------------

Whether a model contains a constant or equivalent is tested using three
tests. These are executed in order and so once a constant is detected,
the others are not executed. The simplest method to ensure that a
constant is correctly detected is to include a columns of 1s.

#. A column with only 1.0s

#. A column with a maximum minus minimum equal to 0 and that is not all
   0s.

#. Whether the rank of :math:`X` is the same as
   :math:`\left[1_{N}\:X\right]`. If these are the same, then the model
   contains a constant equivalent (e.g., dummies for all categories).

.. [1]
   If the model contains an implicit constant, e.g., all categories of a
   dummy, one of the categories is excluded when computing the test
   statistic. The choice of category to drop has no effect and is
   equivalent to reparameterizing the model with a constant and
   excluding one category of dummy.

.. [2]
   This somewhat non-obvious choice is driven by Stata compatibility.
