Transformations Module (transformations)
=========================================

The transformations module implements the four panel-to-cross-section
transformations described in Lee and Wooldridge (2025).

.. automodule:: lwdid.transformations
   :members:
   :undoc-members:
   :show-inheritance:

Overview
--------

This module provides four transformation **modes** that convert panel data
into cross-sectional form by removing within-unit time-series patterns using
only pre-treatment information. These modes correspond to the
``rolling`` argument in :func:`lwdid.lwdid` and are implemented internally
by :func:`lwdid.transformations.apply_rolling_transform`:

- ``'demean'``: Unit-specific demeaning
- ``'detrend'``: Unit-specific linear detrending
- ``'demeanq'``: Quarterly demeaning with seasonal effects
- ``'detrendq'``: Quarterly detrending with seasonal effects

Within ``apply_rolling_transform``, internal helper functions carry out
the mode-specific steps. Every mode produces the transformed outcomes
``ydot`` and ``ydot_postavg``, which are then used in the cross-sectional
regressions for ATT estimation.

Transformation Methods
----------------------

demean: Unit Fixed Effects
~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Purpose:** Remove time-invariant unit characteristics (unit fixed effects).

**Mathematical form (Lee and Wooldridge 2025, Procedure 2.1):**

For each unit i with T₀ pre-treatment periods:

1. Compute pre-treatment mean: ȳᵢ₀ = (1/T₀) Σₜ₌₁^T₀ yᵢₜ
2. Transform all observations: ỹᵢₜ = yᵢₜ - ȳᵢ₀

**Requirements:**

- T₀ ≥ 1 (at least one pre-treatment period)

**Use case:** Standard DiD with unit fixed effects.
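
The two steps above can be illustrated with a minimal pandas sketch. It is
not the package's implementation: the function name ``demean_sketch`` and
the column names ``unit``, ``post``, ``y`` and ``ydot`` on the toy panel
are assumptions chosen for illustration.

.. code-block:: python

   import pandas as pd

   def demean_sketch(df, y="y", ivar="unit", post="post"):
       """Subtract each unit's pre-treatment mean of `y` from all its rows."""
       pre = df.loc[df[post] == 0]
       pre_mean = pre.groupby(ivar)[y].mean()   # one pre-treatment mean per unit
       out = df.copy()
       out["ydot"] = out[y] - out[ivar].map(pre_mean)
       return out

   # Hypothetical toy panel: two units, two pre-treatment years, one post year.
   panel = pd.DataFrame({
       "unit": [1, 1, 1, 2, 2, 2],
       "year": [2000, 2001, 2002, 2000, 2001, 2002],
       "post": [0, 0, 1, 0, 0, 1],
       "y": [1.0, 3.0, 5.0, 2.0, 2.0, 4.0],
   })
   demeaned = demean_sketch(panel)

Both units have a pre-treatment mean of 2, so the post-treatment rows
become 3 and 2 while the pre-treatment rows average to zero within each
unit.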

detrend: Unit-Specific Linear Trends
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Purpose:** Remove both unit fixed effects and unit-specific linear trends.

**Mathematical form (Lee and Wooldridge 2025, Procedure 3.1):**

For each unit i with T₀ pre-treatment periods:

1. Estimate linear trend from pre-treatment data:
   yᵢₜ = αᵢ + βᵢ·t + εᵢₜ for t = 1, ..., T₀
2. Compute predicted values: ŷᵢₜ = α̂ᵢ + β̂ᵢ·t
3. Detrend all observations: ỹᵢₜ = yᵢₜ - ŷᵢₜ

**Requirements:**

- T₀ ≥ 2 (need at least 2 points to estimate a linear trend)

**Use case:** DiD with differential pre-treatment trends across units.
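
As a rough illustration of the three steps above, the sketch below fits a
straight line to each unit's pre-treatment rows with ``numpy.polyfit`` and
subtracts the fitted line from every period. The function name
``detrend_sketch`` and the column names (including an integer period index
``t``) are assumptions for illustration, not the package's internals.

.. code-block:: python

   import numpy as np
   import pandas as pd

   def detrend_sketch(df, y="y", ivar="unit", tvar="t", post="post"):
       """Remove a unit-specific linear trend estimated on pre-treatment rows."""
       out = df.copy()
       out["ydot"] = np.nan
       for _, g in df.groupby(ivar):
           pre = g[g[post] == 0]
           if len(pre) < 2:
               raise ValueError("need at least two pre-treatment periods")
           # polyfit returns coefficients from highest degree down: slope, intercept
           slope, intercept = np.polyfit(
               pre[tvar].to_numpy(dtype=float), pre[y].to_numpy(dtype=float), deg=1
           )
           out.loc[g.index, "ydot"] = g[y] - (intercept + slope * g[tvar])
       return out

   # Hypothetical toy panel: three pre-treatment periods and one post period.
   panel = pd.DataFrame({
       "unit": [1, 1, 1, 1, 2, 2, 2, 2],
       "t": [1, 2, 3, 4, 1, 2, 3, 4],
       "post": [0, 0, 0, 1, 0, 0, 0, 1],
       "y": [1.0, 2.0, 3.0, 10.0, 5.0, 5.0, 5.0, 5.0],
   })
   detrended = detrend_sketch(panel)

Unit 1 follows the line ``y = t`` before treatment, so its post-treatment
value of 10 at ``t = 4`` is transformed to roughly 6; unit 2 is flat and
is transformed to roughly zero throughout.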

demeanq: Quarterly Data with Seasonality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Purpose:** Remove unit fixed effects and quarter-of-year effects.

**Extension of demean:** Uses quarter-of-year dummy variables in the
pre-treatment transformation step to remove seasonal patterns before the
cross-sectional ATT regression. The seasonal adjustment is done at the
unit level using only pre-treatment data; the subsequent ATT regression
is run on the transformed outcome.

**Requirements:**

- T₀ ≥ 1
- Time variable must be composite: ``tvar=['year', 'quarter']``
- Each unit must have enough pre-treatment observations relative to the
  number of distinct pre-treatment quarters to avoid rank-deficient
  seasonal regressions (at least ``q + 1`` pre-treatment observations
  when ``q`` distinct quarters are observed pre-treatment)
- For each unit, every quarter appearing in the post-treatment period
  must also appear in the pre-treatment period (quarter-coverage
  condition)

**Use case:** Quarterly data with seasonal patterns.
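
The following sketch shows one way the seasonal adjustment and the two
data checks listed above could look for a single unit. It is an
illustration under assumptions, not the package's code: the helper name
``demeanq_unit_sketch`` and the column names are hypothetical, and the
regression uses an intercept plus dummies for all but the first observed
quarter.

.. code-block:: python

   import numpy as np
   import pandas as pd

   def demeanq_unit_sketch(g, y="y", quarter="quarter", post="post"):
       """Residualize one unit on quarter dummies fitted to pre-treatment rows."""
       pre = g[g[post] == 0]
       pre_q = sorted(pre[quarter].unique())
       if len(pre) < len(pre_q) + 1:
           raise ValueError("too few pre-treatment observations")
       if not set(g.loc[g[post] == 1, quarter]) <= set(pre_q):
           raise ValueError("post-treatment quarter not observed pre-treatment")

       def design(frame):
           # Intercept plus a dummy for every pre-treatment quarter but the first.
           cols = [np.ones(len(frame))]
           cols += [(frame[quarter] == q).to_numpy(dtype=float) for q in pre_q[1:]]
           return np.column_stack(cols)

       coef, *_ = np.linalg.lstsq(
           design(pre), pre[y].to_numpy(dtype=float), rcond=None
       )
       return g[y] - design(g) @ coef

   # Hypothetical unit: two pre-treatment years and one post-treatment year.
   unit = pd.DataFrame({
       "year": [1] * 4 + [2] * 4 + [3] * 4,
       "quarter": [1, 2, 3, 4] * 3,
       "post": [0] * 8 + [1] * 4,
       "y": [10, 12, 11, 13, 12, 14, 13, 15, 14, 16, 15, 17],
   })
   unit["ydot"] = demeanq_unit_sketch(unit)

Because the design is saturated in the quarter, the fitted value for each
row is the unit's pre-treatment mean for that quarter; in this toy series
every post-treatment row ends up about 3 above its quarter's
pre-treatment mean.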

detrendq: Quarterly Data with Trends and Seasonality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Purpose:** Remove unit-specific trends and quarter-of-year effects.

**Extension of detrend:** Combines detrending with quarterly fixed effects
in the pre-treatment transformation step. For each unit, a linear trend
and quarter dummies are estimated using pre-treatment data only, and the
resulting fitted values are subtracted from all periods to obtain the
transformed outcome used in ATT estimation.

**Requirements:**

- T₀ ≥ 2
- Time variable must be composite: ``tvar=['year', 'quarter']``
- Each unit must have enough pre-treatment observations relative to the
  number of distinct pre-treatment quarters to avoid rank-deficient
  trend-plus-seasonality regressions (at least ``1 + q`` pre-treatment
  observations when ``q`` distinct quarters are observed pre-treatment)
- For each unit, every quarter appearing in the post-treatment period
  must also appear in the pre-treatment period (quarter-coverage
  condition)

**Use case:** Quarterly data with both trends and seasonal patterns.
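
A single-unit sketch of the trend-plus-seasonality step is given below.
It mirrors the requirements as stated above, but the helper name
``detrendq_unit_sketch``, the integer period index ``t``, and the other
column names are assumptions made for illustration rather than the
package's actual internals.

.. code-block:: python

   import numpy as np
   import pandas as pd

   def detrendq_unit_sketch(g, y="y", t="t", quarter="quarter", post="post"):
       """Residualize one unit on a pre-period linear trend plus quarter dummies."""
       pre = g[g[post] == 0]
       pre_q = sorted(pre[quarter].unique())
       if len(pre) < 1 + len(pre_q):
           raise ValueError("too few pre-treatment observations")
       if not set(g.loc[g[post] == 1, quarter]) <= set(pre_q):
           raise ValueError("post-treatment quarter not observed pre-treatment")

       def design(frame):
           # Intercept, linear time index, and dummies for all but the first quarter.
           cols = [np.ones(len(frame)), frame[t].to_numpy(dtype=float)]
           cols += [(frame[quarter] == q).to_numpy(dtype=float) for q in pre_q[1:]]
           return np.column_stack(cols)

       coef, *_ = np.linalg.lstsq(
           design(pre), pre[y].to_numpy(dtype=float), rcond=None
       )
       return g[y] - design(g) @ coef

   # Hypothetical unit: trend 0.5 per period, a fixed seasonal pattern,
   # and a level shift of 4 in the four post-treatment quarters.
   t = np.arange(1, 13)
   quarter = (t - 1) % 4 + 1
   season = np.array([0.0, 1.0, -1.0, 2.0])[quarter - 1]
   post = (t > 8).astype(int)
   unit = pd.DataFrame({
       "t": t,
       "quarter": quarter,
       "post": post,
       "y": 2.0 + 0.5 * t + season + 4.0 * post,
   })
   unit["ydot"] = detrendq_unit_sketch(unit)

The toy series is noise-free, so the pre-treatment fit is exact: the
transformed outcome is approximately zero before treatment and
approximately 4 afterwards.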

Implementation Details
----------------------

Common Features
~~~~~~~~~~~~~~~

All transformation functions:

1. **Use only pre-treatment data** for computing transformations (means, trends)
2. **Apply transformations to all periods** (both pre- and post-treatment)
3. **Preserve treatment variation** for estimation in the regression step
4. **Assume cleaned required variables**: rows with missing outcome,
   treatment indicator, time variables, unit identifiers, or post
   indicators are removed during the validation step before
   transformations are applied

Data Flow
~~~~~~~~~

The transformation process:

1. **Input:** Panel data in long format with outcome variable ``y``
2. **Identify pre-treatment periods** using ``post`` indicator
3. **Compute transformation parameters** from pre-treatment data only
4. **Apply transformation** to all observations
5. **Output:** Transformed data ready for cross-sectional regression
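
The last step of this flow, going from a transformed panel to one row per
unit, might look like the sketch below. It assumes a ``ydot`` column
already exists, and it assumes that ``firstpost`` flags each unit's first
post-treatment row; that reading of ``firstpost``, the function name
``collapse_post_sketch``, and the column names are illustrative
assumptions, not a description of the package's code.

.. code-block:: python

   import pandas as pd

   def collapse_post_sketch(df, ivar="unit", tvar="year", post="post"):
       """Attach the post-period average of `ydot` and flag one row per unit."""
       out = df.sort_values([ivar, tvar]).copy()
       is_post = out[post] == 1
       post_avg = out.loc[is_post].groupby(ivar)["ydot"].mean()
       out["ydot_postavg"] = out[ivar].map(post_avg)
       first_t = out.loc[is_post].groupby(ivar)[tvar].min()
       # Assumption: the regression sample is each unit's first post-treatment row.
       out["firstpost"] = is_post & (out[tvar] == out[ivar].map(first_t))
       return out

   # Hypothetical transformed panel with two post-treatment years per unit.
   transformed = pd.DataFrame({
       "unit": [1, 1, 1, 1, 2, 2, 2, 2],
       "year": [2000, 2001, 2002, 2003] * 2,
       "post": [0, 0, 1, 1] * 2,
       "ydot": [-1.0, 1.0, 3.0, 5.0, 0.0, 0.0, 2.0, 4.0],
   })
   collapsed = collapse_post_sketch(transformed)
   cross_section = collapsed[collapsed["firstpost"]]

Here ``cross_section`` holds one row per unit, with ``ydot_postavg`` equal
to 4 for unit 1 and 3 for unit 2.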

Example Usage
-------------

In typical applications, users do not call the transformation functions
directly. Instead, they choose the desired transformation through the
``rolling`` argument of :func:`lwdid.lwdid`, for example:

.. code-block:: python

   from lwdid import lwdid

   # ``data`` is a long-format pandas DataFrame containing the columns named below.
   results = lwdid(
       data=data,
       y='outcome',
       d='treated',
       ivar='unit',
       tvar='year',
       post='post',
       rolling='detrend',   # or 'demean', 'demeanq', 'detrendq'
   )

For advanced use cases, one can apply the transformation step explicitly
by calling :func:`lwdid.transformations.apply_rolling_transform` on data
that have already been validated and prepared by
``validation.validate_and_prepare_data``. This returns a DataFrame with
the residualized outcome ``ydot``, the post-period average
``ydot_postavg``, and the ``firstpost`` indicator marking the
cross-sectional regression sample.

Technical Notes
---------------

Degrees of Freedom
~~~~~~~~~~~~~~~~~~

The transformation step does not consume degrees of freedom for inference
because:

1. Transformations use only pre-treatment data
2. Treatment variation is preserved in the post-treatment period
3. Inference is based on the cross-sectional regression, not the transformation

This is a key insight of the Lee and Wooldridge method.

Handling Unbalanced Panels
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The transformations handle unbalanced panels correctly:

- Each unit's transformation is computed independently
- Units with different numbers of pre-treatment periods are allowed
- Each unit must satisfy the minimum T₀ requirement for the chosen method
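
Before estimation, it can be useful to see which units fall short of the
minimum T₀ for a given mode. The sketch below counts pre-treatment rows
per unit; the thresholds are the T₀ requirements listed in this document,
while the names ``MIN_PRE`` and ``units_below_min_pre`` and the column
names are hypothetical.

.. code-block:: python

   import pandas as pd

   # Minimum number of pre-treatment periods per mode, as documented above.
   MIN_PRE = {"demean": 1, "detrend": 2, "demeanq": 1, "detrendq": 2}

   def units_below_min_pre(df, rolling, ivar="unit", post="post"):
       """Return the units with fewer pre-treatment rows than `rolling` needs."""
       n_pre = (df[post] == 0).groupby(df[ivar]).sum()
       return n_pre[n_pre < MIN_PRE[rolling]].index.tolist()

   # Hypothetical unbalanced panel: units 1, 2 and 3 have 3, 1 and 0
   # pre-treatment rows respectively.
   panel = pd.DataFrame({
       "unit": [1, 1, 1, 1, 2, 2, 3],
       "post": [0, 0, 0, 1, 0, 1, 1],
   })
   short_for_demean = units_below_min_pre(panel, "demean")
   short_for_detrend = units_below_min_pre(panel, "detrend")

In this toy panel only unit 3 fails the ``'demean'`` requirement, while
units 2 and 3 both fail the ``'detrend'`` requirement.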

Missing Data
~~~~~~~~~~~~

- Observations with missing outcome values (and other required variables
  such as treatment indicator, time variables, unit identifier, or post
  indicator) are dropped during validation before the transformation
  step, with an informative warning
- Additional missingness in variables used only at later stages (for
  example, control variables or clustering variables) is handled in the
  estimation module, which may drop observations or raise errors
  depending on the context
- The transformation preserves the panel structure for the remaining
  non-missing observations
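
The first bullet describes behaviour along the lines of the sketch below,
which drops rows with missing required variables and emits a warning. It
is only an illustration: the function name
``drop_missing_required_sketch``, the warning text, and the column names
are assumptions, and the package's own validation step may differ.

.. code-block:: python

   import warnings

   import numpy as np
   import pandas as pd

   def drop_missing_required_sketch(df, required):
       """Drop rows with a missing value in any required column, with a warning."""
       keep = df[required].notna().all(axis=1)
       n_dropped = int((~keep).sum())
       if n_dropped:
           warnings.warn(
               f"Dropped {n_dropped} row(s) with missing values in {required}",
               UserWarning,
           )
       return df.loc[keep].copy()

   # Hypothetical panel with one missing outcome and one missing post indicator.
   raw = pd.DataFrame({
       "unit": [1, 1, 2, 2, 3],
       "year": [2000, 2001, 2000, 2001, 2000],
       "post": [0, 1, 0, np.nan, 0],
       "y": [1.0, np.nan, 2.0, 3.0, 4.0],
   })
   with warnings.catch_warnings(record=True) as caught:
       warnings.simplefilter("always")
       cleaned = drop_missing_required_sketch(raw, ["unit", "year", "post", "y"])

Two of the five rows are dropped, and ``caught`` contains a single
``UserWarning`` reporting the count.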

See Also
--------

- :func:`lwdid.lwdid` - Main estimation function that calls these transformations
- :doc:`../methodological_notes` - Theoretical background
- :doc:`../user_guide` - Comprehensive usage guide
