Metadata-Version: 2.4
Name: pyartool
Version: 0.1.0
Summary: Aligned Rank Transform for Nonparametric Factorial ANOVAs — Python port of the R ARTool package
Author: Matthew Kay, Lisa A. Elkin, James J. Higgins, Jacob O. Wobbrock
License: GPL-2.0-or-later
Project-URL: Homepage, https://github.com/CHAI-NU/pyartool
Project-URL: Repository, https://github.com/CHAI-NU/pyartool
Project-URL: Documentation, https://github.com/CHAI-NU/pyartool#readme
Keywords: statistics,anova,nonparametric,aligned-rank-transform,factorial,hypothesis-testing
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v2 or later (GPLv2+)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.22
Requires-Dist: pandas>=1.4
Requires-Dist: scipy>=1.8
Requires-Dist: statsmodels>=0.13
Dynamic: license-file

# PyARTool: Aligned Rank Transform for Nonparametric Factorial ANOVAs

**Python port of the R [ARTool](https://github.com/mjskay/ARTool) package.**

PyARTool implements the Aligned Rank Transform (ART) for nonparametric analyses of variance on factorial models. It is a faithful Python translation of the R ARTool package (Kay, Elkin, Higgins, and Wobbrock), which implements the procedures of Wobbrock, Findlater, Gergle, and Higgins (2011) and Elkin et al. (2021). On all bundled datasets it produces **numerically identical results** to the R package.

---

## Table of Contents

- [Overview](#overview)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [API Reference](#api-reference)
  - [`art()` — Aligned Rank Transform](#art--aligned-rank-transform)
  - [`anova_art()` — ANOVA on ART Data](#anova_art--anova-on-art-data)
  - [`summary_art()` — Diagnostic Summary](#summary_art--diagnostic-summary)
  - [`art_con()` — Contrast Tests (ART-C)](#art_con--contrast-tests-art-c)
  - [`artlm()` / `artlm_con()` — Access Fitted Models](#artlm--artlm_con--access-fitted-models)
  - [Dataset Loaders](#dataset-loaders)
- [Supported Designs](#supported-designs)
- [Formula Syntax](#formula-syntax)
- [Detailed Walkthrough](#detailed-walkthrough)
  - [Example 1: Between-Subjects Factorial](#example-1-between-subjects-factorial)
  - [Example 2: Split-Plot / Mixed-Effects](#example-2-split-plot--mixed-effects)
  - [Example 3: Multi-Factor Within-Subjects](#example-3-multi-factor-within-subjects)
  - [Example 4: Repeated Measures with Error()](#example-4-repeated-measures-with-error)
- [P-Value Adjustment Methods](#p-value-adjustment-methods)
- [R Parity & Validation](#r-parity--validation)
- [Architecture & Implementation Notes](#architecture--implementation-notes)
- [Dependencies](#dependencies)
- [Example Scripts](#example-scripts)
- [Citations](#citations)
- [License](#license)

---

## Overview

The **Aligned Rank Transform (ART)** is a nonparametric technique that allows you to use standard ANOVA procedures on ranked data, while correctly handling main effects, interactions, and contrasts in factorial designs. It works by:

1. **Aligning** the response variable to strip out effects not of interest for each term.
2. **Ranking** the aligned responses.
3. Running standard **ANOVAs** on the aligned-and-ranked data.

PyARTool automates this entire pipeline and additionally supports the **ART-C** procedure for post-hoc contrast tests (Elkin et al., 2021).
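
For intuition, the align and rank steps for a two-factor design can be sketched by hand with pandas. This is a minimal illustration of the procedure described by Wobbrock et al. (2011), not PyARTool's internal code; the helper name `align_and_rank` and the toy data are assumptions made for this example.

```python
import pandas as pd

# Toy 2x2 between-subjects data; the values are made up for illustration.
toy = pd.DataFrame({
    "A": ["a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2"],
    "B": ["b1", "b1", "b2", "b2", "b1", "b1", "b2", "b2"],
    "Y": [3.0, 5.0, 4.0, 8.0, 7.0, 9.0, 15.0, 13.0],
})


def align_and_rank(df, response, a, b):
    """Align the response for A, B, and A:B, then rank each aligned column."""
    y = df[response]
    grand = y.mean()
    mean_a = df.groupby(a)[response].transform("mean")
    mean_b = df.groupby(b)[response].transform("mean")
    cell = df.groupby([a, b])[response].transform("mean")

    # Residual: what is left after removing every effect (the cell mean).
    residual = y - cell

    # Estimated effect of each term, expressed per observation.
    effects = {
        a: mean_a - grand,
        b: mean_b - grand,
        f"{a}:{b}": cell - mean_a - mean_b + grand,
    }

    # Aligned response = residual + the effect of interest only.
    aligned = pd.DataFrame({term: residual + eff for term, eff in effects.items()})
    # Ties receive the average of the ranks they span.
    ranks = aligned.rank(method="average")
    return aligned, ranks


aligned, ranks = align_and_rank(toy, "Y", "A", "B")
print(aligned.sum().round(10))  # each aligned column sums to ~0
print(ranks)
```

Each aligned column sums to zero up to floating-point error, which is the property the `summary_art()` diagnostics check. A separate ANOVA is then run on each ranked column, and only the effect that column was aligned for is read from it.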

### When to Use ART

Use the Aligned Rank Transform when:

- Your data violates ANOVA assumptions (non-normality, heteroscedasticity).
- You have a factorial design (two or more factors) — ART handles interactions correctly, unlike simpler rank-based tests.
- You need post-hoc pairwise or interaction contrasts on nonparametric data.

For more background, see the [ARTool project page](https://depts.washington.edu/acelab/proj/art/).

---

## Installation

### From PyPI (recommended)

```bash
pip install pyartool
```

### From source

```bash
git clone <this-repo>
cd PyARTool

# Create and activate a virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate   # macOS/Linux
# .venv\Scripts\activate    # Windows

# Install in editable mode
pip install -e .
```

### Requirements

- Python >= 3.9
- numpy >= 1.22
- pandas >= 1.4
- scipy >= 1.8
- statsmodels >= 0.13

---

## Quick Start

```python
from pyartool import art, anova_art, art_con, load_higgins1990_table5

# Load data
df = load_higgins1990_table5()

# Step 1: Apply the Aligned Rank Transform
m = art("DryMatter ~ Moisture * Fertilizer + (1|Tray)", data=df)

# Step 2: Run the nonparametric ANOVA
print(anova_art(m))
#                  Term  Df  Df.res        F        Pr(>F)
# 0            Moisture   3     8.0   23.833  2.419913e-04
# 1          Fertilizer   3    24.0  122.402  1.110223e-14
# 2  Moisture:Fertilizer  9    24.0    5.118  6.466476e-04

# Step 3: Post-hoc contrasts
print(art_con(m, "Moisture"))
#   contrast  estimate    SE  df  t.ratio   p.value
# 0  m1 - m2   -23.083  4.12   8   -5.607    0.0023
# ...
```

---

## API Reference

### `art()` — Aligned Rank Transform

```python
from pyartool import art

result = art(formula, data)
```

| Parameter | Type | Description |
|-----------|------|-------------|
| `formula` | `str` | R-style formula (see [Formula Syntax](#formula-syntax)). |
| `data` | `pd.DataFrame` | Data in long format. Factor columns should be `pd.Categorical` or string. |

**Returns** an `ArtResult` object containing:

| Attribute | Description |
|-----------|-------------|
| `result.formula` | Original formula string. |
| `result.data` | Original DataFrame. |
| `result.aligned` | DataFrame of aligned responses (one column per term). |
| `result.aligned_ranks` | DataFrame of ranks of aligned responses. |
| `result.residuals` | Residuals from the cell-means model. |
| `result.cell_means` | Cell means for every term. |
| `result.estimated_effects` | Estimated effects for every term. |

### `anova_art()` — ANOVA on ART Data

```python
from pyartool import anova_art

anova_table = anova_art(m)
```

| Parameter | Type | Description |
|-----------|------|-------------|
| `m` | `ArtResult` | Object returned by `art()`. |

**Returns** a `pd.DataFrame` with columns: `Term`, `Df`, `Df.res`, `F`, `Pr(>F)`.

The model type is determined automatically by the formula:

| Formula pattern | Model | R equivalent |
|-----------------|-------|--------------|
| `Y ~ A * B` | OLS (fixed effects) | `lm()` |
| `Y ~ A * B + (1\|S)` | Mixed-effects (REML) | `lmer()` |
| `Y ~ A * B + Error(S)` | Repeated measures | `aov(Error())` |
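
The dispatch in the table can be pictured as a simple pattern check on the right-hand side of the formula. The sketch below is an illustration only: `guess_model_type` is a hypothetical helper, and PyARTool's own parser may classify formulas differently.

```python
import re


def guess_model_type(formula: str) -> str:
    """Classify a formula by its grouping or error term (illustrative only)."""
    rhs = formula.split("~", 1)[1]
    if re.search(r"\bError\s*\(", rhs):
        return "repeated-measures"  # Y ~ A * B + Error(S)
    if re.search(r"\(\s*1\s*\|\s*\w+\s*\)", rhs):
        return "mixed-effects"      # Y ~ A * B + (1|S)
    return "ols"                    # Y ~ A * B


for f in ("Y ~ A * B", "Y ~ A * B + (1|S)", "Y ~ A * B + Error(S)"):
    print(f, "->", guess_model_type(f))
```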

### `summary_art()` — Diagnostic Summary

```python
from pyartool import summary_art

s = summary_art(m)
```

**Returns** an `ArtSummary` object with:

| Attribute | Description |
|-----------|-------------|
| `s.aligned_col_sums` | Dict of column sums of aligned responses (should all be ~0). |
| `s.aligned_anova_f_values` | Array of F values from ANOVAs on non-target aligned responses (should all be ~0). |

These diagnostics verify that the ART alignment procedure correctly stripped out effects not of interest. If values are not close to zero, the ART may not be appropriate for your data.
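
One way to act on these diagnostics is to flag any term whose value is not close to zero. The helper below is a sketch: the name `terms_not_near_zero` and the tolerance are assumptions, and the example dictionary is made-up data shaped like `s.aligned_col_sums`.

```python
import math


def terms_not_near_zero(values_by_term, tol=1e-8):
    """Return the terms whose diagnostic value is farther than `tol` from zero."""
    return [
        term
        for term, value in values_by_term.items()
        if not math.isclose(value, 0.0, abs_tol=tol)
    ]


# Hypothetical column sums, keyed by term like `s.aligned_col_sums`.
example_sums = {"Moisture": 1.2e-13, "Fertilizer": -3.4e-14, "Moisture:Fertilizer": 0.0}
print(terms_not_near_zero(example_sums))  # []
```

With a real model, the same call would take `summary_art(m).aligned_col_sums` as its argument. A suitable tolerance depends on the scale of the response.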

### `art_con()` — Contrast Tests (ART-C)

```python
from pyartool import art_con

contrasts = art_con(m, formula, *, response="art", method="pairwise",
                    interaction=False, adjust="tukey")
```

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `m` | `ArtResult` | — | Object returned by `art()`. |
| `formula` | `str` | — | Term to contrast: `"A"`, `"A:B"`, or `"A:B:C"`. |
| `response` | `str` | `"art"` | `"art"` (ranked) or `"aligned"` (unranked). |
| `method` | `str` | `"pairwise"` | Contrast method. |
| `interaction` | `bool` | `False` | If `True`, compute difference-of-difference contrasts. |
| `adjust` | `str\|None` | `"tukey"` | P-value adjustment (see [below](#p-value-adjustment-methods)). |

**Returns** a `pd.DataFrame` with columns: `contrast`, `estimate`, `SE`, `df`, `t.ratio`, `p.value`.
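
With `method="pairwise"`, the number of rows grows quickly with the number of cells in the contrasted term, which is worth knowing before choosing an adjustment. The sketch below counts the pairs and builds single-factor labels in the `m1 - m2` style shown in this README; both helper names are hypothetical, and the order of the labels is an assumption.

```python
from itertools import combinations
from math import comb, prod


def n_pairwise(levels_per_factor):
    """Number of pairwise contrasts among the cells of a term."""
    cells = prod(levels_per_factor)
    return comb(cells, 2)


def pairwise_labels(levels):
    """Labels for all pairwise contrasts among the levels of one factor."""
    return [f"{a} - {b}" for a, b in combinations(levels, 2)]


print(n_pairwise([4]))        # 6   (e.g. Moisture, 4 levels)
print(n_pairwise([4, 4]))     # 120 (e.g. Moisture:Fertilizer, 16 cells)
print(n_pairwise([2, 2, 2]))  # 28  (e.g. A:B:C, 8 cells)
print(pairwise_labels(["m1", "m2", "m3", "m4"]))
```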

### `artlm()` / `artlm_con()` — Access Fitted Models

For advanced users who need the underlying `statsmodels` fit objects:

```python
from pyartool import artlm, artlm_con

# Get the fitted model for a specific ART term
lm_result = artlm(m, "A:B")

# Get the fitted model for an ART-C contrast term
lm_con_result = artlm_con(m, "A:B")
```

### Dataset Loaders

PyARTool bundles the same example datasets as the R package:

```python
from pyartool import (
    load_higgins1990_table1,   # 3x3 between-subjects
    load_higgins1990_table5,   # 4x4 split-plot (Moisture x Fertilizer)
    load_elkin_ab,             # 2x2 within-subjects
    load_elkin_abc,            # 2x2x2 within-subjects
    load_higgins_abc,          # 2x2x2 mixed design
)

df = load_higgins1990_table5()
print(df.head())
#   Tray Moisture Fertilizer  DryMatter
# 0   t1       m1         f1        3.3
# 1   t1       m1         f2        4.3
# 2   t1       m1         f3        4.5
# 3   t1       m1         f4        5.8
# 4   t2       m1         f1        4.0
```

---

## Supported Designs

| Design | Formula Example | Model |
|--------|----------------|-------|
| Between-subjects factorial | `Y ~ A * B` | OLS (`lm`) |
| Split-plot / mixed-effects | `Y ~ A * B + (1\|Subject)` | Mixed (`lmer` via `MixedLM`) |
| Repeated measures (aov) | `Y ~ A * B + Error(Subject)` | RM-ANOVA (`aov`) |
| 2-factor | `Y ~ A * B` | Any of the above |
| 3-factor | `Y ~ A * B * C + (1\|S)` | Any of the above |
| N-factor | `Y ~ A * B * C * D + ...` | Any of the above |

---

## Formula Syntax

PyARTool uses R-style formula strings. The parser supports all the same patterns as the R ARTool package:

### Fixed effects

```python
# Full factorial (A + B + A:B)
"Y ~ A * B"

# Three-way factorial (all main effects, 2-way, and 3-way interactions)
"Y ~ A * B * C"

# You can also spell out terms explicitly
"Y ~ A + B + A:B"
```
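
To see which terms a `*` formula stands for, the expansion can be written out with `itertools`. This is a sketch of the expansion rule only, not PyARTool's parser, and `expand_factorial` is a hypothetical helper.

```python
from itertools import combinations


def expand_factorial(factors):
    """Expand crossed factors into main effects and all interactions."""
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append(":".join(combo))
    return terms


print(expand_factorial(["A", "B"]))
# ['A', 'B', 'A:B']
print(expand_factorial(["A", "B", "C"]))
# ['A', 'B', 'C', 'A:B', 'A:C', 'B:C', 'A:B:C']
```

The three-factor result lists the same seven terms that appear in the ANOVA tables for the 2x2x2 examples.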

### Random / grouping effects (mixed-effects model)

```python
# Random intercept for Subject — fits a mixed-effects model (lmer)
"Y ~ A * B + (1|Subject)"
```

### Error terms (repeated measures ANOVA)

```python
# Repeated measures — fits an aov() with Error()
"Y ~ A * B + Error(Subject)"
```

### Important notes

- The response variable (left of `~`) must be a single numeric column.
- All factor columns should be `pd.Categorical` or string type. Numeric columns used as factors will raise a warning.
- The formula must specify the **full factorial** — all lower-order terms must be present for any interaction term. PyARTool will raise an error if the design is not fully crossed.
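
A short pandas check before calling `art()` can catch both issues above. The snippet is a sketch with made-up data and column names: it converts factor columns to categorical and verifies that every combination of factor levels occurs.

```python
import pandas as pd

# Toy long-format data with factors stored as integers (values are illustrative).
df = pd.DataFrame({
    "Subject": [1, 1, 2, 2, 3, 3, 4, 4],
    "Dose": [10, 20, 10, 20, 10, 20, 10, 20],
    "Group": ["ctl", "ctl", "ctl", "ctl", "trt", "trt", "trt", "trt"],
    "Y": [4.1, 5.0, 3.8, 5.6, 6.2, 7.9, 5.7, 8.4],
})

# Convert every factor column to a categorical dtype.
factor_cols = ["Subject", "Dose", "Group"]
df = df.astype({col: "category" for col in factor_cols})

# Fully crossed: every Dose x Group combination appears at least once.
n_cells = df[["Dose", "Group"]].drop_duplicates().shape[0]
n_expected = df["Dose"].nunique() * df["Group"].nunique()
fully_crossed = n_cells == n_expected

print(df.dtypes)
print("fully crossed:", fully_crossed)
```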

---

## Detailed Walkthrough

### Example 1: Between-Subjects Factorial

A simple 3x3 factorial design with no repeated measures:

```python
from pyartool import art, anova_art, load_higgins1990_table1

df = load_higgins1990_table1()
print(df.head())
#   Subject Row Column  Response
# 0      s1   1      1         9
# 1      s2   1      1         6
# ...

# Fit the ART model (no grouping term = OLS)
m = art("Response ~ Row * Column", data=df)

# Run ANOVA
print(anova_art(m))
#         Term  Df  Df.res       F        Pr(>F)
# 0        Row   2    27.0  29.993  1.383278e-07
# 1     Column   2    27.0  77.867  6.149827e-12
# 2  Row:Column  4    27.0   0.642  6.374203e-01
```

### Example 2: Split-Plot / Mixed-Effects

When you have a grouping factor (e.g., trays, subjects), include `(1|Group)` to fit a mixed-effects model:

```python
from pyartool import art, anova_art, summary_art, art_con
from pyartool import load_higgins1990_table5

df = load_higgins1990_table5()

# Moisture varies between trays; Fertilizer varies within trays
m = art("DryMatter ~ Moisture * Fertilizer + (1|Tray)", data=df)

# Check diagnostics
s = summary_art(m)
print("Aligned column sums:", s.aligned_col_sums)
# Should all be ~0

# ANOVA
print(anova_art(m))
#                  Term  Df  Df.res        F        Pr(>F)
# 0            Moisture   3     8.0   23.833  2.419913e-04
# 1          Fertilizer   3    24.0  122.402  1.110223e-14
# 2  Moisture:Fertilizer  9    24.0    5.118  6.466476e-04

# Post-hoc: pairwise contrasts on Moisture
# Default adjustment is Tukey HSD
print(art_con(m, "Moisture"))
#   contrast  estimate    SE  df  t.ratio   p.value
# 0  m1 - m2   -23.083  4.12   8   -5.607    0.0023
# 1  m1 - m3   -33.750  4.12   8   -8.198    0.0002
# ...

# Interaction contrasts with Holm adjustment
print(art_con(m, "Moisture:Fertilizer", adjust="holm"))
```

### Example 3: Multi-Factor Within-Subjects

A 2x2x2 fully within-subjects design:

```python
from pyartool import art, anova_art, art_con, load_elkin_abc

df = load_elkin_abc()
m = art("Y ~ A * B * C + (1|S)", data=df)

# Full ANOVA table
print(anova_art(m))
#    Term  Df  Df.res        F        Pr(>F)
# 0     A   1    49.0  288.181  0.000000e+00
# 1     B   1    49.0   28.103  2.732842e-06
# 2     C   1    49.0   60.510  4.168039e-10
# 3   A:B   1    49.0   28.528  2.377711e-06
# 4   A:C   1    49.0   16.545  1.720573e-04
# 5   B:C   1    49.0   76.258  1.481193e-11
# 6 A:B:C   1    49.0   75.592  1.690836e-11

# Contrasts on the 3-way interaction
print(art_con(m, "A:B:C", adjust="holm"))

# Contrasts on a 2-way interaction (averaged over 3rd factor)
print(art_con(m, "A:B", adjust="holm"))

# Single-factor contrasts
print(art_con(m, "A"))  # Tukey by default

# Different adjustment methods
print(art_con(m, "B:C", adjust="bonferroni"))
```

### Example 4: Repeated Measures with Error()

If you prefer traditional repeated-measures ANOVA (via `aov`) instead of mixed-effects models, use `Error()`:

```python
from pyartool import art, anova_art, load_higgins_abc

df = load_higgins_abc()
m = art("Y ~ A * B * C + Error(Subject)", data=df)

print(anova_art(m))
#    Term  Df  Df.res        F        Pr(>F)
# 0     A   1     4.0  120.471  3.914986e-04
# 1     B   1     4.0  120.471  3.914986e-04
# 2     C   1     4.0   14.322  1.936216e-02
# 3   A:B   1     4.0   81.920  8.257143e-04
# 4   A:C   1     4.0    0.126  7.406643e-01
# 5   B:C   1     4.0    0.232  6.552898e-01
# 6 A:B:C   1     4.0    0.972  3.800992e-01
```

---

## P-Value Adjustment Methods

The `adjust` parameter in `art_con()` supports these methods:

| Value | Method | Description |
|-------|--------|-------------|
| `"tukey"` | Tukey HSD | Default. Uses the studentized range distribution. Best for pairwise comparisons. |
| `"holm"` | Holm-Bonferroni | Step-down procedure. Good general-purpose choice. |
| `"bonferroni"` | Bonferroni | Conservative; multiplies each p-value by the number of tests, capped at 1. |
| `"fdr"` or `"bh"` | Benjamini-Hochberg | Controls the false discovery rate. Less conservative than Holm or Bonferroni. |
| `"none"` or `None` | No adjustment | Raw (unadjusted) p-values. |

**Note:** The default is `"tukey"`, matching R's `emmeans` / `art.con()` behavior.
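
To make the differences between the non-Tukey methods concrete, the sketch below implements the textbook Bonferroni, Holm, and Benjamini-Hochberg adjustments in plain Python. It illustrates the definitions rather than PyARTool's code, and the function names and raw p-values are assumptions chosen for the example.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Step-down Holm adjustment (monotone, capped at 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p-values
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def benjamini_hochberg(pvals):
    """Step-up Benjamini-Hochberg adjustment (false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # descending
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # rank of this p-value in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03]
print([round(p, 4) for p in bonferroni(raw)])          # [0.03, 0.12, 0.09]
print([round(p, 4) for p in holm(raw)])                # [0.03, 0.06, 0.06]
print([round(p, 4) for p in benjamini_hochberg(raw)])  # [0.03, 0.04, 0.04]
```

On these three p-values, Bonferroni gives the largest adjusted values, Holm is never larger than Bonferroni, and Benjamini-Hochberg is the least conservative of the three.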

---

## R Parity & Validation

PyARTool has been validated to produce **numerically identical results** to R's ARTool package across all bundled datasets and model types:

| Dataset | Design | ANOVA | Contrasts |
|---------|--------|-------|-----------|
| Higgins1990Table1 | 3x3 OLS | Exact match | — |
| Higgins1990Table5 | 4x4 split-plot (lmer) | Exact match | Moisture (Tukey), Fertilizer (Tukey), Moisture:Fertilizer (Holm, 120 pairs) |
| ElkinABC | 2x2x2 within (lmer) | Exact match | A:B:C (Holm), A:B (Holm), A (Tukey), B:C (Bonferroni) |
| ElkinAB | 2x2 within (lmer) | Exact match | A (Tukey), B (Tukey), A:B (Holm) |
| HigginsABC | 2x2x2 mixed (aov) | Exact match | — |

The companion files [`artool_example.r`](artool_example.r) and [`example.py`](example.py) run the same analyses in R and Python respectively, allowing side-by-side output comparison.

To run both:

```bash
# R version (requires R and the ARTool package)
Rscript artool_example.r

# Python version
python example.py
```
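
Once both scripts have run, the comparison can be automated by collecting a statistic per term from each output and checking the two against a tolerance. The sketch below assumes a hypothetical helper `compare_tables`; the Python-side F values are the ones printed for Example 1 in this README, while the R-side dictionary is only a placeholder to be filled in from the R output.

```python
import math


def compare_tables(py_stats, r_stats, rel_tol=1e-6):
    """Return {term: (python_value, r_value)} for every term that disagrees."""
    mismatches = {}
    for term in sorted(set(py_stats) | set(r_stats)):
        py_v, r_v = py_stats.get(term), r_stats.get(term)
        if py_v is None or r_v is None or not math.isclose(py_v, r_v, rel_tol=rel_tol):
            mismatches[term] = (py_v, r_v)
    return mismatches


# F values for Example 1 as printed in this README.
py_f = {"Row": 29.993, "Column": 77.867, "Row:Column": 0.642}
# Placeholder, not real R output: replace with values from artool_example.r.
r_f = dict(py_f)

print(compare_tables(py_f, r_f))  # {}
```

An empty dictionary means every term agrees within the relative tolerance. A term missing from either side is reported with `None` in its place.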

---

## Architecture & Implementation Notes

### Package Structure

```
PyARTool/
  src/pyartool/
    __init__.py          # Public API exports
    art.py               # Core ART: alignment + ranking (art())
    formula.py           # R-style formula parser
    effects.py           # Cell means & estimated effects
    anova.py             # ANOVA: OLS, split-plot, and RM dispatching
    models.py            # artlm: model fitting (OLS / MixedLM / aov)
    summary.py           # Diagnostic checks (summary_art())
    contrasts.py         # ART-C contrasts (art_con(), artlm_con())
    datasets.py          # Bundled dataset loaders
    data/                # CSV files for bundled datasets
  tests/                 # Test suite (84 tests)
  example.py             # Python example script
  artool_example.r       # R reference script
  pyproject.toml         # Package metadata & dependencies
  README.md              # This file
```

### Key Design Decisions

**1. R-style formula parsing.** PyARTool parses R formula syntax (`Y ~ A * B + (1|S)`) with a custom parser rather than relying on `patsy` for formula interpretation. This ensures identical handling of interactions, `Error()` terms, and grouping terms.

**2. Patsy name-conflict handling.** Factor column names that conflict with `patsy` reserved words (e.g., a column literally named `C` or `S`) are automatically prefixed with `_f_` internally before model fitting and restored to their original names in the output. This is transparent to the user.

**3. Sum (deviation) coding.** For Type III ANOVA equivalence with R, all categorical variables are explicitly coded with Sum contrasts (`C(var, Sum)` in the `patsy` formula syntax used by `statsmodels`) rather than the default Treatment coding.
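
As a reminder of what sum coding looks like, the sketch below builds the contrast matrix for a factor with a given number of levels, using the usual convention that the last level is the omitted one. `sum_coding_matrix` is a hypothetical helper for illustration and is not part of PyARTool.

```python
def sum_coding_matrix(n_levels):
    """Rows are levels; columns are the n_levels - 1 contrast variables."""
    rows = []
    for level in range(n_levels):
        if level < n_levels - 1:
            # Indicator for this level.
            rows.append([1 if col == level else 0 for col in range(n_levels - 1)])
        else:
            # The last level is coded -1 in every column.
            rows.append([-1] * (n_levels - 1))
    return rows


for row in sum_coding_matrix(3):
    print(row)
# [1, 0]
# [0, 1]
# [-1, -1]
```

Every column sums to zero, so in a balanced design each coefficient measures a level's deviation from the grand mean rather than from a reference level.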

**4. Split-plot ANOVA for mixed models.** The ANOVA for mixed-effects models implements a full split-plot SS decomposition to correctly compute between-group and within-group error strata, matching R's `lmer` + Kenward-Roger behavior.

**5. Satterthwaite degrees of freedom.** For mixed-model contrasts, per-contrast degrees of freedom are computed using the Satterthwaite approximation with analytical gradients and the REML Fisher information matrix, matching R's `lmerTest` / `emmeans`.

**6. Tukey HSD via studentized range.** The default p-value adjustment for pairwise contrasts uses `scipy.stats.studentized_range`, matching R's `emmeans` Tukey method.
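
The Tukey adjustment in point 6 can be sketched directly with SciPy: a contrast's t ratio is converted to a studentized range statistic by multiplying its absolute value by the square root of 2, and the p-value is the upper tail of the studentized range distribution for the number of means in the family. The function name `tukey_p_value` and the example inputs are assumptions made for illustration.

```python
import math

from scipy.stats import studentized_range, t as student_t


def tukey_p_value(t_ratio, n_means, df):
    """Tukey-adjusted p-value for one pairwise contrast among `n_means` means."""
    q = math.sqrt(2.0) * abs(t_ratio)
    return float(studentized_range.sf(q, n_means, df))


# With only two means, the Tukey p-value reduces to the two-sided t-test p-value.
p_tukey = tukey_p_value(2.5, n_means=2, df=8)
p_ttest = 2 * float(student_t.sf(2.5, 8))
print(p_tukey, p_ttest)

# A larger family of means gives a larger adjusted p-value for the same t ratio.
print(tukey_p_value(2.5, n_means=4, df=8))
```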

### Dependency Mapping (R to Python)

| R Package | Python Equivalent |
|-----------|-------------------|
| `base R` (`lm`, `aov`) | `statsmodels` (OLS, formula API) |
| `lme4` (`lmer`) | `statsmodels.regression.mixed_linear_model` |
| `car` (Type III Anova) | `statsmodels.stats.anova` + custom split-plot |
| `emmeans` (contrasts) | Custom implementation in `contrasts.py` |
| `stats::p.adjust` | `statsmodels.stats.multitest.multipletests` |
| `stats::ptukey` / `stats::qtukey` | `scipy.stats.studentized_range` |

---

## Dependencies

PyARTool requires:

```
numpy >= 1.22
pandas >= 1.4
scipy >= 1.8
statsmodels >= 0.13
```

All dependencies are automatically installed when installing PyARTool via pip.

---

## Example Scripts

Two companion scripts are included for cross-checking PyARTool's output against R:

### `example.py` — Python

Runs all five example analyses using PyARTool. This is the best place to start understanding how to use the package.

```bash
python example.py
```

### `artool_example.r` — R

Runs the same five analyses using R's ARTool package. Use this to compare outputs side-by-side.

```bash
Rscript artool_example.r
```

Both scripts cover:

1. **Between-subjects 3x3** — `Higgins1990Table1` (OLS, no repeated measures)
2. **Split-plot 4x4** — `Higgins1990Table5` (mixed-effects with `(1|Tray)`)
3. **2x2x2 within-subjects** — `ElkinABC` (mixed-effects with `(1|S)`)
4. **2x2 within-subjects** — `ElkinAB` (mixed-effects with `(1|S)`)
5. **2x2x2 mixed with Error()** — `HigginsABC` (repeated measures ANOVA)

---

## Citations

If you use PyARTool in your research, please cite the original R package and methods papers:

**Package:**

> Kay, M., Elkin, L. A., Higgins, J. J., and Wobbrock, J. O. (2025).
> *ARTool: Aligned Rank Transform for Nonparametric Factorial ANOVAs*. R package version 0.11.2.
> [https://github.com/mjskay/ARTool](https://github.com/mjskay/ARTool).
> DOI: [10.5281/zenodo.594511](https://dx.doi.org/10.5281/zenodo.594511).

**ART procedure** (used by `art()` and `anova_art()`):

> Wobbrock, J. O., Findlater, L., Gergle, D., and Higgins, J. J. (2011).
> The Aligned Rank Transform for Nonparametric Factorial Analyses Using Only ANOVA Procedures.
> *Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 2011)*.
> Vancouver, British Columbia (May 7–12, 2011). New York: ACM Press, pp. 143–146.
> DOI: [10.1145/1978942.1978963](https://dx.doi.org/10.1145/1978942.1978963).

**ART-C procedure** (used by `art_con()` and `artlm_con()`):

> Elkin, L. A., Kay, M., Higgins, J. J., and Wobbrock, J. O. (2021).
> An Aligned Rank Transform Procedure for Multifactor Contrast Tests.
> *Proceedings of the ACM Symposium on User Interface Software and Technology (UIST 2021)*.
> Virtual Event (October 10–14, 2021). New York: ACM Press, pp. 754–768.
> DOI: [10.1145/3472749.3474784](https://dx.doi.org/10.1145/3472749.3474784).

---

## License

GPL-2.0-or-later, matching the original R ARTool package.
