Metadata-Version: 2.4
Name: pymatchit-causal
Version: 0.5.0
Summary: Propensity Score Matching (PSM), Full Matching, Genetic Matching, Cardinality Matching, CEM, and Causal Inference in Python. A port of R's MatchIt.
Author-email: Jonas Tünnermann <jonas.tuennermann@freenet.de>
License: MIT License
        
        Copyright (c) 2025 Jonas Tünnermann
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
Project-URL: Homepage, https://github.com/jtuenner/pymatchit
Project-URL: Bug Tracker, https://github.com/jtuenner/pymatchit/issues
Keywords: causal-inference,propensity-score-matching,psm,matching,cem,coarsened-exact-matching,mahalanobis,observational-studies,matchit,full-matching,genetic-matching,cardinality-matching,cbps
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.21
Requires-Dist: pandas>=1.3
Requires-Dist: scipy~=1.7
Requires-Dist: statsmodels~=0.13
Requires-Dist: matplotlib~=3.5
Requires-Dist: scikit-learn~=1.0
Requires-Dist: seaborn~=0.11
Requires-Dist: patsy~=0.5
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: jupyter; extra == "dev"
Dynamic: license-file

# pymatchit-causal: Propensity Score Matching in Python

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.17839522.svg)](https://doi.org/10.5281/zenodo.17839522)
[![PyPI version](https://badge.fury.io/py/pymatchit-causal.svg)](https://badge.fury.io/py/pymatchit-causal)

**Scalable Causal Inference, Propensity Score Matching (PSM), and Coarsened Exact Matching (CEM).**

`pymatchit-causal` is a Python port of the R package `MatchIt`. It preprocesses data for causal inference by balancing covariates between treated and control groups with a range of matching methods. The 0.5.0 release focuses on a cohesive, publication-ready suite of visual diagnostics.

## Why use pymatchit?
If you are looking for **Propensity Score Matching** in Python, this library provides an R-style workflow that includes:
* **Propensity Score Estimation:** Logistic Regression (GLM), Random Forest, GBM, Neural Networks.
* **Matching Algorithms:** Nearest Neighbor (Greedy), Optimal Matching, Exact, Subclassification, and Coarsened Exact Matching (CEM).
* **Diagnostics:** Publication-ready, visually aligned plots: Love Plots (covariate balance), Propensity Density Plots, ECDF plots, and the new **Jitter Plots** for checking matches visually.

## Features
* **Matching Methods:** Nearest Neighbor, Optimal Matching, Exact, Coarsened Exact Matching (CEM), Subclassification.
* **Distance Metrics:** Logistic Regression (GLM), Mahalanobis, Random Forest, GBM, Neural Networks, etc.
* **Diagnostics:** Cohesive diagnostic plots including visually aligned Love Plots, Jitter Plots, and Summary Tables (SMD, Variance Ratios).
* **Parity:** Designed to mirror the R `MatchIt` API (`matchit(formula, data, method=...)`).
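
To make Coarsened Exact Matching concrete, here is a dependency-free sketch of the idea: coarsen each covariate into bins, then keep only the strata that contain both treated and control units. The helper names (`coarsen`, `cem_strata`), the cutpoints, and the toy records are illustrative assumptions, not part of the `pymatchit` API or its implementation.

```python
from collections import defaultdict


def coarsen(value, cutpoints):
    """Return the index of the bin that `value` falls into."""
    return sum(value >= c for c in cutpoints)


def cem_strata(units, cutpoints):
    """Group units by coarsened covariates; keep strata with both groups."""
    strata = defaultdict(list)
    for unit in units:
        signature = tuple(
            coarsen(unit[name], cuts) for name, cuts in cutpoints.items()
        )
        strata[signature].append(unit)
    # A stratum is usable only if it holds treated (1) and control (0) units.
    return {
        sig: members
        for sig, members in strata.items()
        if {m["treated"] for m in members} == {0, 1}
    }


# Hypothetical toy data, for illustration only.
units = [
    {"id": 1, "treated": 1, "age": 34, "income": 42_000},
    {"id": 2, "treated": 0, "age": 37, "income": 45_000},
    {"id": 3, "treated": 1, "age": 61, "income": 80_000},
    {"id": 4, "treated": 0, "age": 25, "income": 30_000},
    {"id": 5, "treated": 0, "age": 63, "income": 95_000},
]
cutpoints = {"age": [30, 40, 50, 60], "income": [35_000, 60_000]}

matched = cem_strata(units, cutpoints)
matched_ids = sorted(m["id"] for members in matched.values() for m in members)
print(matched_ids)  # [1, 2, 3, 5]
```

Unit 4 is dropped because no treated unit shares its coarsened stratum. Coarser cutpoints retain more units at the cost of less exact matches.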

---

## Installation

```bash
pip install pymatchit-causal
```

Dependencies: `numpy`, `pandas`, `scipy`, `statsmodels`, `matplotlib`, `scikit-learn`, `seaborn`, `patsy`.

---

## Example Workflow

**Scenario**: You have a dataset `healthcare_data.xlsx` with a binary treatment variable `took_drug`, an outcome `recovery_time`, and confounders such as `age`, `severity`, `income`, and `gender`.

### 1. Load Data
```python
import pandas as pd
from pymatchit import MatchIt

# Load your dataset
df = pd.read_excel("healthcare_data.xlsx")
```

### 2. Initialize and Match
We will use **Nearest Neighbor** matching on propensity scores estimated with a **Random Forest**, and apply a **caliper** to reject poor matches.

```python
# Initialize the matching model
m = MatchIt(
    data=df,
    method='nearest',           # 1:1 Nearest Neighbor matching
    distance='randomforest',    # Use Random Forest for Propensity Scores
    distance_options={'n_estimators': 500},
    caliper={'distance': 0.1, 'age': 2},
    replace=False,
    random_state=42
)

# Fit the model using an R-style formula
m.fit("took_drug ~ age + severity + income + gender")
```
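
For intuition about what greedy 1:1 nearest neighbor matching with a caliper and `replace=False` does, the following is a minimal sketch on precomputed propensity scores. It is not the library's implementation: the function name `greedy_nearest_match`, the descending processing order, and the caliper expressed in raw score units are all assumptions made for illustration.

```python
def greedy_nearest_match(treated, controls, caliper):
    """Pair each treated unit with the closest unused control within `caliper`.

    `treated` and `controls` map unit ids to propensity scores.
    """
    available = dict(controls)  # controls not yet used
    pairs = {}
    # Assumed ordering: treated units with the largest scores are matched first.
    ordered = sorted(treated.items(), key=lambda kv: kv[1], reverse=True)
    for t_id, t_score in ordered:
        if not available:
            break
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs[t_id] = c_id
            del available[c_id]  # without replacement: each control is used once
    return pairs


# Hypothetical propensity scores, for illustration only.
treated = {"t1": 0.80, "t2": 0.55, "t3": 0.30}
controls = {"c1": 0.78, "c2": 0.50, "c3": 0.52, "c4": 0.05}

pairs = greedy_nearest_match(treated, controls, caliper=0.1)
print(pairs)  # {'t1': 'c1', 't2': 'c3'}
```

Here `t3` stays unmatched because its nearest remaining control is 0.20 away, outside the caliper. A tighter caliper generally improves balance but discards more treated units.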

### 3. Assess Balance (Diagnostics)
Use the summary table and the diagnostic plots to verify that the treated and control groups are balanced.

```python
# 1. Statistical Summary
summary = m.summary()

# 2. Visual Inspection: Love Plot
m.plot(type='balance', threshold=0.1)
```
![Love Plot](assets/love_plot.png)
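
The quantities behind a Love Plot can be sketched with the standard library alone. The example below computes a standardized mean difference (SMD) and a variance ratio for one covariate before and after matching. It assumes the pooled standard deviation as the standardizer; other conventions divide by the treated-group standard deviation, and this sketch makes no claim about which one `m.summary()` uses. The data are made up.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


def variance_ratio(treated, control):
    """Ratio of sample variances; values near 1 indicate similar spread."""
    return variance(treated) / variance(control)


# Hypothetical ages, for illustration only.
age_treated = [50, 52, 54, 56, 58]
age_control_before = [40, 44, 48, 52, 56]  # all controls
age_control_after = [50, 51, 55, 56, 57]   # matched controls only

smd_before = standardized_mean_difference(age_treated, age_control_before)
smd_after = standardized_mean_difference(age_treated, age_control_after)

threshold = 0.1  # same cut-off as in the plot call above
print(f"before: {smd_before:.2f}, after: {smd_after:.2f}")
print("balanced after matching:", abs(smd_after) < threshold)
```

In this toy data the absolute SMD drops from 1.2 to below the 0.1 threshold once only the matched controls are kept.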

```python
# 3. Visual Inspection: Propensity Jitter Plot (New in 0.5.0!)
m.plot(type='jitter')
```
![Jitter Plot](assets/jitter_plot.png)

```python
# 4. Visual Inspection: Propensity Density Overlap
m.plot(type='propensity')
```
![Propensity Density Plot](assets/propensity_plot.png)

```python
# 5. Visual Inspection: ECDF Plot
m.plot(type='ecdf', variable='age')
```
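
An ECDF plot compares the empirical cumulative distribution of a covariate in the two groups. As a rough sketch of the number such a plot lets you read off, the code below computes the largest vertical gap between the two ECDFs. The function names and data are illustrative assumptions and are not taken from `pymatchit`.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function giving the fraction of `sample` values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def max_ecdf_difference(treated, control):
    """Largest absolute gap between the two ECDFs over all observed values."""
    f_treated, f_control = ecdf(treated), ecdf(control)
    grid = sorted(set(treated) | set(control))
    return max(abs(f_treated(x) - f_control(x)) for x in grid)


# Hypothetical ages, for illustration only.
age_treated = [50, 52, 54, 56, 58]
age_control = [40, 44, 48, 52, 56]

max_gap = max_ecdf_difference(age_treated, age_control)
print(f"max ECDF difference: {max_gap:.2f}")
```

A gap close to 0 means the two distributions nearly coincide. Unlike the SMD, this also picks up differences in shape, not only in means.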

### 4. Extract Matched Data
If balance is satisfactory, extract the data for analysis.

```python
# Get the final dataset containing only matched units
matched_df = m.matches(format='long') 

# Get data with weights and subclass
final_analysis_set = m.matched_data
```

### 5. Downstream Inference
Estimate the treatment effect with weighted least squares, using the matching weights and clustering the standard errors on the matched `subclass`:

```python
import statsmodels.formula.api as smf

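# Weighted regression of the outcome on treatment, using the matching weights
model = smf.wls("recovery_time ~ took_drug", data=final_analysis_set, weights=final_analysis_set['weights'])
# Cluster the standard errors on the matched subclass
results = model.fit(cov_type='cluster', cov_kwds={'groups': final_analysis_set['subclass']})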
print(results.summary())
```
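
With a single binary regressor, the weighted least squares coefficient on the treatment equals the weighted difference in group means. The sketch below checks that identity with the standard library only. The helper names and numbers are made up for illustration, and the sketch yields only the point estimate, not the cluster-robust standard errors.

```python
def weighted_mean(values, weights):
    """Weighted average of `values`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_difference_in_means(y, t, w):
    """Weighted mean outcome of treated units minus that of controls."""
    treated = [(yi, wi) for yi, ti, wi in zip(y, t, w) if ti == 1]
    control = [(yi, wi) for yi, ti, wi in zip(y, t, w) if ti == 0]
    return weighted_mean(*zip(*treated)) - weighted_mean(*zip(*control))


def wls_slope(y, t, w):
    """Closed-form WLS slope for the model y ~ 1 + t."""
    t_bar, y_bar = weighted_mean(t, w), weighted_mean(y, w)
    num = sum(wi * (ti - t_bar) * (yi - y_bar) for yi, ti, wi in zip(y, t, w))
    den = sum(wi * (ti - t_bar) ** 2 for ti, wi in zip(t, w))
    return num / den


# Hypothetical matched sample, for illustration only.
recovery_time = [10, 12, 14, 15, 18, 20]
took_drug = [1, 1, 1, 0, 0, 0]
weights = [1.0, 1.0, 1.0, 1.5, 0.5, 1.0]

diff = weighted_difference_in_means(recovery_time, took_drug, weights)
slope = wls_slope(recovery_time, took_drug, weights)
print(f"difference in means: {diff:.3f}, WLS slope: {slope:.3f}")
```

Both numbers agree, which is why the regression above can be read as a weighted comparison of matched groups.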

---

## Citation

If you use `pymatchit-causal` in your research, please cite it:

> Tünnermann, J. (2026). pymatchit: Propensity Score Matching and Causal Inference in Python (Version 0.5.0). Zenodo. https://doi.org/10.5281/zenodo.17839522

**BibTeX:**
```bibtex
@software{pymatchit_causal,
  author       = {Jonas Tünnermann},
  title        = {pymatchit: Propensity Score Matching and Causal Inference in Python},
  year         = 2026,
  publisher    = {Zenodo},
  version      = {0.5.0},
  doi          = {10.5281/zenodo.17839522},
  url          = {https://doi.org/10.5281/zenodo.17839522}
}
```
