Metadata-Version: 2.4
Name: mawiisurv
Version: 0.4.0
Summary: Semiparametric Causal Inference for Right-Censored Outcomes with Many Weak Invalid Instruments
Author-email: Qiushi Bu <buqiushi17@mails.ucas.ac.cn>
License: MIT
Keywords: Censored outcomes,Deep neural networks,instrumental variable,generalized empirical likelihood,Mendelian randomization,Over-identification test,Semiparametric theory,Weak and invalid instruments
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: numpy>=1.19
Requires-Dist: torch>=1.8
Requires-Dist: scipy>=1.5
Requires-Dist: scikit-learn>=0.24
Requires-Dist: xgboost>=1.3
Requires-Dist: numba>=0.53
Dynamic: license-file

# MAWII-SURV

> Semiparametric causal inference for right-censored outcomes with many weak or invalid instruments, powered by the GEL-NOW framework.

[![PyPI](https://img.shields.io/pypi/v/mawiisurv.svg)](https://pypi.org/project/mawiisurv/)
[![Python](https://img.shields.io/pypi/pyversions/mawiisurv.svg)](https://pypi.org/project/mawiisurv/)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)

MAWII-Surv (MAny Weak and Invalid Instruments for Survival outcomes) implements **GEL-NOW**: Generalized Empirical Likelihood with **N**on-**O**rthogonal and **W**eak moments. It extends classical GEL to non-orthogonal nuisance settings and allows many weak or invalid IVs under right-censoring.

---

## Table of Contents
- [Introduction to MAWII-Surv](#introduction-to-mawii-surv)
- [Why MAWII-Surv?](#why-mawii-surv)
- [Features](#features)
- [Installation](#installation)
- [Dependencies](#dependencies)
- [Quick Start](#quick-start)
- [API](#api)
- [Arguments](#arguments)
- [Return Values](#return-values)
- [Notes & Tips](#notes)
- [Citation](#citation)
- [Contributing](#contributing)
- [License](#license)

---

## Introduction to MAWII-Surv

MAWII-Surv (MAny Weak and Invalid Instruments for Survival outcomes) is a Python package for semiparametric causal inference with right-censored outcomes in the presence of many weak or invalid instruments. The package implements the novel GEL-NOW (Generalized Empirical Likelihood with Non-Orthogonal and Weak moments, or GEL 2.0) framework, which extends classical generalized empirical likelihood to settings where nuisance functions enter non-orthogonally and where instruments may be weak or invalid.

Key features include:

- Heteroscedasticity-based identification under an accelerated failure time (AFT) model, enabling causal effect estimation even with invalid instruments.

- Flexible nuisance estimation using modern machine learning methods, including deep neural networks, to capture complex nonlinear structures.

- Robust inference that explicitly accounts for additional variance from non-orthogonal nuisances, ensuring valid confidence intervals.

- Diagnostics such as a censoring-adjusted over-identification test to assess instrument validity.

- Applications to biobank-scale data, with built-in support for analyzing time-to-event outcomes such as disease onset.


With simulation tools, diagnostic functions, and real-data examples, MAWII-Surv provides a user-friendly platform for researchers in statistics, econometrics, epidemiology, and genetics to conduct reliable causal inference from censored survival data.

## Why MAWII-Surv?

- **Survival + Endogeneity:** G-estimation for treatment effects with right-censoring and unmeasured confounding.
- **Many Weak/Invalid IVs:** Robust to weak instruments and horizontal pleiotropy.
- **GEL 2.0 (GEL-NOW):** Empirical Likelihood (EL), Exponential Tilting (ET), and Continuous Updating (CUE) with theory for **non-orthogonal nuisances**.
- **Modern ML Nuisances:** Deep neural nets (PyTorch), Random Forests, XGBoost, plus classical linear models.
- **Diagnostics:** Over-identification test adapted to censoring; standard errors account for censoring-induced variance inflation.

## Features

- **Uncensored data** (`mawii_noncensor`)  
- **Right-censored data** (`mawii_censor`)  
- Multiple model backends:
  - Neural networks
  - Linear regression
  - Random forests
  - XGBoost  
- Choice of Generalized Empirical Likelihood (GEL) functions:
  - Exponential Tilting (ET)
  - Empirical Likelihood (EL)
  - Continuous Updating Estimator (CUE)

---

## Installation

Install from PyPI:

```
pip install mawiisurv
```

---
## Dependencies

The package requires the following (minimum supported versions shown); `pip` installs them automatically:
- numpy>=1.19
- torch>=1.8
- scipy>=1.5
- scikit-learn>=0.24
- xgboost>=1.3
- numba>=0.53

If you plan to use a GPU, install a CUDA-compatible build of PyTorch from the official download page before installing this package.

---
## Quick Start

A runnable demo is provided below. It simulates both non-censored and right-censored data, fits the DNN + ET specification, and prints the point estimate, standard error, and the over-identification test statistic.
```
# demo
# pip install mawiisurv

import numpy as np
import torch
import mawiisurv

# Two main entry points:
#   mawii_noncensor(X, Z, A, Y, ...)
#   mawii_censor(X, Z, A, Y, censor_delta, ...)

# Inputs
#   X: (n, p) covariates
#   Z: (n, m) instrumental variables
#   A: (n,) treatment
#   Y: (n,) outcome
#   censor_delta: (n,) censoring indicator, 1 uncensored, 0 censored
#
# Model choices
#   model_types: ['neural_network','linear_regression','random_forest','xgboost']
#   rho_function_names: ['ET','EL','CUE']
#
# DNN hyperparameters (optional)
#   hidden_layers=[50, 50]
#   learning_rate=5e-4
#   weight_decay=1e-4
#   batch_size=256
#   dropout_rate=0
#   patience=5
#   epochs=1000
#   validation_split=0.05
#   shuffle=False
#   device='cpu' or 'cuda:0'

# ---------- simulate complete data ----------
n = 10000
m = 20
p = 1
beta_0 = 0.4
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

X = np.random.uniform(0, 1, size=(n, p))
p_Z = [0.25, 0.5, 0.25]
Z = np.random.choice([0, 1, 2], size=(n, m), p=p_Z)

gamma = np.sqrt(0.2 / (1.5*m)) * np.random.normal(0, 1, size=m)
delta = np.sqrt(0.2 / (1.5*m)) * np.random.normal(0, 1, size=m)

epsilon_A = np.random.normal(0, 0.4, size=n)
epsilon_Y = np.random.normal(0, 0.4, size=n)
U = np.random.normal(0, 0.6, size=n)

A = Z @ gamma + U + (1 + Z @ delta) * epsilon_A
Y = beta_0 * A + np.sum(X, axis=1) - U + epsilon_Y

result_noncensor = mawiisurv.mawii_noncensor(
    X, Z, A, Y,
    model_types=['neural_network'],          # options: ['neural_network','linear_regression','random_forest','xgboost']
    rho_function_names=['ET'],               # options: ['ET','EL','CUE']
    device=device
)

# Results are nested as result[model_type][rho_function_name]
print(f"DNN+ET (non-censored) BETA: {result_noncensor['neural_network']['ET']['beta']:.3f}")
print(f"DNN+ET (non-censored) SE: {result_noncensor['neural_network']['ET']['se']:.3f}")
print(f"DNN+ET (non-censored) over-identification test: {result_noncensor['neural_network']['ET']['test']:.3f}")

# ---------- simulate right-censored data ----------
T = beta_0 * A + np.sum(X, axis=1) - U + epsilon_Y
censor_rate = 0.4
rr = 0.0

while True:
    C = np.random.uniform(0 + rr, 5 + rr, size=n)
    censor_delta = (T <= C).astype(int)
    cr = np.mean(1 - censor_delta)
    if cr >= censor_rate + 0.03:
        rr += 0.1
    elif cr <= censor_rate - 0.03:
        rr -= 0.1
    else:
        break

Y = np.minimum(T, C)

result_censor = mawiisurv.mawii_censor(
    X, Z, A, Y, censor_delta, h=1,
    model_types=['neural_network'],
    rho_function_names=['ET'],
    device=device
)

print(f"DNN+ET (censored) BETA: {result_censor['neural_network']['ET']['beta']:.3f}")
print(f"DNN+ET (censored) SE: {result_censor['neural_network']['ET']['se']:.3f}")
print(f"DNN+ET (censored) over-identification test: {result_censor['neural_network']['ET']['test']:.3f}")


```
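
The demo prints a point estimate and a standard error. As a minimal sketch of how the two might be combined, the snippet below builds a two-sided Wald interval. It assumes the estimator is approximately normal, which is an assumption made here and not something the package output confirms. `wald_ci` is a hypothetical helper and is not part of `mawiisurv`; the numbers passed to it are made up for illustration.

```python
from scipy.stats import norm


def wald_ci(beta, se, level=0.95):
    """Hypothetical helper: normal-approximation confidence interval for beta."""
    if se < 0:
        raise ValueError("se must be non-negative")
    if not 0 < level < 1:
        raise ValueError("level must lie strictly between 0 and 1")
    # Two-sided critical value, e.g. about 1.96 for level=0.95
    z = norm.ppf(0.5 + level / 2.0)
    return beta - z * se, beta + z * se


# Illustrative values only; in practice pass result[...]['beta'] and result[...]['se']
lo, hi = wald_ci(0.40, 0.05)
print(f"95% CI: [{lo:.3f}, {hi:.3f}]")
```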
---
## API
```
mawii_noncensor(
    X, Z, A, Y,
    model_types=['neural_network'],
    rho_function_names=['ET'],
    hidden_layers=[50, 50],
    learning_rate=0.0005,
    weight_decay=0.0001,
    batch_size=256,
    dropout_rate=0,
    patience=5,
    epochs=100,
    validation_split=0.05,
    shuffle=False,
    device='cpu',
) -> dict
mawii_censor(
    X, Z, A, Y, censor_delta, h=1,
    model_types=['neural_network'],
    rho_function_names=['ET'],
    hidden_layers=[50, 50],
    learning_rate=0.0005,
    weight_decay=0.0001,
    batch_size=256,
    dropout_rate=0,
    patience=5,
    epochs=100,
    validation_split=0.05,
    shuffle=False,
    device='cpu',
) -> dict
```
---
## Arguments

- X: array of shape n by p, baseline covariates

- Z: array of shape n by m, instrumental variables

- A: array of shape n, treatment

- Y: array of shape n, outcome

- censor_delta: array of shape n, 1 uncensored and 0 censored, only for mawii_censor

- h: scalar, window for the local Kaplan–Meier estimator used in the censoring adjustment, only for mawii_censor

- model_types: list of model backends, choose from neural_network, linear_regression, random_forest, xgboost

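- rho_function_names: list of GEL rho functions, choose from ET, EL, CUE

- hidden_layers, learning_rate, weight_decay, batch_size, dropout_rate, patience, epochs, validation_split, shuffle: optional hyperparameters for the neural_network backend; defaults are listed in the API section above

- device: cpu or cuda device string such as cuda:0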
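
The shapes listed above can be checked before calling either entry point. The following is a minimal sketch of such a check. `check_inputs` is a hypothetical helper written for this README and is not part of `mawiisurv`; whether the package performs similar validation internally is not stated here.

```python
import numpy as np


def check_inputs(X, Z, A, Y, censor_delta=None):
    """Hypothetical helper: verify the documented shapes before fitting."""
    X, Z = np.asarray(X), np.asarray(Z)
    A, Y = np.asarray(A).ravel(), np.asarray(Y).ravel()
    if X.ndim != 2 or Z.ndim != 2:
        raise ValueError("X must have shape (n, p) and Z must have shape (n, m)")
    n = X.shape[0]
    if not (Z.shape[0] == A.shape[0] == Y.shape[0] == n):
        raise ValueError("X, Z, A, and Y must all have n rows")
    if censor_delta is not None:
        censor_delta = np.asarray(censor_delta).ravel()
        if censor_delta.shape[0] != n:
            raise ValueError("censor_delta must have length n")
        # Documented coding: 1 = uncensored, 0 = censored
        if not np.isin(censor_delta, [0, 1]).all():
            raise ValueError("censor_delta must contain only 0 and 1")
    return X, Z, A, Y, censor_delta


# Toy inputs with n=5, p=2, m=3
X, Z, A, Y, d = check_inputs(
    np.zeros((5, 2)), np.ones((5, 3)), np.zeros(5), np.zeros(5), [1, 0, 1, 1, 0]
)
print(X.shape, Z.shape, A.shape, Y.shape, d.shape)
```

Running a check like this first tends to surface shape mismatches early, before any model training starts.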

---
## Return Values
```
{
  'neural_network': {
    'ET': {
      'beta': float,    # point estimate
      'se': float,      # standard error
      'test': float     # overidentification test statistic
    },
    'EL': {...},
    'CUE': {...}
  },
  'linear_regression': {...},
  ...
}
```
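
When several backends and rho functions are requested, the nested dictionary can be flattened into rows for side-by-side comparison. The sketch below assumes only the layout shown above. `flatten_results` is a hypothetical helper, not a package function, and the numbers in `example` are invented for illustration.

```python
def flatten_results(results):
    """Hypothetical helper: turn results[model][rho] into a list of row tuples."""
    rows = []
    for model_name, by_rho in results.items():
        for rho_name, est in by_rho.items():
            rows.append((model_name, rho_name, est["beta"], est["se"], est["test"]))
    return rows


# Made-up values in the documented layout, for illustration only
example = {
    "neural_network": {
        "ET": {"beta": 0.41, "se": 0.05, "test": 1.2},
        "EL": {"beta": 0.39, "se": 0.05, "test": 1.1},
    },
    "linear_regression": {
        "ET": {"beta": 0.45, "se": 0.07, "test": 2.3},
    },
}

for model_name, rho_name, beta, se, test in flatten_results(example):
    print(f"{model_name:<18} {rho_name:<4} beta={beta:.3f} se={se:.3f} test={test:.3f}")
```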

## Notes

- We generally recommend ET or EL over CUE under weak identification.

- Deep NNs tend to be most robust for complex nonlinear nuisances; RF/XGB are strong baselines in moderate dimensions.

- Under censoring, standard errors include an extra variance component due to estimating the censoring distribution.

## Citation

If you use MAWII-Surv, please cite:
```
Bu Q., Su W., Zhao X., Liu Z. (2025).
Semiparametric Causal Inference for Right-Censored Outcomes with Many Weak Invalid Instruments.
(manuscript)
```

BibTeX:
```
@misc{mawiisurv2025,
  title   = {Semiparametric Causal Inference for Right-Censored Outcomes with Many Weak Invalid Instruments},
  author  = {Bu, Qiushi and Su, Wen and Zhao, Xingqiu and Liu, Zhonghua},
  year    = {2025},
  note    = {Python package: MAWII-Surv},
  howpublished = {\url{https://pypi.org/project/mawiisurv/}}
}
```

## Contributing

Contributions are welcome!  
Please use Issues for bug reports and pull requests for code contributions. 

## License
MIT License. See LICENSE.txt for details.

