Metadata-Version: 2.1
Name: gadjid
Version: 0.0.1
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Dist: numpy
Requires-Dist: pytest ; extra == 'test'
Requires-Dist: scipy ; extra == 'test'
Provides-Extra: test
License-File: LICENSE
Summary: Adjustment Identification Distance: A 𝚐𝚊𝚍𝚓𝚒𝚍 for Causal Structure Learning
Keywords: causality,causal structure learning,graph distance,adjustment
Author: Theo Würtzen, Sebastian Weichwald, Leonard Henckel
License: MPL-2.0
Requires-Python: >=3.8
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Bug Tracker, https://github.com/CausalDisco/gadjid/issues
Project-URL: Source Code, https://github.com/CausalDisco/gadjid
Project-URL: arXiv, https://doi.org/10.48550/arXiv.2402.08616

# Adjustment Identification Distance: A 𝚐𝚊𝚍𝚓𝚒𝚍 for Causal Structure Learning

This is an early release of 𝚐𝚊𝚍𝚓𝚒𝚍 🐥 and feedback is very welcome!
Just [open an issue](https://github.com/CausalDisco/gadjid/issues/new/choose) on our github repository.

If you publish research using 𝚐𝚊𝚍𝚓𝚒𝚍, please cite
[our article](https://doi.org/10.48550/arXiv.2402.08616)
```bibtex
@article{henckel2024adjustment,
    title = {{Adjustment Identification Distance: A gadjid for Causal Structure Learning}},
    author = {Leonard Henckel and Theo Würtzen and Sebastian Weichwald},
    journal = {{arXiv preprint arXiv:2402.08616}},
    year = {2024},
    doi = {10.48550/arXiv.2402.08616},
}
```


## Get Started Real Quick 🚀 – Introductory Example

Just `pip install gadjid` to install the latest release of 𝚐𝚊𝚍𝚓𝚒𝚍 \
and run `python -c "import gadjid; help(gadjid)"` to get started
(or see [install alternatives](https://github.com/CausalDisco/gadjid#installation--python)).

```python
import gadjid
from gadjid import example, ancestor_aid, oset_aid, parent_aid, shd
import numpy as np

help(gadjid)

example.run_parent_aid()

Gtrue = np.array([
    [0, 1, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0]
], dtype=np.int8)
Gguess = np.array([
    [0, 0, 1, 1, 1],
    [1, 0, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0]
], dtype=np.int8)

print(ancestor_aid(Gtrue, Gguess))
print(shd(Gtrue, Gguess))
```


---


𝚐𝚊𝚍𝚓𝚒𝚍 is implemented in Rust
and can conveniently be called from Python via our Python wrapper
(implemented using [maturin](https://www.maturin.rs/) and [PyO3](https://pyo3.rs/)).

> Evaluating graphs learned by causal discovery algorithms is difficult: The number of edges that differ between two graphs does not reflect how the graphs differ with respect to the identifying formulas they suggest for causal effects. We introduce a framework for developing causal distances between graphs which includes the
structural intervention distance for directed acyclic graphs as a special case. We use this framework to develop improved adjustment-based distances as well as extensions to completed partially directed acyclic graphs and causal orders. We develop polynomial-time reachability algorithms to compute the distances efficiently. In our package 𝚐𝚊𝚍𝚓𝚒𝚍, we provide implementations of our distances; they are orders of magnitude faster than the structural intervention distance and thereby provide a success metric for causal discovery that scales to graph sizes that were previously prohibitive.


## Implemented Distances

* `ancestor_aid(Gtrue, Gguess)`
* `oset_aid(Gtrue, Gguess)`
* `parent_aid(Gtrue, Gguess)`
* for convenience, the following distances are implemented, too
    * `shd(Gtrue, Gguess)`
    * `sid(Gtrue, Gguess)` – only for DAGs!

where Gtrue and Gguess are adjacency matrices of a DAG or CPDAG.
The functions are not symmetric in their input:
To calculate a distance,
identifying formulas for causal effects are inferred in the graph `Gguess`
and verified against the graph `Gtrue`.
Distances return a tuple `(normalised_distance, mistake_count)`
of the fraction of causal effects inferred in Gguess that are wrong relative to Gtrue, `normalised_distance`,
and the number of wrongly inferred causal effects, `mistake_count`.
There are $p(p-1)$ pairwise causal effects to infer in graphs with $p$ nodes
and we define normalisation as  `normalised_distance = mistake_count / p(p-1)`.

All graphs are assumed simple, that is, at most one edge is allowed between any two nodes.
An adjacency matrix for a DAG may only contain 0s and 1s;
a `1` in row `s` and column `t` codes a directed edge `Xₛ → Xₜ`;
DAG inputs are validated for acyclicity.
An adjacency matrix for a CPDAG may only contain 0s, 1s and 2s;
a `2` in row `s` and column `t` codes a undirected edge `Xₛ — Xₜ`
(an additional `2` in row `t` and column `s` is ignored; only one of the two entries is required to code an undirected edge);
CPDAG inputs are not validated and __the user needs to ensure the adjacency matrix indeed codes a valid CPDAG (instead of just a PDAG)__.
You may also calculate the SID between DAGs via `parent_aid(DAGtrue, DAGguess)`,
but we recommend `ancestor_aid` and `oset_aid` and for CPDAG inputs our `parent_aid` does not coincide with the SID
(see also our accompanying article).


## Empirical Runtime Analysis

Experiments run on a laptop with 8 GB RAM and 4-core i5-8365U processor.
Here, for a graph with $p$ nodes,
sparse graphs have $10p$ edges in expectation,
dense graphs have $0.3p(p-1)/2$ edges in expectation,
and
sparse graphs have $0.75p$ edges in expectation.

__Maximum graph size feasible within 1 minute__

| Method       | sparse | dense |
|--------------|-------:|------:|
| Parent-AID   |  13005 |   960 |
| Ancestor-AID |   8200 |   932 |
| Oset-AID     |    546 |   250 |
| SID in R     |    255 |   239 |

__Average runtime__
| Method       | x-sparse ($p=1000$) | sparse ($p=256$) | dense ($p=239$) |
|--------------|--------------------:|-----------------:|----------------:|
| Parent-AID   |              6.3 ms |          22.8 ms |          189 ms |
| Ancestor-AID |              2.7 ms |          38.7 ms |          226 ms |
| Oset-AID     |              3.2 ms |          4.69 s  |         47.3 s  |
| SID in R     |             ~1–2 h  |           ~60 s  |          ~60 s  |


## LICENSE

𝚐𝚊𝚍𝚓𝚒𝚍 is available in source code form at <https://github.com/CausalDisco/gadjid>.

This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0.
If a copy of the MPL was not distributed with this file, You can obtain one at https://mozilla.org/MPL/2.0/.

See also the [MPL-2.0 FAQ](https://mozilla.org/MPL/2.0/FAQ).

