Metadata-Version: 2.4
Name: sciforge
Version: 0.1.1
Summary: The unified scientific research toolkit — reproducibility, lineage, units, literature graphs, hypothesis tracking, and bio-aware validation in one package.
Author-email: Your Name <you@example.com>
License: MIT
License-File: LICENSE
Requires-Python: >=3.10
Requires-Dist: jinja2>=3.1
Requires-Dist: networkx>=3.0
Requires-Dist: numpy>=1.24
Requires-Dist: pandas>=2.0
Requires-Dist: pint>=0.23
Requires-Dist: pyarrow>=14.0
Requires-Dist: rich>=13.0
Requires-Dist: scikit-learn>=1.3
Requires-Dist: scipy>=1.11
Provides-Extra: dev
Requires-Dist: mypy; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs; extra == 'docs'
Requires-Dist: mkdocs-material; extra == 'docs'
Description-Content-Type: text/markdown

# sciforge

**The unified scientific research toolkit**: eight modules in one coherent package, aimed at the recurring pain points of reproducible, rigorous science.

```
pip install sciforge
```

---

## Why sciforge?

| Problem | Module |
|---------|--------|
| "My experiment isn't reproducible" | `repro` — scores your codebase, pinpoints the exact issues |
| "I can't track what I believed and why" | `hypotest` — version-controlled hypothesis ledger |
| "A unit mismatch crashed my analysis" | `unitflow` — physical units propagate through NumPy/Pandas |
| "My CV scores are inflated (same patient in train+test)" | `crossvalbio` — patient/batch/phylo/temporal-aware splits |
| "My figure is stale but I don't know which one" | `papertrail` — data→figure→claim lineage with staleness detection |
| "I can't see which papers my work builds on" | `litmap` — semantic citation dependency graphs |
| "I need a professional audit for submission" | `audit` — generates submission-ready Research Audit Trails (RAT) |
| "I need to share my work with reviewers" | `bundle` — packages everything into a reproducible ZIP archive |

All modules share a common provenance store and work together seamlessly.

---

## Quick start

```python
import sciforge as sf

# 1 — Score reproducibility of a project
report = sf.repro.score("my_experiment/")
print(report.summary())
# Score: 78/100 (B — Good)
# Findings: 2 high, 1 medium
#   [HIGH] SEEDS  train.py:12 — stochastic ops but no seed set
#   [HIGH] HARDPATHS  preprocess.py:8 — /home/alice/data/train.csv

# 2 — Declare and test a hypothesis
h = sf.hypotest.declare("Drug X reduces IL-6 at 10mg/kg (in vivo)")
h.link_experiment("runs/exp_042")
h.test(p_value=0.028, effect_size=0.52, n=48,
       test_name="one-tailed t-test",
       notes="vs vehicle control")
print(h.status)   # SUPPORTED

# 3 — Physical units in pipelines
force = sf.unitflow.Quantity([9.8, 12.1], "N")
mass  = sf.unitflow.Quantity([1.0,  1.5], "kg")
accel = force / mass      # Quantity([9.8, 8.07], 'm / s²')

# Add unit contracts to functions
@sf.unitflow.requires_units(distance="m", time="s")
def speed(distance, time):
    return distance / time

# 4 — Biologically-aware cross-validation
from sciforge.crossvalbio import PatientSplit

# X, y, patient_ids, and model are assumed to come from your own pipeline
for train, test in PatientSplit(n_splits=5, random_state=42).split(X, y, groups=patient_ids):
    model.fit(X[train], y[train])
    # Guaranteed: no patient in both train and test

# 5 — Data-to-claim lineage
with sf.papertrail.session("results/paper_v3") as trail:
    df    = trail.load("data/cleaned.csv")
    # remove_outliers is your own function, assumed here to take and return a DataFrame
    df2   = trail.transform(df, remove_outliers, "remove_outliers")
    fig   = trail.figure("fig2_roc", df2)
    trail.claim("AUC > 0.90 on held-out test set", fig)

# Later, after data changes:
report = sf.papertrail.check_stale("results/paper_v3")
print(report.summary())
# STALE: fig2_roc — data changed since last hash
# STALE: AUC > 0.90 on held-out test set — upstream data changed

# 6 — Literature dependency graph
graph = sf.litmap.build("papers.bib")
graph.show_dependencies("Attention Is All You Need", depth=2)
central = graph.most_central(n=5)
clusters = graph.cluster_by_keywords()
```

---

## Module reference

### `sciforge.repro`
```python
from sciforge import repro

report = repro.score("path/to/project/")    # or .py / .ipynb file
report.score           # int 0-100
report.findings        # list[Finding]
report.critical_findings()
report.summary()       # human-readable string
report.to_dict()       # JSON-serialisable
```
**Checks:** `SEEDS`, `ENVLOCK`, `HARDPATHS`, `DATETIME`, `NOTEBOOK_ORDER`, `FLOATPREC`, `SECRETS`, `LEAKAGE`
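
To make the idea behind a check such as `SEEDS` concrete, here is a minimal sketch of a static scan for stochastic calls in a file that never sets a seed. It is an illustration built on the standard-library `ast` module, not sciforge's actual implementation; the name sets and the helper functions `called_names` and `missing_seed_findings` are hypothetical.

```python
import ast

# Hypothetical rule sets for illustration only, not sciforge's real rules.
STOCHASTIC_CALLS = {"random", "randint", "shuffle", "choice", "rand", "randn"}
SEED_CALLS = {"seed", "manual_seed"}


def called_names(source: str) -> list[tuple[str, int]]:
    """Return (function name, line number) for every call in the source."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # `random.shuffle(x)` is an Attribute node; `shuffle(x)` is a Name node.
            if isinstance(func, ast.Attribute):
                names.append((func.attr, node.lineno))
            elif isinstance(func, ast.Name):
                names.append((func.id, node.lineno))
    return names


def missing_seed_findings(source: str) -> list[int]:
    """Line numbers of stochastic calls in a file that never calls a seed function."""
    calls = called_names(source)
    if any(name in SEED_CALLS for name, _ in calls):
        return []
    return sorted(line for name, line in calls if name in STOCHASTIC_CALLS)


unseeded = "import random\nitems = [1, 2, 3]\nrandom.shuffle(items)\n"
seeded = "import random\nrandom.seed(0)\nrandom.shuffle([1, 2, 3])\n"

print(missing_seed_findings(unseeded))  # [3]
print(missing_seed_findings(seeded))    # []
```

A name-based scan like this is deliberately crude: it cannot tell whether the seed is set before the stochastic call, and it will flag any function that happens to be called `choice` or `random`.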

---

### `sciforge.audit`
Generate a full Research Audit Trail (RAT) report for peer review:
```python
from sciforge import audit

report = audit.generate_report()
print(report)
```
Or via the CLI:
```bash
sciforge audit
```
Includes reproducibility scores, hypothesis outcomes, data lineage, hardware snapshots, and carbon footprint estimation.

---

### `sciforge.hypotest`
```python
from sciforge import hypotest

h = hypotest.declare("statement", alpha=0.05, tags=["bio"])
h.revise("new statement", reason="new data")
h.link_experiment("runs/exp_01")
h.test(p_value=0.02, effect_size=0.6, n=120, test_name="t-test")
h.retract("error found")
h.summary()

ledger = hypotest.ledger()
ledger.supported()        # list[Hypothesis]
ledger.pending()
ledger.by_tag("bio")
```

---

### `sciforge.unitflow`
```python
from sciforge import unitflow

q = unitflow.Quantity(array, "N")
q + q2     # unit-checked addition
q / q2     # auto unit derivation (e.g. N/kg → m/s²)
q.to("kN") # unit conversion (requires pint)

uf = unitflow.UnitFrame(df, units={"col": "m"})
uf.derived("speed", lambda df: df["dist"] / df["time"], "m/s")
uf.to_quantity("speed")

@unitflow.requires_units(force="N", mass="kg")
def acceleration(force, mass): ...
```
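
The unit-contract idea can be sketched without any dependencies. The example below is a toy stand-in, not sciforge's code: `Quantity` here is a hypothetical two-field dataclass, and this `requires_units` only compares unit strings, whereas a real implementation would presumably accept compatible units and convert them.

```python
import functools
import inspect
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """Toy unit-carrying value (hypothetical, not sciforge's Quantity)."""
    value: float
    unit: str


def requires_units(**expected):
    """Reject calls whose named arguments do not carry the expected unit string."""
    def decorator(fn):
        signature = inspect.signature(fn)

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Map positional and keyword arguments back to parameter names.
            bound = signature.bind(*args, **kwargs)
            for name, unit in expected.items():
                actual = getattr(bound.arguments[name], "unit", None)
                if actual != unit:
                    raise ValueError(f"{name}: expected unit {unit!r}, got {actual!r}")
            return fn(*args, **kwargs)

        return wrapper
    return decorator


@requires_units(force="N", mass="kg")
def acceleration(force, mass):
    return Quantity(force.value / mass.value, "m/s^2")


print(acceleration(Quantity(9.8, "N"), Quantity(1.0, "kg")))

try:
    acceleration(Quantity(9.8, "N"), Quantity(1000.0, "g"))
except ValueError as err:
    print(err)  # mass: expected unit 'kg', got 'g'
```

Checking at the function boundary means a mismatch surfaces where the wrong value enters, rather than several steps later as a silently wrong number.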

---

### `sciforge.crossvalbio`
| Splitter | Use case |
|----------|----------|
| `PatientSplit` | Clinical / longitudinal data |
| `BatchAwareSplit` | Multi-batch experiments |
| `PhyloSplit` | Species / microbiome data |
| `TemporalBioSplit` | Time-series studies |
| `ComboSplit` | Multiple constraints combined |

All are `sklearn`-compatible (implement `split(X, y, groups)`).
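
The guarantee these splitters share is that no group (patient, batch, and so on) appears on both sides of a split. The sketch below shows that property in plain Python with a hypothetical `group_k_fold` helper; it is not how sciforge implements its splitters, and scikit-learn's `GroupKFold` offers a comparable guarantee.

```python
from collections import defaultdict


def group_k_fold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs; each group lands in exactly one test fold."""
    rows_by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        rows_by_group[group].append(idx)
    if n_splits > len(rows_by_group):
        raise ValueError("n_splits cannot exceed the number of distinct groups")

    # Place the largest groups first, always into the currently smallest fold,
    # so that fold sizes stay roughly balanced.
    folds = [[] for _ in range(n_splits)]
    for group in sorted(rows_by_group, key=lambda g: -len(rows_by_group[g])):
        smallest = min(folds, key=len)
        smallest.extend(rows_by_group[group])

    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(groups)) if i not in held_out]
        yield train_idx, sorted(test_idx)


patient_ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5"]
for train_idx, test_idx in group_k_fold(patient_ids, n_splits=3):
    train_patients = {patient_ids[i] for i in train_idx}
    test_patients = {patient_ids[i] for i in test_idx}
    # No patient contributes rows to both sides of the split.
    assert train_patients.isdisjoint(test_patients)
    print(sorted(test_patients))
```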

---

### `sciforge.papertrail`
```python
from sciforge import papertrail

with papertrail.session("project/paper_v2") as trail:
    df    = trail.load("data.csv")
    df2   = trail.transform(df, fn, "step_name")
    fig   = trail.figure("fig1", df2)
    claim = trail.claim("Model X > baseline", fig)

report = papertrail.check_stale("project/paper_v2")
report.is_clean           # bool
report.stale              # list[TrailNode]
report.summary()

trail = papertrail.open_trail("project/paper_v2")
trail.lineage_of("fig1")  # all ancestor nodes
trail.claims()
trail.figures()
```
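
One plausible way to detect staleness is to store a content hash of every input when an artifact is produced and compare it with the file on disk later. The sketch below assumes that approach; the manifest layout and the helpers `record` and `stale_artifacts` are hypothetical and do not describe sciforge's storage format.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_hash(path: Path) -> str:
    """SHA-256 of the file contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def record(manifest: Path, inputs: dict[str, list[Path]]) -> None:
    """Store, for each artifact, the current hash of every file it was built from."""
    snapshot = {
        artifact: {str(p): file_hash(p) for p in paths}
        for artifact, paths in inputs.items()
    }
    manifest.write_text(json.dumps(snapshot, indent=2))


def stale_artifacts(manifest: Path) -> list[str]:
    """Artifacts whose recorded input hashes no longer match the files on disk."""
    snapshot = json.loads(manifest.read_text())
    return sorted(
        artifact
        for artifact, hashes in snapshot.items()
        if any(file_hash(Path(p)) != h for p, h in hashes.items())
    )


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "cleaned.csv"
    data.write_text("id,score\n1,0.91\n")
    manifest = Path(tmp) / "trail.json"

    record(manifest, {"fig2_roc": [data]})
    before = stale_artifacts(manifest)      # nothing has changed yet

    data.write_text("id,score\n1,0.87\n")   # upstream data is edited
    after = stale_artifacts(manifest)

print(before)  # []
print(after)   # ['fig2_roc']
```

Hashing contents rather than comparing modification times avoids false alarms when a file is rewritten with identical data.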

---

### `sciforge.litmap`
```python
from sciforge import litmap

graph = litmap.build("papers.bib")           # BibTeX file or dir
graph = litmap.build(["10.xxxx/doi"])        # DOI list (uses Crossref API)

graph.find("Attention")                      # title search
graph.dependencies("paper_id", depth=2)     # papers it cites
graph.dependents("paper_id")                # papers that cite it
graph.most_central(n=10)                    # by betweenness centrality
graph.roots()                               # papers that no other paper in the graph cites
graph.leaves()                              # papers that cite nothing else in the graph
graph.shortest_path("a", "b")              # citation path
graph.cluster_by_keywords()
graph.show_dependencies("Title fragment")
```
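
As a rough picture of what depth-limited dependency queries compute, the sketch below runs a breadth-first search over a plain dictionary. The data is a toy example, the functions are hypothetical stand-ins rather than sciforge's implementation, and `dependents` is assumed here to return direct citers only.

```python
from collections import deque

# Toy data for illustration: each key cites the papers in its list.
CITES = {
    "survey_2023": ["transformer_2017", "bert_2018"],
    "bert_2018": ["transformer_2017"],
    "transformer_2017": ["seq2seq_2014"],
    "seq2seq_2014": [],
}


def dependencies(cites, paper_id, depth):
    """Papers reachable from paper_id by following at most `depth` citations."""
    seen = {paper_id}
    found = []
    queue = deque([(paper_id, 0)])
    while queue:
        current, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand beyond the requested depth
        for cited in cites.get(current, []):
            if cited not in seen:
                seen.add(cited)
                found.append(cited)
                queue.append((cited, dist + 1))
    return found


def dependents(cites, paper_id):
    """Papers that cite paper_id directly."""
    return sorted(p for p, refs in cites.items() if paper_id in refs)


print(dependencies(CITES, "survey_2023", depth=1))
# ['transformer_2017', 'bert_2018']
print(dependencies(CITES, "survey_2023", depth=2))
# ['transformer_2017', 'bert_2018', 'seq2seq_2014']
print(dependents(CITES, "transformer_2017"))
# ['bert_2018', 'survey_2023']
```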

---

## Requirements

- Python ≥ 3.10
- `numpy`, `pandas`, `networkx`, `scipy`, `scikit-learn`, `rich`, `jinja2`, `pint`, `pyarrow`
- `pint` is installed as a regular dependency; `unitflow` uses it for full unit conversion

---

## Running tests

```bash
pip install -e ".[dev]"
pytest tests/ -v
```

---

## Contributing

Pull requests are welcome. See `CONTRIBUTING.md`.

---

## License

MIT
