Metadata-Version: 2.4
Name: statika
Version: 1.0.1
Summary: Open-source statistical analysis tool — a Python alternative to Stata, SPSS, and SAS
Project-URL: Homepage, https://github.com/baristiran/statika
Project-URL: Documentation, https://baristiran.github.io/statika/
Project-URL: Bug Tracker, https://github.com/baristiran/statika/issues
Project-URL: Changelog, https://github.com/baristiran/statika/blob/main/CHANGELOG.md
Author-email: baristiran <baristiran@users.noreply.github.com>
License-Expression: MIT
License-File: LICENSE
Keywords: causal-inference,cli,data-analysis,econometrics,machine-learning,panel-data,regression,repl,spss,stata,statistics,survival-analysis,time-series
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: matplotlib<4,>=3.8
Requires-Dist: numpy<3,>=1.24
Requires-Dist: polars<2,>=1.0
Requires-Dist: prompt-toolkit<4,>=3.0
Requires-Dist: rich<14,>=13.0
Requires-Dist: scipy<2,>=1.12
Requires-Dist: statsmodels<1,>=0.14
Requires-Dist: typer<1,>=0.12
Provides-Extra: all
Requires-Dist: arch>=6.0; extra == 'all'
Requires-Dist: connectorx>=0.3; extra == 'all'
Requires-Dist: duckdb>=0.10; extra == 'all'
Requires-Dist: fastapi>=0.100; extra == 'all'
Requires-Dist: ipython>=8.0; extra == 'all'
Requires-Dist: libpysal>=4.7; extra == 'all'
Requires-Dist: lifelines>=0.28; extra == 'all'
Requires-Dist: lightgbm>=4.0; extra == 'all'
Requires-Dist: lime>=0.2; extra == 'all'
Requires-Dist: linearmodels>=6.0; extra == 'all'
Requires-Dist: nbformat>=5.0; extra == 'all'
Requires-Dist: networkx>=3.0; extra == 'all'
Requires-Dist: openpyxl>=3.1; extra == 'all'
Requires-Dist: pandas>=2.0; extra == 'all'
Requires-Dist: plotly>=5.0; extra == 'all'
Requires-Dist: pmdarima>=2.0; extra == 'all'
Requires-Dist: prophet>=1.1; extra == 'all'
Requires-Dist: pyreadstat>=1.0; extra == 'all'
Requires-Dist: python-docx>=1.1; extra == 'all'
Requires-Dist: python-multipart; extra == 'all'
Requires-Dist: python-pptx>=0.6; extra == 'all'
Requires-Dist: rapidfuzz>=3.0; extra == 'all'
Requires-Dist: reportlab>=4.0; extra == 'all'
Requires-Dist: scikit-learn>=1.4; extra == 'all'
Requires-Dist: semopy>=2.3; extra == 'all'
Requires-Dist: shap>=0.44; extra == 'all'
Requires-Dist: spreg>=1.3; extra == 'all'
Requires-Dist: textual>=0.60; extra == 'all'
Requires-Dist: uvicorn>=0.30; extra == 'all'
Requires-Dist: websockets; extra == 'all'
Requires-Dist: xgboost>=2.0; extra == 'all'
Requires-Dist: xlsxwriter>=3.1; extra == 'all'
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.20; extra == 'anthropic'
Provides-Extra: database
Requires-Dist: connectorx>=0.3; extra == 'database'
Provides-Extra: dev
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Provides-Extra: duckdb
Requires-Dist: duckdb>=0.10; extra == 'duckdb'
Provides-Extra: excel
Requires-Dist: openpyxl>=3.1; extra == 'excel'
Requires-Dist: xlsxwriter>=3.1; extra == 'excel'
Provides-Extra: factor
Requires-Dist: scikit-learn>=1.4; extra == 'factor'
Provides-Extra: fuzzy
Requires-Dist: rapidfuzz>=3.0; extra == 'fuzzy'
Provides-Extra: garch
Requires-Dist: arch>=6.0; extra == 'garch'
Provides-Extra: interactive
Requires-Dist: plotly>=5.0; extra == 'interactive'
Provides-Extra: jupyter
Requires-Dist: ipython>=8.0; extra == 'jupyter'
Provides-Extra: lightgbm
Requires-Dist: lightgbm>=4.0; extra == 'lightgbm'
Provides-Extra: lime
Requires-Dist: lime>=0.2; extra == 'lime'
Provides-Extra: ml
Requires-Dist: scikit-learn>=1.4; extra == 'ml'
Provides-Extra: network
Requires-Dist: networkx>=3.0; extra == 'network'
Provides-Extra: notebook
Requires-Dist: nbformat>=5.0; extra == 'notebook'
Provides-Extra: openai
Requires-Dist: openai>=1.0; extra == 'openai'
Provides-Extra: panel
Requires-Dist: linearmodels>=6.0; extra == 'panel'
Provides-Extra: pdf
Requires-Dist: reportlab>=4.0; extra == 'pdf'
Provides-Extra: prophet
Requires-Dist: prophet>=1.1; extra == 'prophet'
Provides-Extra: rbridge
Requires-Dist: rpy2>=3.5; extra == 'rbridge'
Provides-Extra: report
Requires-Dist: python-docx>=1.1; extra == 'report'
Requires-Dist: python-pptx>=0.6; extra == 'report'
Provides-Extra: sas
Requires-Dist: pandas>=2.0; extra == 'sas'
Requires-Dist: pyreadstat>=1.0; extra == 'sas'
Provides-Extra: sem
Requires-Dist: semopy>=2.3; extra == 'sem'
Provides-Extra: shap
Requires-Dist: shap>=0.44; extra == 'shap'
Provides-Extra: spatial
Requires-Dist: libpysal>=4.7; extra == 'spatial'
Requires-Dist: spreg>=1.3; extra == 'spatial'
Provides-Extra: spss
Requires-Dist: pandas>=2.0; extra == 'spss'
Requires-Dist: pyreadstat>=1.0; extra == 'spss'
Provides-Extra: stata
Requires-Dist: pandas>=2.0; extra == 'stata'
Requires-Dist: pyreadstat>=1.0; extra == 'stata'
Provides-Extra: survival
Requires-Dist: lifelines>=0.28; extra == 'survival'
Provides-Extra: timeseries
Requires-Dist: pmdarima>=2.0; extra == 'timeseries'
Provides-Extra: tui
Requires-Dist: textual>=0.60; extra == 'tui'
Provides-Extra: web
Requires-Dist: fastapi>=0.100; extra == 'web'
Requires-Dist: python-multipart; extra == 'web'
Requires-Dist: uvicorn>=0.30; extra == 'web'
Requires-Dist: websockets; extra == 'web'
Provides-Extra: xgboost
Requires-Dist: xgboost>=2.0; extra == 'xgboost'
Description-Content-Type: text/markdown

<p align="center">
  <a href="https://pypi.org/project/statika/"><img src="https://img.shields.io/pypi/v/statika?style=for-the-badge&color=blue" alt="PyPI Version"></a>
  <a href="https://pypi.org/project/statika/"><img src="https://img.shields.io/pypi/dm/statika?style=for-the-badge&color=brightgreen" alt="PyPI Downloads"></a>
  <img src="https://img.shields.io/badge/python-3.10%2B-brightgreen?style=for-the-badge&logo=python&logoColor=white" alt="Python">
  <img src="https://img.shields.io/badge/license-MIT-orange?style=for-the-badge" alt="License">
  <img src="https://img.shields.io/badge/surface-stable%20core%20%2B%20experimental-lightgrey?style=for-the-badge" alt="Surface">
  <img src="https://img.shields.io/badge/powered%20by-Polars%20%7C%20statsmodels-purple?style=for-the-badge" alt="Stack">
</p>

<h1 align="center">statika</h1>

<p align="center">
  <strong>Run statistical analyses in seconds — no license fees, no GUI required.</strong><br>
  OLS, logit, survival analysis, panel data, and 260+ more commands, right from your terminal.
</p>

<p align="center">
  <a href="#installation">Install</a> &bull;
  <a href="#30-second-demo">Demo</a> &bull;
  <a href="#why-statika">Why statika?</a> &bull;
  <a href="#command-reference">Commands</a> &bull;
  <a href="#quick-examples">Examples</a> &bull;
  <a href="#contributing">Contributing</a>
</p>

---

> **Note:** statika is an independent, community-driven open-source project. It is not affiliated with, endorsed by, or connected to StataCorp LLC or any commercial statistical software vendor.

---

## Installation

```bash
pip install statika
statika repl
```

That is all. No virtual environment required, no license server, no installer wizard.

### Optional extras

```bash
pip install "statika[excel]"    # Excel (.xlsx) import/export
pip install "statika[stata]"    # Stata .dta import/export
pip install "statika[survival]" # Survival analysis (lifelines)
pip install "statika[all]"      # All optional extras
```

---

## 30-Second Demo

```
$ statika repl
statika v1.0.1 — Open-source statistical analysis tool
Type help for commands, quit to exit.

statika> load examples/data.csv
Loaded 50 rows x 7 columns from examples/data.csv

statika> summarize age income score
┌──────────┬────┬─────────┬─────────┬───────┬─────────┬─────────┬─────────┬─────────┐
│ Variable │ N  │ Mean    │ SD      │ Min   │ P25     │ P50     │ P75     │ Max     │
├──────────┼────┼─────────┼─────────┼───────┼─────────┼─────────┼─────────┼─────────┤
│ age      │ 50 │ 34.6600 │  8.7634 │ 21.00 │ 27.2500 │ 34.0000 │ 42.5000 │ 53.0000 │
│ income   │ 50 │ 49840.0 │ 17547.2 │ 26000 │ 34000.0 │ 47000.0 │ 66000.0 │ 88000.0 │
│ score    │ 50 │  7.4280 │  1.2844 │  4.90 │  6.4750 │  7.5000 │  8.5500 │  9.4000 │
└──────────┴────┴─────────┴─────────┴───────┴─────────┴─────────┴─────────┴─────────┘

statika> ols score ~ age + income --robust
┌──────────┬────────┬─────────┬───────┬────────┬────────────┬─────────────┐
│ Variable │ Coef   │ Std.Err │ t/z   │ P>|t|  │ [95% CI L] │ [95% CI H]  │
├──────────┼────────┼─────────┼───────┼────────┼────────────┼─────────────┤
│ _cons    │ 2.1435 │ 0.4521  │ 4.741 │ 0.0000 │ 1.2343     │ 3.0527      │
│ age      │ 0.0312 │ 0.0187  │ 1.668 │ 0.1018 │ -0.0066    │ 0.0690      │
│ income   │ 0.0001 │ 0.0000  │ 5.234 │ 0.0000 │ 0.0000     │ 0.0001      │
└──────────┴────────┴─────────┴───────┴────────┴────────────┴─────────────┘
N = 50  |  R² = 0.5481  |  Adj.R² = 0.5289  |  F(2, 47) = 28.52 (p=0.0000)

statika> margins
Average marginal effects computed.

statika> estimates table
Model comparison table generated.

statika> quit
Bye!
```

### Run a script instead

```bash
statika run analysis.ost           # Run an .ost script
statika run analysis.ost --strict  # Stop on first error (useful in CI)
```

---

## Why statika?

| Feature | Stata | R | SPSS | **statika** |
|---------|:-----:|:-:|:----:|:------------:|
| Price | $595/yr | Free | $99/mo | **Free** |
| Familiar CLI syntax | Yes | No | No | **Yes** |
| Scripting | Yes | Yes | No | **Yes** |
| Python ecosystem | No | No | No | **Yes** |
| No eval / safe DSL | — | — | — | **Yes** |
| Interactive REPL | No | Partial | No | **Yes** |
| Polars backend | No | No | No | **Yes** |

statika is designed for researchers and data scientists who want a familiar command-line workflow without license fees, and who want scripted, reproducible analyses that fit into version-controlled projects.

---

## Stable vs Experimental

statika distinguishes between a **stable core** and **experimental** modules:

- **Stable core:** data loading, transformation, descriptive statistics, core regression models, hypothesis tests, plotting, reporting, scripting.
- **Experimental:** panel data, survival analysis, survey-weighted estimation, SEM, network analysis, spatial statistics, and advanced ML commands.

Help and tab completion default to stable commands. To inspect the experimental surface:

```
statika> help --list --experimental
```

---

## Quick Examples

### 1. Basic data exploration

```
statika> load survey.csv
statika> describe
statika> summarize age income education
statika> tabulate region
statika> crosstab gender employed
statika> corr age income score
```

### 2. OLS regression with post-estimation

```
statika> load data.csv
statika> ols income ~ age + education + experience --robust
statika> predict yhat
statika> residuals resid
statika> vif
statika> estat all
statika> latex results/model.tex
```

### 3. Logit with marginal effects and model comparison

```
statika> logit employed ~ age + income + education
statika> margins
statika> margins --at=means
statika> ols employed ~ age + income + education
statika> estimates table
```

### 4. Grouped analysis and hypothesis tests

```
statika> groupby region summarize mean(income) sd(income) count()
statika> ttest income by employed
statika> anova score by region
statika> chi2 region employed
```

### 5. Scripted reproducible analysis (.ost file)

Create `analysis.ost`:

```bash
# analysis.ost — reproducible wage regression
load data/wages.csv
describe
summarize wage age education experience

derive log_wage = log(wage)
encode region as region_code

ols log_wage ~ age + education + experience --robust
predict yhat
residuals resid
vif
estat all
bootstrap n=1000 ci=95

latex outputs/wage_table.tex
report outputs/wage_report.md
save outputs/wages_modeled.parquet
```

Run it:

```bash
statika run analysis.ost --strict
```

---

## Command Reference

<details>
<summary><strong>Data Management</strong> (8 commands)</summary>

| Command | Description | Example |
|---------|-------------|---------|
| `load <path>` | Load CSV, Parquet, Stata (.dta), Excel (.xlsx) | `load survey.csv` |
| `save <path>` | Save data to any supported format | `save results.parquet` |
| `describe` | Show dataset structure (types, nulls) | `describe` |
| `head [N]` | Show first N rows (default: 10) | `head 20` |
| `tail [N]` | Show last N rows | `tail 5` |
| `count` | Row and column count | `count` |
| `merge <path> on <key> [how=...]` | Join with another file | `merge scores.csv on id how=left` |
| `undo` | Undo last data change (multi-level) | `undo` |

</details>

<details>
<summary><strong>Data Transformation</strong> (18 commands)</summary>

| Command | Description | Example |
|---------|-------------|---------|
| `filter <expr>` | Filter rows with expressions | `filter age > 30 and income < 50000` |
| `select <cols>` | Keep specific columns | `select age income score` |
| `derive <col> = <expr>` | Create new variables | `derive bmi = weight / (height ** 2)` |
| `dropna [cols]` | Drop missing values | `dropna age income` |
| `fillna <col> <strategy>` | Fill missing values | `fillna income median` |
| `sort <col> [--desc]` | Sort dataset | `sort income --desc` |
| `rename <old> <new>` | Rename a column | `rename income salary` |
| `cast <col> <type>` | Cast column type | `cast age float` |
| `encode <col> [as <new>]` | Label-encode strings | `encode region as region_code` |
| `recode <col> old=new ...` | Recode values | `recode region North=N South=S` |
| `replace <col> <old> <new>` | Replace values | `replace region North Norte` |
| `sample <N\|N%>` | Random sample | `sample 100` or `sample 10%` |
| `duplicates [drop] [cols]` | Find or drop duplicates | `duplicates drop` |
| `unique <col>` | List unique values | `unique region` |
| `lag <col> [N]` | Lag variable (shift down) | `lag price 2` |
| `lead <col> [N]` | Lead variable (shift up) | `lead price` |
| `pivot <val> by <col> over <id>` | Reshape to wide format | `pivot score by subject over name` |
| `melt <ids>, <vals>` | Reshape to long format | `melt name, math eng` |

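To make "shift down" and "shift up" concrete, here is a minimal sketch of what `lag` and `lead` do to a single column. It uses plain Python lists rather than statika's Polars backend. The default of one row and the use of `None` for the vacated positions are assumptions for illustration, not confirmed statika behavior.

```python
def lag(values, n=1):
    """Shift values down by n rows; the first n positions become None."""
    values = list(values)
    n = min(n, len(values))
    return [None] * n + values[: len(values) - n]


def lead(values, n=1):
    """Shift values up by n rows; the last n positions become None."""
    values = list(values)
    n = min(n, len(values))
    return values[n:] + [None] * n


price = [10, 11, 12, 13]
print(lag(price, 2))  # each row now holds the price from two rows earlier
print(lead(price))    # each row now holds the price from the next row
```
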
</details>

<details>
<summary><strong>Descriptive Statistics</strong> (5 commands)</summary>

| Command | Description | Example |
|---------|-------------|---------|
| `summarize [cols]` | Summary statistics (N, Mean, SD, quartiles) | `summarize age income` |
| `tabulate <col>` | Frequency table (top 50 values) | `tabulate education` |
| `crosstab <row> <col>` | Two-way contingency table with row percentages | `crosstab gender status` |
| `corr [cols]` | Pearson correlation matrix | `corr age income score` |
| `groupby <cols> summarize <aggs>` | Group-by with aggregations | `groupby region summarize mean(income) count()` |

</details>

<details>
<summary><strong>Statistical Models</strong> (6 commands)</summary>

| Command | Description | Example |
|---------|-------------|---------|
| `ols y ~ x1 + x2` | OLS linear regression | `ols score ~ age + income --robust` |
| `logit y ~ x1 + x2` | Logistic regression (binary) | `logit employed ~ age + income` |
| `probit y ~ x1 + x2` | Probit regression (binary) | `probit employed ~ age + income` |
| `poisson y ~ x1 + x2` | Poisson regression (counts) | `poisson visits ~ age --exposure=time` |
| `negbin y ~ x1 + x2` | Negative Binomial (overdispersed) | `negbin claims ~ age + gender` |
| `quantreg y ~ x1 + x2` | Quantile regression | `quantreg wage ~ edu + exp tau=0.9` |

All models support:
- `--robust` — heteroscedasticity-robust standard errors (HC1)
- `--cluster=col` — cluster-robust standard errors
- `--weight=col` — frequency/analytic weights

**Formula syntax:**
- `y ~ x1 + x2` — standard predictors
- `y ~ x1*x2` — full factorial (expands to `x1 + x2 + x1:x2`)
- `y ~ x1:x2` — interaction term only
- `y ~ x1*x2*x3` — three-way interaction

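The `*` shorthand can be illustrated with a short sketch that expands the right-hand side of a formula into main effects and interaction terms. The function name `expand_formula_rhs` is hypothetical, and the three-way expansion follows the usual formula convention (all main effects, all pairwise interactions, then the three-way term); statika's own parser may order or handle terms differently.

```python
from itertools import combinations


def expand_formula_rhs(rhs):
    """Expand 'a*b' style terms into main effects plus interactions."""
    expanded = []
    for term in (t.strip() for t in rhs.split("+")):
        if "*" in term:
            factors = [f.strip() for f in term.split("*")]
            # Every non-empty subset of the factors, smallest subsets first.
            parts = [
                ":".join(combo)
                for size in range(1, len(factors) + 1)
                for combo in combinations(factors, size)
            ]
        else:
            parts = [term]
        for part in parts:
            if part not in expanded:  # keep first occurrence, drop repeats
                expanded.append(part)
    return expanded


print(" + ".join(expand_formula_rhs("x1*x2")))
print(" + ".join(expand_formula_rhs("x1*x2*x3")))
```
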
</details>

<details>
<summary><strong>Post-Estimation</strong> (9 commands)</summary>

| Command | Description | Example |
|---------|-------------|---------|
| `predict [name]` | Predicted values from last model | `predict yhat` |
| `residuals [name]` | Residuals + diagnostic plots | `residuals resid` |
| `vif` | Variance Inflation Factor | `vif` |
| `margins [--at=means\|average]` | Marginal effects (dy/dx) | `margins --at=average` |
| `bootstrap [n=N] [ci=N]` | Bootstrap confidence intervals | `bootstrap n=1000 ci=95` |
| `estat <sub>` | Post-estimation diagnostics | `estat all` |
| `estimates table` | Side-by-side model comparison | `estimates table` |
| `stepwise y ~ x1 + ...` | Stepwise variable selection | `stepwise y ~ x1 + x2 --backward` |
| `latex [path.tex]` | Export model as LaTeX table | `latex results.tex` |

`estat` subcommands: `hettest`, `ovtest`, `linktest`, `ic`, `all`

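As a rough illustration of what `bootstrap n=1000 ci=95` means, the sketch below computes a percentile bootstrap interval for a sample mean using only the standard library. statika bootstraps model estimates rather than a plain mean, and the README does not say which interval method it uses, so the percentile method, the `bootstrap_ci` name, and the `seed` parameter are all assumptions.

```python
import random
import statistics


def bootstrap_ci(sample, statistic=statistics.fmean, n=1000, ci=95, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)
    # Resample with replacement n times and recompute the statistic each time.
    estimates = sorted(
        statistic(rng.choices(sample, k=len(sample))) for _ in range(n)
    )
    alpha = (100 - ci) / 200  # tail probability on each side
    lower = estimates[int(alpha * n)]
    upper = estimates[min(n - 1, int((1 - alpha) * n))]
    return lower, upper


scores = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
low, high = bootstrap_ci(scores, n=1000, ci=95)
print(f"95% CI for the mean: [{low:.3f}, {high:.3f}]")
```
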
</details>

<details>
<summary><strong>Hypothesis Tests</strong> (5 commands)</summary>

| Command | Description | Example |
|---------|-------------|---------|
| `ttest <col>` | One-sample t-test | `ttest score mu=7` |
| `ttest <col> by <group>` | Two-sample Welch t-test | `ttest income by employed` |
| `ttest <col> paired <col2>` | Paired t-test | `ttest before paired after` |
| `chi2 <col1> <col2>` | Chi-square independence test | `chi2 region employed` |
| `anova <col> by <group>` | One-way ANOVA (F-test) | `anova score by region` |

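For readers unfamiliar with the Welch variant used by `ttest <col> by <group>`, the following sketch computes the test statistic and the Welch-Satterthwaite degrees of freedom with the standard library. It is a textbook formulation, not statika's implementation (which builds on SciPy), and it omits the p-value because that requires the t distribution.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t, df) for a two-sample Welch t-test with unequal variances."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances
    se2 = v1 / n1 + v2 / n2  # squared standard error of the mean difference
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"t = {t:.4f}, df = {df:.2f}")
```
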
</details>

<details>
<summary><strong>Visualization</strong> (7 commands)</summary>

| Command | Description | Example |
|---------|-------------|---------|
| `plot hist <col>` | Histogram | `plot hist age` |
| `plot scatter <y> <x>` | Scatter plot | `plot scatter score income` |
| `plot line <y> <x>` | Line plot | `plot line score age` |
| `plot box <col> [by <g>]` | Box plot (optionally grouped) | `plot box income by region` |
| `plot bar <col> [by <g>]` | Bar chart | `plot bar income by region` |
| `plot heatmap [cols]` | Correlation heatmap | `plot heatmap age income score` |
| `plot diagnostics` | Residual diagnostic plots | `plot diagnostics` |

</details>

<details>
<summary><strong>Reporting and Utilities</strong> (4 commands)</summary>

| Command | Description | Example |
|---------|-------------|---------|
| `report <path>` | Generate Markdown report | `report analysis.md` |
| `help [cmd]` | Show help (all or specific command) | `help ols` |
| `esttab` | Publication-style coefficient table | `esttab` |
| `quit` / `exit` / `q` | Exit REPL | `quit` |

</details>

---

## Expression Language

Expressions in `filter` and `derive` are parsed by a **safe recursive-descent parser**. No Python `eval()` is used anywhere in statika.

```bash
# Arithmetic
statika> derive income_k = income / 1000
statika> derive bmi = weight / (height ** 2)

# Comparisons and boolean logic
statika> filter age > 30 and income < 50000
statika> filter not is_null(score) and region == "North"

# Functions
statika> derive log_income = log(income)
statika> derive name_upper = upper(name)
statika> derive score_clean = fill_null(score, 0)
```

| Category | Functions |
|----------|-----------|
| Math | `log(x)`, `log10(x)`, `sqrt(x)`, `abs(x)`, `exp(x)`, `round(x, n)` |
| String | `upper(x)`, `lower(x)`, `len_chars(x)`, `strip(x)`, `contains(x, "pat")` |
| Null | `is_null(x)`, `is_not_null(x)`, `fill_null(x, value)` |
| Type | `cast_float(x)`, `cast_int(x)`, `cast_str(x)` |

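To show why a recursive-descent parser avoids the risks of `eval()`, here is a minimal sketch covering arithmetic, `**`, parentheses, and a few one-argument functions. It evaluates against a single row stored as a dict, which is an assumption made to keep the example short; statika's real parser, grammar, and function table are not shown in this README and will differ. Only names listed in `FUNCTIONS` or present in the row are accepted, so arbitrary Python such as `__import__` is rejected.

```python
import math
import re

TOKEN = re.compile(r"\s*(?:(\d+\.?\d*)|([A-Za-z_]\w*)|(\*\*|[-+*/()]))")
FUNCTIONS = {"log": math.log, "sqrt": math.sqrt, "abs": abs, "exp": math.exp}


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        number, name, op = match.groups()
        if number is not None:
            tokens.append(("num", float(number)))
        elif name is not None:
            tokens.append(("name", name))
        else:
            tokens.append(("op", op))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens, row):
        self.tokens, self.pos, self.row = tokens, 0, row

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("end", None)

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expect(self, op):
        if self.advance() != ("op", op):
            raise ValueError(f"expected {op!r}")

    def expr(self):  # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            op = self.advance()[1]
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):  # term := unary (('*' | '/') unary)*
        value = self.unary()
        while self.peek() in (("op", "*"), ("op", "/")):
            op = self.advance()[1]
            rhs = self.unary()
            value = value * rhs if op == "*" else value / rhs
        return value

    def unary(self):  # unary := '-' unary | power
        if self.peek() == ("op", "-"):
            self.advance()
            return -self.unary()
        return self.power()

    def power(self):  # power := atom ('**' unary)?  (right-associative)
        base = self.atom()
        if self.peek() == ("op", "**"):
            self.advance()
            return base ** self.unary()
        return base

    def atom(self):  # atom := number | column | function '(' expr ')' | '(' expr ')'
        kind, value = self.advance()
        if kind == "num":
            return value
        if kind == "name":
            if self.peek() == ("op", "("):
                if value not in FUNCTIONS:
                    raise ValueError(f"unknown function {value!r}")
                self.advance()
                arg = self.expr()
                self.expect(")")
                return FUNCTIONS[value](arg)
            if value not in self.row:
                raise ValueError(f"unknown column {value!r}")
            return self.row[value]
        if (kind, value) == ("op", "("):
            inner = self.expr()
            self.expect(")")
            return inner
        raise ValueError(f"unexpected token {value!r}")


def evaluate(text, row):
    parser = Parser(tokenize(text), row)
    value = parser.expr()
    if parser.peek()[0] != "end":
        raise ValueError("unexpected trailing input")
    return value


print(evaluate("weight / (height ** 2)", {"weight": 81.0, "height": 1.8}))
```
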
Aggregation functions for `groupby ... summarize`:

| Function | Description |
|----------|-------------|
| `mean(col)` | Arithmetic mean |
| `sd(col)` | Standard deviation (sample) |
| `sum(col)` | Sum |
| `min(col)` | Minimum |
| `max(col)` | Maximum |
| `median(col)` | Median |
| `count()` | Row count per group |

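The sketch below shows what a command such as `groupby region summarize mean(income) sd(income) count()` computes, using only the standard library. It is illustrative rather than statika's Polars-based implementation. The helper name `groupby_summarize` is hypothetical, and returning `None` for the standard deviation of a single-row group is an assumption. Note that `sd` is the sample standard deviation (denominator n - 1), matching the table above.

```python
import statistics
from collections import defaultdict


def groupby_summarize(rows, key, col):
    """Group rows by `key` and report mean, sample SD, and count of `col`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[col])
    return {
        group: {
            "mean": statistics.fmean(values),
            # Sample SD needs at least two observations.
            "sd": statistics.stdev(values) if len(values) > 1 else None,
            "count": len(values),
        }
        for group, values in groups.items()
    }


rows = [
    {"region": "North", "income": 40000},
    {"region": "North", "income": 50000},
    {"region": "South", "income": 30000},
]
for region, stats in groupby_summarize(rows, "region", "income").items():
    print(region, stats)
```
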
---

## Automatic Model Diagnostics

Every model automatically checks for common problems:

- **Multicollinearity** — Condition number > 30 triggers a warning
- **Heteroscedasticity** — Breusch-Pagan test; suggests `--robust` if p < 0.05
- **Autocorrelation** — Warns when the Durbin-Watson statistic is far from 2.0
- **Convergence** — Warns if logit/probit MLE did not converge
- **Missing values** — Reports how many observations were dropped
- **Low sample size** — Warns when the number of observations per predictor falls below `min_obs_per_predictor` (5 in the example configuration)

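Two of these checks are simple enough to sketch directly. The Durbin-Watson statistic is the sum of squared successive residual differences divided by the sum of squared residuals, and the sample-size check compares observations per predictor against a threshold. The `dw_tolerance` value is a hypothetical cutoff for "far from 2.0", since the README does not state the one statika uses; `min_obs_per_predictor` takes its value from the example configuration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2.0 suggest no autocorrelation."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def diagnostic_warnings(residuals, n_predictors,
                        min_obs_per_predictor=5, dw_tolerance=0.5):
    """Return warning labels for low sample size and autocorrelation."""
    warnings = []
    if len(residuals) / n_predictors < min_obs_per_predictor:
        warnings.append("low sample size")
    # dw_tolerance is an assumed cutoff, not statika's documented threshold.
    if abs(durbin_watson(residuals) - 2.0) > dw_tolerance:
        warnings.append("possible autocorrelation")
    return warnings


print(diagnostic_warnings([1, -1, 1, -1], n_predictors=2))
```
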
---

## File Formats

| Format | Import | Export | Notes |
|--------|:------:|:------:|-------|
| CSV | Yes | Yes | Built-in |
| Parquet | Yes | Yes | Built-in |
| Stata (.dta) | Yes | Yes | `pip install "statika[stata]"` |
| Excel (.xlsx) | Yes | Yes | `pip install "statika[excel]"` |

---

## CLI Reference

```bash
statika repl                     # Interactive REPL
statika run script.ost           # Run an .ost script
statika run script.ost --strict  # Stop on first error (exit code 1)
statika --verbose repl           # Verbose logging
statika --debug repl             # Debug logging
statika --version                # Show version
```

Logs are written to `~/.statika/logs/openstat.log`.

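Because `--strict` makes a failing script exit with code 1, a script run can gate a CI step or a larger Python pipeline. The wrapper below is a small sketch using `subprocess`; the function names are hypothetical, and it relies only on the `statika run <script> --strict` invocation shown above.

```python
import subprocess


def statika_command(script, strict=True):
    """Build the argument list for running an .ost script."""
    command = ["statika", "run", str(script)]
    if strict:
        command.append("--strict")
    return command


def run_script(script, strict=True):
    """Run the script and return the process exit code (0 means success)."""
    return subprocess.run(statika_command(script, strict)).returncode


# Example use in a pipeline step (requires statika on PATH):
#     if run_script("analysis.ost") != 0:
#         raise SystemExit("statika analysis failed")
print(statika_command("analysis.ost"))
```
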
---

## Configuration

Create `~/.statika/config.toml` to customize defaults:

```toml
[data]
output_dir = "outputs"
csv_separator = ","

[display]
tabulate_limit = 50
head_default = 10

[undo]
max_undo_stack = 20
max_undo_memory_mb = 500

[plotting]
plot_dpi = 150
plot_figsize_w = 8.0
plot_figsize_h = 5.0

[model]
condition_threshold = 30
min_obs_per_predictor = 5
bootstrap_iterations = 1000
```

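A user file normally needs to list only the keys it changes. The sketch below shows one plausible way such overrides could be layered over built-in defaults. The `DEFAULTS` subset and the `merge_config` helper are hypothetical and are not taken from statika's source. `tomllib` is part of the standard library from Python 3.11; on 3.10 the third-party `tomli` package exposes the same `loads` function.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # Python 3.10 fallback (third-party package)
    import tomli as tomllib

# Subset of the values from the example configuration above.
DEFAULTS = {
    "display": {"tabulate_limit": 50, "head_default": 10},
    "model": {
        "condition_threshold": 30,
        "min_obs_per_predictor": 5,
        "bootstrap_iterations": 1000,
    },
}


def merge_config(defaults, overrides):
    """Return defaults with per-section overrides applied (inputs untouched)."""
    merged = {section: dict(values) for section, values in defaults.items()}
    for section, values in overrides.items():
        merged.setdefault(section, {}).update(values)
    return merged


user_toml = """
[display]
head_default = 25

[model]
bootstrap_iterations = 2000
"""

config = merge_config(DEFAULTS, tomllib.loads(user_toml))
print(config["display"], config["model"]["bootstrap_iterations"])
```
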
---

## Technology Stack

| Component | Library | Notes |
|-----------|---------|-------|
| Data engine | [Polars](https://pola.rs/) | Rust-powered columnar engine; often much faster than pandas, depending on workload |
| Statistics | [statsmodels](https://www.statsmodels.org/) | OLS, GLM, quantile regression |
| Scientific | [SciPy](https://scipy.org/) | Hypothesis tests, distributions |
| Plotting | [matplotlib](https://matplotlib.org/) | Publication-quality figures |
| CLI | [Typer](https://typer.tiangolo.com/) | Type-annotated CLI |
| Terminal UI | [Rich](https://github.com/Textualize/rich) | Tables and formatted output |
| REPL | [prompt-toolkit](https://python-prompt-toolkit.readthedocs.io/) | Tab completion, history |

---

## Contributing

Contributions are welcome. Whether you are fixing a typo, stabilizing an experimental module, or adding a new command, the process is the same:

1. Fork the repository on GitHub
2. Create a feature branch: `git checkout -b feature/your-feature`
3. Write code and tests
4. Confirm tests pass and lint is clean: `pytest` and `ruff check src/`
5. Open a pull request with a clear description

### What to contribute

- **Stable-core hardening** — CLI/REPL behavior, error handling, command metadata
- **Experimental stabilization** — panel, survival, survey, IV, mixed models
- **New commands** — any useful data manipulation or analysis command
- **Expression language** — new DSL functions
- **Plot types** — new visualization types
- **File formats** — SAS, SPSS, JSON, and others
- **Documentation** — tutorials, examples, translations
- **Bug reports** — open an issue on GitHub

New to open source? Look for issues labeled `good first issue`. See [CONTRIBUTING.md](CONTRIBUTING.md) for the full setup guide.

---

## Roadmap

### Completed

- [x] OLS, Logit, Probit, Poisson, Negative Binomial, Quantile regression
- [x] Interaction terms (`x1*x2`, `x1:x2`, three-way)
- [x] Robust and cluster-robust standard errors
- [x] Frequency and analytic weight support
- [x] Marginal effects (average, at-means, for OLS/logit/probit)
- [x] Bootstrap confidence intervals (parallelized)
- [x] Post-estimation diagnostics (`estat`, `vif`, `residuals`)
- [x] Model comparison tables (`estimates table`)
- [x] Stepwise variable selection (forward/backward)
- [x] Safe expression language (no eval)
- [x] Tab completion and multi-level undo in REPL
- [x] LaTeX and Markdown report export
- [x] CSV, Parquet, Stata .dta, Excel import/export
- [x] Configuration file support
- [x] CI/CD with GitHub Actions (1173 tests, 91% coverage)

### Planned

- [ ] Stabilize experimental estimators (panel, survival, survey, IV)
- [ ] Replace remaining pandas paths in large-data workflows
- [ ] Publish full documentation site
- [ ] Improve backend abstraction (shared engine contract for load/query/model/export)
- [ ] SAS and SPSS file format support

---

## Acknowledgements

statika is built on top of excellent open-source libraries:

- [Polars](https://pola.rs/) — for reimagining what a DataFrame library can be
- [statsmodels](https://www.statsmodels.org/) — for bringing professional-grade statistics to Python
- [SciPy](https://scipy.org/) — for decades of scientific computing
- [Rich](https://github.com/Textualize/rich) — for making terminal output readable
- [prompt-toolkit](https://python-prompt-toolkit.readthedocs.io/) — for the interactive REPL foundation

---

## License

MIT License. See [LICENSE](LICENSE) for the full text.

---

<p align="center">
  <a href="https://github.com/baristiran/statika">GitHub</a> &bull;
  <a href="https://pypi.org/project/statika/">PyPI</a> &bull;
  <a href="CONTRIBUTING.md">Contributing</a>
</p>

<p align="center">
  <em>Not affiliated with StataCorp LLC, IBM SPSS, or SAS Institute Inc.</em><br>
  <em>statika is an independent open-source project.</em>
</p>
