💥 bunker-stats
A Rust-powered statistical toolkit with a Python API and pandas Styler integration.
🔧 Overview
bunker-stats is a hybrid Rust and Python library providing:
- Fast statistical primitives
- Rolling window analytics
- Distribution tools
- pandas Styler visualizations
The numeric kernels are implemented in Rust for speed and correctness; the Python layer adds the pandas-facing wrappers and Styler helpers.
🧭 Project Philosophy and Status
v0.1 is an intentional early release.
This library focuses on correctness, clean APIs, and solid statistical foundations.
🔮 Future Focus
- Performance tuning (SIMD, fused loops, BLAS ops)
- Smarter rolling window engines
- More visualization helpers
- NaN-safe variants
- Multi-column Rust kernels
- Faster correlation matrix engine
🚀 Features
Core statistics (Rust)
- Mean, variance, standard deviation
- Sample vs population versions
- Z-scores
- Median absolute deviation (MAD)
- Percentiles and quantiles
- IQR and Tukey fences
- Covariance, correlation
- Welford one-pass algorithms
- Exponentially weighted moving average (EWMA)
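The one-pass (Welford) approach listed above can be sketched in plain Python. This is only an illustration of the technique, not the library's Rust implementation; the helper name `welford_mean_var` and its `ddof` parameter are hypothetical and not part of the bunker-stats API.

```python
import math


def welford_mean_var(values, ddof=1):
    """One-pass mean and variance using Welford's update.

    Hypothetical helper for illustration; not a bunker-stats function.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for value in values:
        count += 1
        delta = value - mean
        mean += delta / count
        # Uses the deviation from the old mean and from the updated mean.
        m2 += delta * (value - mean)
    if count == 0:
        return math.nan, math.nan
    if count - ddof <= 0:
        return mean, math.nan
    return mean, m2 / (count - ddof)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, sample_var = welford_mean_var(data)           # ddof=1 (sample)
_, population_var = welford_mean_var(data, ddof=0)  # ddof=0 (population)
```

The single pass avoids storing the data or computing the mean first, which is why this form suits streaming and rolling kernels.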
Rolling analytics
- Rolling mean, std, z-score
- Rolling covariance, correlation
- Fused pipelines (planned)
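As a rough reference for what the rolling functions compute, here is a pure-Python sketch. It assumes a truncated-length output (one value per full window), the sample (ddof=1) convention, and a z-score of the last element of each window against that window's mean and std. These are assumptions for illustration; the function names are hypothetical stand-ins, not the Rust kernels.

```python
import math
import statistics


def rolling_windows(x, window):
    """Return every full window of length `window` (truncated output)."""
    if window < 2:
        raise ValueError("window must be at least 2 for a sample std")
    return [x[i:i + window] for i in range(len(x) - window + 1)]


def rolling_mean_ref(x, window):
    return [statistics.fmean(w) for w in rolling_windows(x, window)]


def rolling_std_ref(x, window):
    # statistics.stdev uses the sample (n - 1) denominator.
    return [statistics.stdev(w) for w in rolling_windows(x, window)]


def rolling_zscore_ref(x, window):
    out = []
    for w in rolling_windows(x, window):
        sd = statistics.stdev(w)
        # A constant window has zero spread; report NaN instead of dividing.
        out.append((w[-1] - statistics.fmean(w)) / sd if sd > 0 else math.nan)
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0]
means = rolling_mean_ref(x, 3)
stds = rolling_std_ref(x, 3)
zscores = rolling_zscore_ref(x, 3)
```

This version recomputes each window from scratch, which is O(n * window); a production kernel would normally update running sums instead.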
Distribution tools
- ECDF
- Gaussian KDE
- Quantile binning
- Winsorization
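The ECDF and winsorization tools can be illustrated with a short pure-Python sketch. It assumes the ECDF assigns probability (i + 1) / n to the i-th sorted value and that winsorization clips at quantiles computed with linear interpolation; the library's exact conventions may differ, and all names below are hypothetical.

```python
def ecdf_ref(x):
    """Sorted values and their empirical cumulative probabilities."""
    vals = sorted(x)
    n = len(vals)
    probs = [(i + 1) / n for i in range(n)]
    return vals, probs


def quantile_linear(sorted_vals, q):
    """Quantile of non-empty sorted data with linear interpolation."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)  # floor, since pos is non-negative
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def winsorize_ref(x, lower_q=0.05, upper_q=0.95):
    """Clip values to the [lower_q, upper_q] quantile range."""
    s = sorted(x)
    lo = quantile_linear(s, lower_q)
    hi = quantile_linear(s, upper_q)
    return [min(max(v, lo), hi) for v in x]


vals, probs = ecdf_ref([3.0, 1.0, 2.0])
clipped = winsorize_ref([1.0, 2.0, 3.0, 4.0, 100.0], lower_q=0.25, upper_q=0.75)
```

Winsorization keeps the array length and order intact: extreme values are pulled to the quantile bounds rather than dropped.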
Transforms
- Robust scaling using median and MAD
- diff, pct_change, cumsum, cummean
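Robust scaling with median and MAD can be sketched as follows. The sketch assumes the scaled value is (x - median) / (MAD * scale_factor), where a `scale_factor` of 1.4826 would make the MAD comparable to a standard deviation for normal data. How the library applies `scale_factor` is not stated here, so treat that detail and the function names as assumptions.

```python
import math
import statistics


def mad_ref(x):
    """Median absolute deviation around the median."""
    med = statistics.median(x)
    return statistics.median([abs(v - med) for v in x])


def robust_scale_ref(x, scale_factor=1.0):
    """Return (scaled values, median, MAD); hypothetical reference helper."""
    med = statistics.median(x)
    mad_value = mad_ref(x)
    denom = mad_value * scale_factor
    if denom == 0:
        # No spread around the median: scaling is undefined.
        scaled = [math.nan] * len(x)
    else:
        scaled = [(v - med) / denom for v in x]
    return scaled, med, mad_value


scaled, med, mad_value = robust_scale_ref([1.0, 2.0, 3.0, 4.0, 100.0])
```

Because both the center and the spread are medians, the single outlier (100.0) barely affects how the other four points are scaled.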
pandas Styler
- `demean_style(df, column)`
- `zscore_style(df, column, threshold=...)`
- `iqr_outlier_style(df, column)`
- `corr_heatmap(df)`
- `robust_scale_column(df, column)`
| Function | Bunker-stats syntax | NumPy equivalent | pandas equivalent | Unique feature in bunker-stats |
|---|---|---|---|---|
| `mean` | `bs.mean(x)` | `np.mean(x)` | `s.mean()` | 1D mean helper; always treats input as 1D numeric; thin Rust-backed wrapper. |
| `mean_skipna` | `bs.mean_skipna(x)` | `np.nanmean(x)` or manual mask | `s.mean(skipna=True)` | NaN-aware mean with explicit skipna semantics, matching the pandas mental model. |
| `var` | `bs.var(x)` | `np.var(x, ddof=1)` | `s.var(ddof=1)` | 1D sample variance (ddof=1) by default, matching statistics textbooks. |
| `var_skipna` | `bs.var_skipna(x)` | `np.nanvar(x, ddof=1)` or manual mask | `s.var(skipna=True, ddof=1)` | NaN-aware sample variance in one call. |
| `std` | `bs.std(x)` | `np.std(x, ddof=1)` | `s.std(ddof=1)` | 1D sample std with fixed ddof=1, consistent with `var`. |
| `std_skipna` | `bs.std_skipna(x)` | `np.nanstd(x, ddof=1)` or manual mask | `s.std(skipna=True, ddof=1)` | NaN-aware sample std; avoids writing masks every time. |
| `percentile` | `bs.percentile(x, q=0.95)` | `np.quantile(x, 0.95)` or `np.percentile(x, 95)` | `s.quantile(0.95)` | Clean 1D percentile using the library's interpolation rule; integrates with the other robust statistics. |
| `mad` | `bs.mad(x)` | manual median/MAD | custom; `s.mad()` was the mean absolute deviation, not the median, and was removed in pandas 2.0 | True median absolute deviation, as used by `robust_scale`. |
| `iqr` | `q1, q3, iqr = bs.iqr(x)` | `scipy.stats.iqr(x, rng=(25,75))` | `s.quantile([0.25, 0.75])` | Returns `(q1, q3, iqr)` in one call; no juggling of multiple calls or indices. |
| `mean_axis` | `bs.mean_axis(X, axis=0, skipna=False)` | `np.mean(X, axis=0)` | `df.mean(axis=0, skipna=...)` | Axis-wise mean for 1D/2D arrays with optional skipna. |
| `var_axis` | `bs.var_axis(X, axis=1, skipna=True)` | `np.var(X, axis=1, ddof=1)`; `np.nanvar(X, axis=1, ddof=1)` for the NaN-aware case | `df.var(axis=1, skipna=...)` | Axis-wise sample variance with built-in NaN handling. |
| `std_axis` | `bs.std_axis(X, axis=1, skipna=True)` | `np.std(X, axis=1, ddof=1)`; `np.nanstd(X, axis=1, ddof=1)` for the NaN-aware case | `df.std(axis=1, skipna=...)` | Axis-wise sample std with skipna; brings the pandas mental model to NumPy arrays. |
| `mean_last_axis`\* | `bs.mean_last_axis(X)` (if exposed) | `np.mean(X, axis=-1)` | `df.to_numpy().mean(axis=-1)` | N-D mean over the last axis, consistent with the N-D rolling API. |
| `rolling_mean_last_axis` | `bs.rolling_mean_last_axis(X, window=3)` | manual reshape + loop, or `np.apply_along_axis` | no built-in; needs groupby + apply or custom logic | Shape-preserving N-D rolling mean over the last axis, e.g. `(batch, feat, time)`. |
| `rolling_std_last_axis` | `bs.rolling_std_last_axis(X, window=3)` | manual reshape + loop, or `np.apply_along_axis` | no built-in; needs groupby + apply or custom logic | N-D rolling std over the last axis; suited to batched time series and ML tensors. |
| `rolling_mean` | `bs.rolling_mean(x, window=5)` | manual loop or `np.convolve` trick | `s.rolling(5).mean()` | Fast 1D rolling mean (truncated-length output) with no index overhead. |
| `rolling_std` | `bs.rolling_std(x, window=5)` | manual loop | `s.rolling(5).std()` | 1D rolling std at Rust speed, using the sample variance convention. |
| `rolling_zscore` | `bs.rolling_zscore(x, window=20)` | manual window loop | `s.rolling(20).apply(custom)` | Rolling z-score in a single function; avoids apply/UDF overhead. |
| `ewma` | `bs.ewma(x, alpha=0.1)` | manual recurrence | `s.ewm(alpha=0.1).mean()` | Minimal EWMA for pure numeric arrays, with no pandas object overhead. |
| `df_rolling_mean` | `bs.df_rolling_mean(df, window=5)` | `np.convolve` per column | `df.rolling(5).mean()` | DataFrame in, DataFrame out, with each column computed by the Rust rolling mean. |
| `df_rolling_std` | `bs.df_rolling_std(df, window=5)` | manual per-column loop | `df.rolling(5).std()` | Same for std; uses the same rolling core and preserves the pandas index. |
| `df_ewma` | `bs.df_ewma(df, alpha=0.1)` | manual per-column EWMA | `df.ewm(alpha=0.1).mean()` | Per-column EWMA on the Rust engine; lighter than the full pandas EWM machinery. |
| `col_mean` | `bs.col_mean(df, skipna=True)` | `np.mean(df.to_numpy(), axis=0)` | `df.mean(axis=0, skipna=True)` | Column-wise mean; internally uses `mean_axis` with skipna and returns a labeled Series. |
| `row_mean` | `bs.row_mean(df, skipna=True)` | `np.mean(df.to_numpy(), axis=1)` | `df.mean(axis=1, skipna=True)` | Row-wise mean with the Rust numeric core and the pandas index. |
| `cov_df` | `bs.cov_df(df)` | `np.cov(df.to_numpy().T, ddof=1)` | `df.cov()` | Full covariance matrix via the Rust `cov_matrix`, returned as a DataFrame. |
| `corr_df` | `bs.corr_df(df)` | `np.corrcoef(df.to_numpy().T)` | `df.corr()` | Correlation matrix backed by the Rust correlation engine. |
| `rolling_mean_series` | `bs.rolling_mean_series(s, window=10)` | manual 1D loop | `s.rolling(10).mean()` | Series-in, Series-out convenience wrapper around the Rust rolling mean. |
| `rolling_std_series` | `bs.rolling_std_series(s, window=10)` | manual 1D loop | `s.rolling(10).std()` | Same for std; keeps index alignment and uses the Rust core. |
| `iqr_outliers` | `bs.iqr_outliers(x, k=1.5)` | `iqr = scipy.stats.iqr(x); mask = ...` | quantiles + boolean mask | Returns a boolean outlier mask in one call using the IQR rule. |
| `zscore_outliers` | `bs.zscore_outliers(x, threshold=3.0)` | `(np.abs((x-x.mean())/x.std()) > 3)` | same logic on a Series | One-line z-score outlier mask, consistent with the library's mean/std semantics. |
| `minmax_scale` | `scaled, mn, mx = bs.minmax_scale(x)` | manual `(x-mn)/(mx-mn)` | `MinMaxScaler` from scikit-learn | Returns both the scaled data and the `(min, max)` used, for inverse transforms or reuse. |
| `robust_scale` | `scaled, med, mad = bs.robust_scale(x, scale_factor)` | manual MAD calculation | `RobustScaler` from scikit-learn, or custom | All-in-one robust scaling that also returns `(median, MAD)`; pairs with `bs.mad`. |
| `winsorize` | `bs.winsorize(x, lower_q=0.05, upper_q=0.95)` | `scipy.stats.mstats.winsorize(x, limits=...)` | custom quantile clipping | 1D winsorization in Rust; a single call returns the full adjusted array. |
| `diff` | `bs.diff(x, periods=1)` | `np.diff(x, n=1)` (shorter output) or manual padding | `s.diff(periods=1)` | Full-length diff with NaNs where necessary; supports negative periods. |
| `pct_change` | `bs.pct_change(x, periods=1)` | manual `(x[i]-x[i-p]) / x[i-p]` | `s.pct_change(periods=1)` | Maps division by zero to NaN; symmetric for positive and negative lags. |
| `cumsum` | `bs.cumsum(x)` | `np.cumsum(x)` | `s.cumsum()` | Rust implementation; the value is performance on large 1D arrays. |
| `cummean` | `bs.cummean(x)` | `np.cumsum(x)/np.arange(1,len(x)+1)` | `s.expanding().mean()` | Streaming cumulative mean without constructing expanding windows. |
| `ecdf` | `vals, probs = bs.ecdf(x)` | manual sort + rank | custom rank/value_counts | Returns sorted values and the CDF in one call; convenient for ECDF plots. |
| `quantile_bins` | `bins = bs.quantile_bins(x, n_bins=10)` | manual rank + binning | `pd.qcut(x, q=10)` (Categorical) | Returns plain integer bin labels `0..n_bins-1` as a NumPy array (ML-friendly). |
| `sign_mask` | `mask = bs.sign_mask(x)` | `np.sign(x).astype(np.int8)` | `np.sign(s).astype("int8")` | Encodes the sign as {-1, 0, 1}; useful for discrete signal features. |
| `demean_with_signs` | `demeaned, signs = bs.demean_with_signs(x)` | `(x - x.mean(), np.sign(x - x.mean()))` | custom | Returns both the demeaned data and the sign mask in one pass. |
| `cov` | `bs.cov(x, y)` | `np.cov(x, y, ddof=1)[0,1]` | `s1.cov(s2)` | 1D sample covariance as a simple scalar function. |
| `corr` | `bs.corr(x, y)` | `np.corrcoef(x, y)[0,1]` | `s1.corr(s2)` | 1D Pearson correlation built on the var/std core. |
| `cov_skipna` | `bs.cov_skipna(x, y)` | manual pairwise dropna + `np.cov` | `s1.cov(s2)` with aligned/dropna | Pairwise NaN dropping built in for 1D covariance. |
| `corr_skipna` | `bs.corr_skipna(x, y)` | manual pairwise dropna + `np.corrcoef` | `s1.corr(s2)` with dropna | Same for correlation; hides the mask bookkeeping. |
| `cov_matrix` | `bs.cov_matrix(X)` | `np.cov(X, rowvar=False, ddof=1)` | `df.cov()` | Symmetric covariance matrix with Rust loops; tuned for tabular `X`. |
| `corr_matrix` | `bs.corr_matrix(X)` | `np.corrcoef(X, rowvar=False)` | `df.corr()` | Correlation matrix built on the cov/std stack; consistent behaviour across code paths. |
| `rolling_cov` | `bs.rolling_cov(x, y, window=50)` | manual sliding window + `np.cov` | `df['x'].rolling(50).cov(df['y'])` | Rolling 1D covariance without pandas overhead; suited to streaming statistics. |
| `rolling_corr` | `bs.rolling_corr(x, y, window=50)` | manual sliding window + `np.corrcoef` | `df['x'].rolling(50).corr(df['y'])` | Rolling 1D correlation in one Rust call; no custom Python loop needed. |
| `kde_gaussian` | `grid, dens = bs.kde_gaussian(x, n_points=256)` | `scipy.stats.gaussian_kde(x)` + evaluation | no direct built-in (needs SciPy) | Lightweight 1D Gaussian KDE; returns `(grid, density)` using a simple bandwidth rule by default. |
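The `diff` and `pct_change` rows describe full-length outputs padded with NaN, negative `periods`, and division by zero mapped to NaN. The pure-Python sketch below illustrates those semantics as described in the table; it is a reading of the table rather than the Rust code, and the `_ref` names are hypothetical.

```python
import math


def diff_ref(x, periods=1):
    """Full-length difference x[i] - x[i - periods], NaN where undefined."""
    n = len(x)
    out = [math.nan] * n
    for i in range(n):
        j = i - periods
        if 0 <= j < n:
            out[i] = x[i] - x[j]
    return out


def pct_change_ref(x, periods=1):
    """Full-length relative change; a zero base value yields NaN."""
    n = len(x)
    out = [math.nan] * n
    for i in range(n):
        j = i - periods
        if 0 <= j < n and x[j] != 0:
            out[i] = (x[i] - x[j]) / x[j]
    return out


forward = diff_ref([1.0, 3.0, 6.0, 10.0])               # NaN in the first slot
backward = diff_ref([1.0, 3.0, 6.0, 10.0], periods=-1)  # NaN in the last slot
changes = pct_change_ref([0.0, 2.0, 3.0])               # zero base gives NaN
```

Note the contrast with `np.diff`, which returns a shorter array, and with pandas `pct_change`, which yields `inf` for a non-zero value divided by a zero base.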
📦 Installation
```bash
git clone https://github.com/bunker-stats.git
cd bunker-stats
python -m venv .venv
source .venv/bin/activate  # Windows: .venv
pip install maturin
maturin develop