Nonparametric distributional regression using LightGBM. Predict full probability distributions p(y|x) - not just point estimates.
Standard regression gives a single number. But real-world outcomes are uncertain, often skewed, and sometimes multimodal. Full distributional predictions unlock richer decision-making.
Know how confident a prediction is. Get calibrated prediction intervals, not just point estimates.
Capture multimodal, skewed, heavy-tailed, or zero-inflated distributions - no parametric assumptions required.
Optimize expected utility, compute Value-at-Risk, or feed full densities into downstream systems.
| Point Prediction | Probabilistic Prediction | |
|---|---|---|
| Retail demand | "Sell 142 units" | "80% chance between 120-170, slight chance of spike above 200" |
| House price | "$450,000" | "Most likely $430k-$480k, small secondary mode near $520k" |
| Patient outcome | "12 months" | "Bimodal: quick recovery by 6 mo or extended course around 18 mo" |
DistributionRegressor uses a soft-target approach that turns distributional regression into a single gradient boosted model trained with cross-entropy loss.
n_bins pointsAt inference, for a new input x, we query the model at every grid point and normalize the outputs to obtain a proper probability distribution.
Requires Python ≥ 3.8
# Clone and install in development mode
git clone https://github.com/guyko81/DistributionRegressor.git
cd DistributionRegressor
pip install -e .
All dependencies are installed automatically via pip.
from distribution_regressor import DistributionRegressor print("Ready!")
Install and start predicting distributions in under 10 lines of code.
from distribution_regressor import DistributionRegressor # Initialize model = DistributionRegressor( n_bins=50, # grid resolution n_estimators=500, # LightGBM trees sigma='auto', # automatic kernel width ) # Train model.fit(X_train, y_train) # Point predictions y_mean = model.predict(X_test) # expected value y_mode = model.predict_mode(X_test) # most likely value # Uncertainty intervals = model.predict_interval(X_test, confidence=0.9) std = model.predict_std(X_test) # Full distribution grids, dists, offsets = model.predict_distribution(X_test) # Evaluate density at specific points density = model.pdf(X_test, y_test) nll = model.nll(X_test, y_test)
import matplotlib.pyplot as plt # Predict distribution for a single sample grids, dists, offsets = model.predict_distribution(X_test[0:1]) plt.plot(grids[0], dists[0], label='Predicted PDF') plt.axvline(y_test[0], color='r', linestyle='--', label='True Value') plt.legend() plt.show()
| Parameter | Default | Description |
|---|---|---|
n_bins | 50 | Grid resolution. Higher = finer distributions but more compute. 50-200 is typical. |
sigma | 'auto' |
Kernel bandwidth for soft targets. 'auto' estimates per-sample noise from CV residuals. Float for fixed width. |
use_base_model | False |
If True, fits an LGBM baseline and models residuals. Can improve RMSE at the cost of distribution shape fidelity. |
monte_carlo_training | True |
Sample grid points instead of full expansion. Reduces memory ~20x with minimal quality loss. |
mc_samples | 5 | Number of grid points per sample in MC mode. 5-10 is usually enough. |
n_estimators | 100 | Number of LightGBM boosting iterations. |
learning_rate | 0.1 | Boosting learning rate. |
**kwargs | - | Any LightGBM parameter: max_depth, num_leaves, subsample, colsample_bytree, etc. |
Scikit-learn compatible interface with rich distributional methods.
DistributionRegressor(
n_bins=50, # grid resolution (higher = finer distributions)
sigma='auto', # kernel bandwidth: 'auto' or float
use_base_model=False, # anchor to LGBM point predictions
monte_carlo_training=True,# memory-efficient MC sampling
mc_samples=5, # MC points per sample
n_estimators=100, # LightGBM trees
learning_rate=0.1, # boosting learning rate
**kwargs # passed to LGBMRegressor
)
use_base_model=True, returns the base LGBM prediction directly.@software{distributionregressor2025,
title = {DistributionRegressor: Nonparametric Distributional Regression},
author = {Gabor Gulyas},
year = {2025},
url = {https://github.com/guyko81/DistributionRegressor}
}
Predicting daily sales for a retail store. The model captures day-of-week patterns, promotional effects, and holiday uncertainty - including the hard zero on Sundays.
While the default is use_base_model=False, this is a case where enabling the base model
is beneficial. Retail sales for a single store are unimodal and well-behaved - the base model
anchors point predictions tightly, and the residual distribution captures the remaining uncertainty
with narrow, well-calibrated intervals.
from distribution_regressor import DistributionRegressor model = DistributionRegressor( n_bins=100, n_estimators=1000, learning_rate=0.01, sigma='auto', random_state=42, use_base_model=True, # beneficial here: unimodal, narrow uncertainty ) model.fit(train_data[features], train_data['Sales']) # Prediction intervals for time series visualization p10 = model.predict_quantile(X, 0.1) p50 = model.predict_quantile(X, 0.5) p90 = model.predict_quantile(X, 0.9)
With use_base_model=True, the per-sample adaptive grid centers on the base prediction,
giving high resolution where it matters. The closed-day prediction collapses to a tight spike at zero:
Predicting the full income distribution conditional on demographic features (26 variables, 113k samples). Compared against NGBoost on test-set negative log-likelihood.
| Method | Test NLL | Weighted NLL | Training Time |
|---|---|---|---|
| DistributionRegressor | 11.30 | 11.32 | 5 s |
| NGBoost (LogNormal) | 11.11 | 11.15 | 363 s |
However, NGBoost's strength is also its weakness. With a LogNormal distribution, every prediction is forced into a unimodal, right-skewed shape. When the true conditional distribution is multimodal or has a complex shape, the parametric assumption breaks down.
NGBoost can use more complex parametric families to mitigate this, but the user must choose the right family in advance. DistributionRegressor sidesteps this entirely - it learns whatever shape the data requires.
The use_base_model option controls whether DistributionRegressor learns
distributions directly on the target, or on residuals from a preliminary LightGBM fit.
use_base_model=False (default)The soft-target model learns p(y|x) directly on the raw target variable.
use_base_model=TrueFirst fits a standard LightGBM for point predictions, then models the residual distribution with soft targets.
# Without baseline (default) - learn full distribution directly model_direct = DistributionRegressor(use_base_model=False) model_direct.fit(X_train, y_train) # With baseline - learn residual distribution around point prediction model_baseline = DistributionRegressor(use_base_model=True) model_baseline.fit(X_train, y_train)
While use_base_model=True often delivers better RMSE and can be better calibrated,
it models the residual distribution. This means the model learns a narrow, roughly
symmetric distribution centered near zero - which can obscure multimodal structure present in the
raw target. In cases where the conditional distribution has multiple modes, the direct model
(use_base_model=False) preserves this structure.
Full grid expansion creates n_samples × n_bins rows. Monte Carlo training
samples a small number of grid points per sample, dramatically reducing memory usage.
monte_carlo_training=FalseEach sample is crossed with all grid points. Training set size = n × n_bins.
100k samples × 100 bins = 10M rows
monte_carlo_training=True (default)Each sample gets mc_samples strategically chosen points (exact peak + Gaussian slope + uniform tails).
100k samples × 5 MC points = 500k rows
mc_samples=5 is sufficient.
# Full grid - higher memory, marginally more stable model = DistributionRegressor(monte_carlo_training=False, n_bins=100) # Monte Carlo (default) - much less memory, nearly identical quality model = DistributionRegressor(monte_carlo_training=True, mc_samples=5, n_bins=100)