Metadata-Version: 2.4
Name: pmf-acls
Version: 0.1.9
Summary: Experimental Positive Matrix Factorization (PMF) routines
Author: Jerritt Collord
License-Expression: GPL-3.0-only
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.20.0
Requires-Dist: scipy>=1.7.0
Provides-Extra: jax
Requires-Dist: jax>=0.4.0; extra == "jax"
Requires-Dist: jaxlib>=0.4.0; extra == "jax"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=3.0.0; extra == "dev"
Dynamic: license-file

# pmf_acls — Positive Matrix Factorization

A Python implementation of Positive Matrix Factorization (PMF) for environmental data analysis, featuring five solver backends with per-element uncertainty weighting.

Minimizes the uncertainty-weighted Q objective:

```
Q = Σᵢⱼ [(xᵢⱼ - Σₖ fᵢₖ gₖⱼ) / σᵢⱼ]²
```

where F is (m, p) — factor profiles (variables × factors) and G is (p, n) — factor contributions (factors × observations).

## Installation

```bash
pip install pmf-acls
```

## Quick Start

```python
import numpy as np
from pmf_acls import pmf

# X = data matrix (m variables × n observations)
# sigma = per-element uncertainties (same shape)
result = pmf(X, sigma, p=3)  # ACLS algorithm (default)
print(f"Converged: {result.converged}, Q: {result.Q:.4f}")

F = result.F  # (m, p) factor profiles
G = result.G  # (p, n) factor contributions
```

## Algorithms

The unified `pmf()` entry point supports four algorithms via `algorithm=`. A fifth (LDA) is available as a standalone function.

### ACLS (default) — `algorithm='acls'`

Alternating Constrained Least Squares (Langville et al. 2014). Solves weighted k × k normal equations per column/row. Fast and robust.

```python
result = pmf(X, sigma, p=3, algorithm='acls')
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `lambda_W` | 0.0 | Tikhonov regularization on F |
| `lambda_H` | 0.0 | Tikhonov regularization on G |
| `fpeak` | 0.0 | FPEAK rotational parameter |
| `max_iter` | 1000 | Maximum iterations |
| `conv_tol` | 0.005 | Relative change in Q for convergence |

### LS-NMF — `algorithm='ls-nmf'`

Weighted multiplicative updates (Wang et al. 2006). Matches ESAT's LS-NMF for direct comparison.

```python
result = pmf(X, sigma, p=3, algorithm='ls-nmf')
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `max_iter` | 1000 | Maximum iterations |
| `conv_tol` | 0.005 | Relative change in Q for convergence |

### Newton — `algorithm='newton'`

Newton-based method (Lu & Wu 2005). Solves large (mp+np) × (mp+np) systems. Supports multi-phase iteration control and outlier reweighting.

```python
result = pmf(X, sigma, p=3, algorithm='newton')
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `mode` | 'full' | 'basic', 'regularized', or 'full' |
| `alpha, beta, gamma, delta` | 'auto' | Strength coefficients for penalties |
| `solver` | 'numpy' | Backend: 'numpy', 'jax', or 'jax_sparse' |
| `enable_reweighting` | True | Iterative outlier reweighting |
| `outlier_threshold` | 4.0 | Outlier detection threshold (× sigma) |
| `enable_step_control` | True | Step length control |
| `max_step_halving` | 3 | Max step halving attempts |
| `enable_initial_accel` | True | Initial NNLS acceleration |
| `accel_iterations` | 10 | Iterations with acceleration |
| `enable_multiphase` | True | PMF2-style multi-phase iteration |
| `phase_config` | None | Custom phase config (list of dicts) |

### Bayesian NMF — `algorithm='bayes'`

Gibbs sampling with heteroscedastic Gaussian noise and exponential priors (Schmidt et al. 2009, Brouwer et al. 2017). Combines a robust multi-seed ACLS point estimate with Bayesian uncertainty quantification via Gibbs sampling.

Via `pmf()`, returns `PMFResult` with the ACLS point estimate. For full Bayesian output (posterior samples, Geweke diagnostics, credible intervals), call `pmf_bayes()` directly.

```python
# Via unified interface (returns PMFResult)
result = pmf(X, sigma, p=3, algorithm='bayes')

# Direct call (returns BayesNMFResult with full posterior)
from pmf_acls import pmf_bayes
result = pmf_bayes(X, sigma, p=3, n_samples=1000, n_burnin=500)

result.F          # ACLS point estimate (sharp, well-separated profiles)
result.G          # ACLS point estimate (contributions)
result.F_std      # posterior standard deviations (Bayesian UQ)
result.G_std      # posterior standard deviations (Bayesian UQ)
result.Q_samples  # Q trace for convergence diagnostics

# Posterior means also available (may show higher inter-factor similarity)
result.F_posterior_mean
result.G_posterior_mean
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `n_samples` | 1000 | Posterior samples to collect |
| `n_burnin` | 500 | Burn-in sweeps to discard |
| `n_thin` | 1 | Thinning factor |
| `lambda_F` | 1.0 | Exponential rate prior on F (profiles) |
| `lambda_G` | 1.0 | Exponential rate prior on G (contributions) |
| `learn_hyperparams` | True | Conjugate Gamma updates for lambda |
| `hyperparam_shape` | 1.0 | Shape (a) of Gamma(a, b) hyperprior |
| `hyperparam_rate` | 0.1 | Rate (b) of Gamma(a, b) hyperprior |
| `warm_start` | True | Multi-seed ACLS initialization (see below) |
| `warm_start_seeds` | 30 | Number of independent ACLS runs |
| `warm_start_iter` | 1000 | Max iterations per ACLS run |
| `store_samples` | False | Keep full F/G posterior sample arrays |
| `sampler` | 'icdf' | Truncated-normal method: 'icdf' (fast) or 'scipy' |
| `prior_F_shape` | 1.0 | Gamma shape for F prior (1.0 = exponential) |
| `prior_G_shape` | 1.0 | Gamma shape for G prior (1.0 = exponential) |
| `F_prior_mean` | None | Per-element prior mean for F, shape (m, p) |
| `F_prior_scale` | 1.0 | Scale for F prior mean penalty |
| `volume_alpha` | 0.0 | Log-det volume prior on F (0 = disabled) |
| `ard` | False | Automatic Relevance Determination for factor pruning |
| `ard_threshold` | 0.01 | Relative contribution threshold for active factors |
| `robust` | False | Student's t likelihood for outlier robustness |
| `robust_df` | 5.0 | Degrees of freedom for Student's t (lower = heavier tails) |
| `sigma_prior` | None | Hierarchical sigma: `'per_variable'` or `'per_element'` |
| `sigma_categories` | None | Category labels for grouped sigma learning |
| `sigma_prior_params` | None | Dict with InvGamma prior params (shape, scale) |
| `likelihood` | `'gaussian'` | `'gaussian'` or `'lognormal'` |
| `mh_step_size_F` | 0.1 | MH proposal scale for F (lognormal likelihood) |
| `mh_step_size_G` | 0.1 | MH proposal scale for G (lognormal likelihood) |

Convergence is assessed via the Geweke z-score on the Q trace (|z| < 2 → converged).

#### Point estimate and warm-start

When `warm_start=True` (the default), `pmf_bayes` runs multiple independent ACLS solves (`warm_start_seeds` runs, each up to `warm_start_iter` iterations) and selects the best (lowest Q) as the initialization. This ACLS solution serves two purposes:

1. **Point estimate**: Returned as `result.F` and `result.G` — these are sharp, well-separated factor profiles suitable for direct interpretation.
2. **Alignment reference**: Used as the target for Hungarian alignment during sampling, which reduces label switching in the posterior.

The Bayesian Gibbs sampler then provides uncertainty quantification around this point estimate via `result.F_std`, `result.G_std`, and the full posterior samples (if `store_samples=True`). The posterior mean is available as `result.F_posterior_mean` / `result.G_posterior_mean` but may show higher inter-factor similarity due to residual label switching — the ACLS point estimate is preferred for profile interpretation.

#### Sampler design

The Gibbs sampler uses several techniques to maintain factor separation and prevent common failure modes of element-wise NMF sampling:

- **Interleaved updates**: Each sweep updates F[:,k] then immediately G[k,:] per factor, rather than updating all of F then all of G. This couples each factor's profile and contribution, reducing the tendency for weak factors to drift toward dominant ones.
- **Per-sweep normalization**: After each Gibbs sweep, F columns are normalized to unit L1 and the scale is absorbed into G rows. This prevents scale degeneracy where one factor's profile shrinks while its contribution grows unboundedly (or vice versa), a known failure mode of element-wise Gibbs sampling for NMF.
- **Hungarian alignment**: During the sampling phase, factors are permuted each sweep to best match the ACLS warm-start reference (via cosine similarity and the Hungarian algorithm). This reduces label switching that would otherwise blur the posterior statistics.
- **Scalar precision (tau_scale)**: A global noise scaling parameter sampled from a Gamma posterior each sweep (Schmidt et al. 2009, Brouwer et al. 2017), allowing the model to adapt the overall noise level.

#### Automatic Relevance Determination (ARD)

ARD enables automatic factor number selection. Over-specify p (e.g., request 8 factors when you expect 3-5) and the model prunes unnecessary factors by driving their prior rates toward infinity:

```python
result = pmf_bayes(X, sigma, p=8, ard=True, hyperparam_shape=0.5)
print(f"Active factors: {result.effective_p} / 8")
print(f"Active mask: {result.active_factors}")
# Per-factor rates: large values = pruned factors
print(f"lambda_F per factor: {result.ard_lambda_F}")
```

- `hyperparam_shape < 1` (e.g., 0.5) promotes aggressive pruning
- `hyperparam_shape = 1.0` (default) provides moderate pruning
- `ard_threshold` controls the relative contribution cutoff for "active"

When ARD is enabled, the sampler uses random initialization (not ACLS warm-start) so that extra factors can be pruned from scratch. Each factor k shares a single rate parameter lambda_k for both F[:,k] and G[k,:] (Brouwer et al. 2017), with activity determined by G row mass (since per-sweep normalization makes all F columns unit L1).

#### Volume prior

The `volume_alpha` parameter adds a log-det volume prior on F that encourages factor separation (distinct profiles). This penalizes collinear factors:

```python
result = pmf_bayes(X, sigma, p=4, volume_alpha=1.0)
```

#### Sparse Gamma priors

Use `prior_F_shape` and `prior_G_shape` < 1.0 to impose sparsity-promoting Gamma priors instead of exponential (which is Gamma with shape=1):

```python
result = pmf_bayes(X, sigma, p=4, prior_F_shape=0.5, prior_G_shape=0.5)
```

#### Robust mode (Student's t likelihood)

Enable `robust=True` to down-weight outlier observations via a Gaussian-scale mixture representation of Student's t (Yuan et al.). Each observation gets a latent weight eta_ij; small eta values flag outliers:

```python
result = pmf_bayes(X, sigma, p=3, robust=True, robust_df=5.0)
print(f"Outlier fraction: {result.outlier_mask.mean():.1%}")
```

#### Hierarchical sigma

Learn observation uncertainties from the data rather than treating sigma as fixed:

```python
result = pmf_bayes(X, sigma, p=3, sigma_prior='per_variable')
# Posterior learned uncertainties
result.sigma_posterior_mean  # shape (m,) for per_variable, (m, n) for per_element
```

#### Log-normal likelihood

For concentration data that is approximately log-normally distributed, use Metropolis-Hastings updates in log-space:

```python
result = pmf_bayes(X, sigma, p=3, likelihood='lognormal',
                   mh_step_size_F=0.1, mh_step_size_G=0.1)
print(f"MH acceptance rate: {result.mh_acceptance_rate:.1%}")  # target: 20-50%
```

### Bayesian LDA — `pmf_lda()` (standalone)

Dirichlet-Gaussian source apportionment via Metropolis-within-Gibbs. F columns are simplex-constrained (proper compositional profiles summing to 1); G rows have exponential priors. Not available through `pmf()` — use `pmf_lda()` directly.

```python
from pmf_acls import pmf_lda
result = pmf_lda(X, sigma, p=3, n_samples=1000, n_burnin=500)
result.mh_acceptance_rate  # target: 0.1–0.5
result.kl_samples  # KL divergence trace
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `n_samples` | 1000 | Posterior samples to collect |
| `n_burnin` | 500 | Burn-in sweeps to discard |
| `n_thin` | 1 | Thinning factor |
| `alpha` | 1.0 | Symmetric Dirichlet concentration (fixed) |
| `lambda_G` | 1.0 | Exponential rate prior on G (contributions) |
| `learn_hyperparams` | True | Conjugate Gamma updates for lambda_G |
| `hyperparam_shape` | 1.0 | Shape (a) of Gamma(a, b) hyperprior |
| `hyperparam_rate` | 0.1 | Rate (b) of Gamma(a, b) hyperprior |
| `mh_step_size` | 1.0 | MH proposal scale (tune for acceptance 0.1–0.5) |
| `store_samples` | False | Keep full posterior sample arrays |
| `sampler` | 'icdf' | Truncated-normal method for G updates |

Key difference from Bayesian NMF: F columns have a joint Dirichlet prior enforcing the simplex constraint by construction (not post-hoc normalization). `alpha < 1` promotes sparse profiles; `alpha > 1` smooths toward uniform. Convergence is assessed via Geweke z-score on the KL divergence trace.

### Common Parameters (all algorithms)

| Parameter | Default | Description |
|-----------|---------|-------------|
| `X` | required | Data matrix, shape (m, n) |
| `sigma` | required | Per-element uncertainties, shape (m, n) |
| `p` | required | Number of factors |
| `F_init, G_init` | None | Initial factor matrices |
| `init_method` | 'random' | 'random', 'random_acol', or 'svd_centroid' |
| `random_seed` | None | Seed for reproducibility |
| `verbose` | False | Print progress |

## FPEAK Rotational Analysis

Sweep FPEAK values to explore rotational ambiguity. A flat Q-vs-FPEAK curve indicates large rotational ambiguity; a steep curve indicates a well-determined rotation.

```python
from pmf_acls import fpeak_sweep

sweep = fpeak_sweep(X, sigma, p=3, fpeak_values=[-1.0, -0.5, 0.0, 0.5, 1.0])
print(f"FPEAK values: {sweep.fpeak_values}")
print(f"Q values: {sweep.Q_values}")
print(f"dQ values: {sweep.dQ_values}")
# sweep.results is a list of PMFResult for each FPEAK value
```

## Bayesian Diagnostics

The `bayes_diagnostics` module provides MCMC convergence diagnostics and model comparison tools for the Bayesian solvers.

### Effective Sample Size (ESS)

Estimates the number of effectively independent samples in a correlated MCMC trace. Low ESS suggests increasing `n_samples` or `n_thin`.

```python
from pmf_acls import effective_sample_size

result = pmf_bayes(X, sigma, p=3, n_samples=2000, n_burnin=500)
ess = effective_sample_size(result.Q_samples)
print(f"ESS: {ess:.0f} / {len(result.Q_samples)} samples")
```

### Gelman-Rubin Rhat (multi-chain convergence)

Compares between-chain and within-chain variance across independent runs. Rhat < 1.05 indicates convergence.

```python
from pmf_acls import gelman_rubin

results = [pmf_bayes(X, sigma, p=3, random_seed=s) for s in range(4)]
rhat = gelman_rubin(*[r.Q_samples for r in results])
print(f"Rhat: {rhat:.3f}")  # target: < 1.05
```

### WAIC (model comparison)

Widely Applicable Information Criterion for comparing models with different p. Lower WAIC is better. Requires `store_samples=True`.

```python
from pmf_acls import compute_waic

waic_scores = {}
for p in [2, 3, 4, 5]:
    result = pmf_bayes(X, sigma, p, store_samples=True, n_samples=500)
    w = compute_waic(X, sigma, result.F_samples, result.G_samples)
    waic_scores[p] = w["waic"]
    print(f"p={p}: WAIC={w['waic']:.1f}  p_waic={w['p_waic']:.1f}  SE={w['se']:.1f}")

best_p = min(waic_scores, key=waic_scores.get)
print(f"Best number of factors: {best_p}")
```

### Factor Count Posterior (ARD)

When using ARD with `store_samples=True`, compute the posterior distribution over the number of active factors:

```python
from pmf_acls import factor_count_posterior

result = pmf_bayes(X, sigma, p=8, ard=True, store_samples=True)
fcp = factor_count_posterior(result.factor_activity_samples)
print(f"MAP factor count: {fcp['map']}")
print(f"Posterior: {fcp['histogram']}")
```

## Uncertainty Estimation

### Bootstrap

Block bootstrap resampling with Procrustes alignment for uncertainty estimation:

```python
from pmf_acls import bootstrap_uncertainty

uncertainty = bootstrap_uncertainty(X, sigma, p=3, n_bootstrap=100)
print(f"F_std shape: {uncertainty.F_std.shape}")  # (m, p)
print(f"G_std shape: {uncertainty.G_std.shape}")  # (p, n)
```

### DISP (G-element displacement)

DISP error estimation matching ESAT's algorithm. For each element G[i,j], binary-searches for a multiplicative modifier such that Q increases by dQ_max:

```python
from pmf_acls import displacement_test

disp = displacement_test(X, sigma, p=3, dQ_thresholds=[4.0, 8.0])
print(f"Swap counts: {disp.swap_counts}")
```

### F-DISP (F-element displacement, Paatero 2014)

Solver-agnostic displacement analysis on the F matrix. For each F[i,k], finds the interval within which it can vary while keeping Q within dQ of the base solution. Supports configurable re-optimization at each probe point:

```python
from pmf_acls import displacement_test_F

# Default: re-optimize only G (fast)
disp = displacement_test_F(X, sigma, F_hat, dQ_thresholds=[4.0, 8.0, 16.0])

# Full re-optimization matching Paatero 2014 methodology
disp = displacement_test_F(
    X, sigma, F_hat,
    dQ_thresholds=[4.0],
    reopt_method='acls',       # 'g_only' (default), 'acls', or 'nnls'
    max_reopt_iter=50,         # max alternating iterations per probe
    reopt_conv_tol=1e-4,       # inner convergence tolerance
)

print(f"Q_base: {disp.Q_base:.2f}")
print(f"F_lo at dQ=4: {disp.F_lo[4.0]}")    # (m, p) lower bounds
print(f"F_hi at dQ=4: {disp.F_hi[4.0]}")    # (m, p) upper bounds
print(f"Swaps at dQ=4: {disp.swap_counts[4.0].sum()}")  # 0 = stable
print(f"Approx 1-sigma: {disp.F_std_approx}")  # derived from dQ=4 interval
print(f"Method used: {disp.reopt_method}")
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `dQ_thresholds` | [4.0, 8.0, 16.0] | Q-increase thresholds (dQ=4 ≈ 95% CI) |
| `max_bisect_iter` | 40 | Max bisection iterations per element |
| `dQ_tol` | 0.1 | Convergence tolerance on Q |
| `reopt_method` | `'g_only'` | `'g_only'`: re-optimize G only (fast). `'acls'`: alternating constrained LS over all F and G (Paatero 2014 Eq. 5-6). `'nnls'`: alternating weighted NNLS for both F and G. |
| `max_reopt_iter` | 50 | Max alternating iterations per probe (ignored for `'g_only'`) |
| `reopt_conv_tol` | 1e-4 | Inner convergence tolerance (ignored for `'g_only'`) |

### Multistart Test

Test solution stability via multiple random starts:

```python
from pmf_acls import multistart_test

ms = multistart_test(X, sigma, p=3, n_displacements=20)
print(f"Q values: {ms['Q_values']}")
print(f"Converged: {ms['converged']}")
```

## Factor Selection

```python
from pmf_acls import select_factors

selection = select_factors(X, sigma, p_range=(2, 6), n_runs=10)
print(f"Recommended: {selection.best_p} factors")
```

## Compositional Data Analysis

For compositional data (mass fractions, percentages, normalized data), standard PCA operates in the wrong geometry, which can lead to incorrect rank estimates. The `coda` module provides tools for Aitchison geometry.

### ILR Scree Analysis

The primary use case: estimate the number of sources using singular values in ILR space, free of closure-induced spurious correlations. Compare with standard PCA scree to detect closure artifacts distorting your rank estimate.

```python
from pmf_acls import ilr_scree

singular_values, explained_ratio = ilr_scree(X)
# Elbow in ILR scree gives compositionally-correct rank estimate
```

### ILR-PCA Initialization

`ilr_pca_init` provides a deterministic starting point for PMF by back-transforming ILR-PCA singular vectors to the simplex. Note that PCA directions are orthogonal variance maximizers, not NMF sources — multi-start random initialization typically recovers factors as well or better.

```python
from pmf_acls import pmf, ilr_pca_init

F_init, G_init = ilr_pca_init(X, p=4)
result = pmf(X, sigma, p=4, F_init=F_init, G_init=G_init)
```

### Aitchison NMF

NMF with a σ-weighted Aitchison cost function (Formulation B from Egozcue & Pawlowsky-Glahn 2016). Minimizes weighted centered log-residuals, penalizing relative (ratio) errors rather than absolute errors. Weights w_ij = X_ij²/σ_ij² naturally down-weight near-detection-limit species.

```python
from pmf_acls import aitchison_nmf

result = aitchison_nmf(X, sigma, p=3, random_seed=42)
# result.Q is the Aitchison cost, not the standard PMF chi-squared
```

Requires strictly positive X — use `multiplicative_replacement()` for data with zeros.

### Low-Level CoDA Utilities

Available via `from pmf_acls.coda import ...`:

| Function | Description |
|----------|-------------|
| `closure(X)` | Normalize rows to sum to 1 |
| `clr_transform(X)` / `clr_inverse(X_clr)` | Centered log-ratio transform |
| `ilr_transform(X)` / `ilr_inverse(X_ilr)` | Isometric log-ratio transform |
| `aitchison_distance(X, Y)` | Aitchison distance between compositions |
| `multiplicative_replacement(X)` | Replace zeros for log-ratio transforms |

## Data Preparation

```python
from pmf_acls import prepare_data

X_clean, sigma = prepare_data(
    X_raw,
    detection_limit=detection_limits,
    missing_method='median',
    bdl_replacement='half_dl',
    uncertainty_method='poisson',
)
result = pmf(X_clean, sigma, p=3)
```

## Tests

```bash
pytest pmf_acls/tests/ -v
```

## Dependencies

- numpy >= 1.20.0
- scipy >= 1.7.0

### Optional

- JAX >= 0.4.0 (GPU acceleration for Newton solver)

## License

GPL-3.0-only

## References

- Lu, J. & Wu, L. (2005). Technical details and programming guide for a general two-way PMF algorithm.
- Langville, A. N. et al. (2014). Algorithms, Initializations, and Convergence for the NMF.
- Wang, G. et al. (2006). LS-NMF: A modified non-negative matrix factorization algorithm utilizing uncertainty estimates.
- Schmidt, M. N. et al. (2009). Bayesian non-negative matrix factorization.
- Brouwer, T. et al. (2017). Comparative study of inference methods for Bayesian nonnegative matrix factorisation.
- Cemgil, A. T. (2009). Bayesian inference for nonnegative matrix factorisation models.
- Yuan, Q. et al. (2006). Estimation of organic carbon to primary organic carbon ratio using ACSM/AMS data.
- Blei, D. M. et al. (2003). Latent Dirichlet Allocation.
- Paatero, P. (2013). Interactive comment on "Characterizing the impact of urban emissions on regional aerosol particles."
- Aitchison, J. (1982). The statistical analysis of compositional data. J. R. Stat. Soc. B.
- Egozcue, J. J. et al. (2003). Isometric logratio transformations for compositional data analysis. Math. Geol.
- Egozcue, J. J. & Pawlowsky-Glahn, V. (2016). Changing the reference measure in the simplex and its weighting effects. Austrian J. Stat.
- Martín-Fernández, J. A. et al. (2003). Dealing with zeros and missing values in compositional data sets. Math. Geol.
