Skip to content

hNMF

hNMF implements Rank-2 NMF for Hierarchical Clustering as described in this paper and repository.

hNMF is a fork of hierarchical-nmf-python with several modifications:

  • Interface to hNMF is provided with a scikit-learn compatible BaseEstimator
  • Improved performance timings
  • Convenience methods for interpreting results

Why hNMF?

Unlike flat NMF where you specify cluster count upfront, hNMF discovers it through successive splitting using a coherence threshold. In practice, this means you don't need to guess the number of topics or account for nuances in topic granularity. Instead, you can rapidly iterate to find a coherence level that yields meaningful topics for your dataset.

Performance

The paper mentions that the hierarchical NMF process takes advantage of a fast 2-rank matrix decomposition, While this may be true in MATLAB, the original Python implementation was significantly bottlenecked when running the 2-rank decomposition.

Citations

[1] Da Kuang, Haesun Park, Fast rank-2 nonnegative matrix factorization for hierarchical document clustering, The 19th ACM SIGKDD International Conference on Knowledge, Discovery, and Data Mining (KDD '13), pp. 739-747, 2013.