HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise)
Clustering is a technique in data science used to find patterns or groupings in large data sets. It is a general-purpose technique with broad applicability across scientific domains.
Leland McInnes and John Healy, two CSE researchers at the Tutte Institute for Mathematics and Computing, refined and improved the DBSCAN (density-based spatial clustering of applications with noise) algorithm. The redesigned algorithm is orders of magnitude more efficient, and Leland and John have written high-performance code that implements it. This implementation is now the de facto reference implementation of the algorithm.
The refined HDBSCAN algorithm, implemented in Python, is available for download on GitHub (a repository hosting service for code) as part of the scikit-learn-contrib project. It is also available from PyPI and conda-forge, two popular software package sites for Python.
What is HDBSCAN used for?
HDBSCAN has been applied in a wide range of fields. The publications listed below are some examples.
Astronomy:
- Impact of Lyman alpha pressure on metal-poor dwarf galaxies.
- Testing feedback-modified dark matter haloes with galaxy rotation curves: estimation of halo parameters and consistency with ΛCDM scaling relations.
Malware analysis:
- Improving Malware Detection Accuracy by Extracting Icon Information.
- Hierarchical Density-based Clustering of Malware Behaviour.
Accounting anomaly detection:
- Detection of Anomalies in Large Scale Accounting Data using Deep Autoencoder Networks.
Biology:
- PhylOligo: a package to identify contaminant or untargeted organism sequences in genome assemblies.
- Single-cell transcriptional regulations and accessible chromatin landscape of cell fate decisions in early heart development.
Molecular dynamics:
- Visualizing correlated motion with HDBSCAN clustering.
- Uncovering Large-Scale Conformational Change in Molecular Dynamics without Prior Knowledge.
Product defect detection:
- Diagnostics of Product Defects by Clustering and Machine Learning Classification Algorithm.
Bitcoin / blockchain analysis:
- De-Anonymizing the Bitcoin Blockchain.
Is HDBSCAN production ready?
HDBSCAN is currently in a resting state: the code is stable, and various people continue to refine and adapt the open-source code.