Include HDBSCAN as a sub-module for sklearn.cluster #22616

Merged 169 commits on Oct 12, 2022
Commits
1c61429
Initial addition of hdbscan
Micky774 Feb 25, 2022
c5240b7
Added wraparound wrappers where needed
Micky774 Feb 26, 2022
74bd0b3
Updated documentation
Micky774 Feb 27, 2022
15793b2
Merge branch 'main' into hdbscan
Micky774 Mar 4, 2022
faa06b5
Added a new batch of doc updates for passing docstring tests
Micky774 Mar 5, 2022
266c958
Parameter and attribute revisions
Micky774 Mar 6, 2022
2a7cc22
Improved `metric_params` handling
Micky774 Mar 6, 2022
97f036f
Propogated `metric_params` change to tests and other functions
Micky774 Mar 6, 2022
8aa297a
Removed plotting, `to_pandas`, `to_networkx` infrastructure
Micky774 Mar 6, 2022
fe362b5
Removed plotting, `to_pandas`, `to_networkx` infrastructure
Micky774 Mar 6, 2022
dd44dbc
Merge branch 'hdbscan' of https://github.com/Micky774/scikit-learn in…
Micky774 Mar 6, 2022
fda9350
Renamed `plots.py`-->`_trees.py`
Micky774 Mar 6, 2022
7478586
Fixed package namespace in `cluster/__init__.py`
Micky774 Mar 6, 2022
cd1edc4
Drop-in replaced private `dist_metrics` with `metrics.dist_metrics`
Micky774 Mar 6, 2022
0802504
Revert "Drop-in replaced private `dist_metrics` with `metrics.dist_me…
Micky774 Mar 6, 2022
543c35c
Improved hdbscan metric handling and testing
Micky774 Mar 7, 2022
ce94591
Docstring compliance for `flat.py`
Micky774 Mar 7, 2022
e93bfe1
Renamed `flat.py` --> `_flat.py`
Micky774 Mar 7, 2022
028e98f
Renamed `flat.py`-->`_flat.py`
Micky774 Mar 7, 2022
a1ac99a
Renamed `validity.py`-->`_validity.py`
Micky774 Mar 7, 2022
788d4bc
Renamed `robust_single_linkage_.py`
Micky774 Mar 7, 2022
5fba5e0
Merge branch 'main' into hdbscan
Micky774 Mar 7, 2022
cf4f239
Removed `_flat_.py` and associated tests
Micky774 Mar 9, 2022
1ceac43
Made memview readonly constant
Micky774 Mar 10, 2022
6f20a08
Removed experimental/extra API -- may reenable in future PRs
Micky774 Mar 10, 2022
6705fa7
Merge branch 'main' into hdbscan
Micky774 Mar 13, 2022
9e9be81
WIP docstring improvements for RSL
Micky774 Mar 13, 2022
0cd08f3
Trimmed and removed unnecessary RSL estimator
Micky774 Mar 13, 2022
7b73dd8
Updated sqrt2 default in robust_single_linkage
Micky774 Mar 13, 2022
62cf09e
Updated `alpha` arg for rsl functions
Micky774 Mar 13, 2022
f48e148
Added WIP section for HDBSCAN in User Guide
Micky774 Mar 14, 2022
87071a4
Replaced custom `dist_metrics` w/ `metric._dist_metrics`
Micky774 Mar 14, 2022
30b652a
Removed unnecessary arg
Micky774 Mar 14, 2022
42b26e1
Removed vestigial `robust_single_linkage` functionality
Micky774 Mar 16, 2022
e46a418
Removed cython flags
Micky774 Mar 16, 2022
7887e11
Merge branch 'main' into hdbscan
Micky774 Mar 18, 2022
c7699c1
Merge branch 'hdbscan' of https://github.com/Micky774/scikit-learn in…
Micky774 Mar 18, 2022
5ae1d03
Initial addition of HDBSCAN User Guide [doc quick]
Micky774 Mar 19, 2022
d2fbd47
Merge branch 'main' into hdbscan
Micky774 Mar 19, 2022
30f38ea
Add reference for HDBSCAN User Guide entry
Micky774 Mar 19, 2022
38f7019
Added authorship/license info
Micky774 Mar 19, 2022
236c219
Fixed lists in `hdbscan` and improved user guide documentation
Micky774 Mar 21, 2022
59138a7
Merge branch 'main' into hdbscan
Micky774 Mar 21, 2022
7ba96ea
Added name mapping for hdbscan function autosummary
Micky774 Mar 21, 2022
ad92a9b
Merge branch 'main' into hdbscan
Micky774 Mar 25, 2022
d71883d
Merge branch 'main' into hdbscan
Micky774 Mar 25, 2022
b5dcdca
Added hdbscan to `plot_cluster_comparison`
Micky774 Mar 25, 2022
d7734d4
Fixed sphinx lists
Micky774 Mar 25, 2022
9f83f6e
Added initial hdbscan plot file
Micky774 Mar 25, 2022
f98b6bf
Modified clustering rst for image inclusion
Micky774 Mar 25, 2022
5365c3a
Corrected plotting for HDBSCAN
Micky774 Mar 25, 2022
b25d2ad
Fixed image display in user guide entry and fixed hdbscan doc
Micky774 Mar 26, 2022
103642d
Added entry to algorithm comparison table
Micky774 Mar 26, 2022
ba3302d
Added link to original hdbscan repository
Micky774 Mar 26, 2022
0f46e6c
Updated tests and improved caching code
Micky774 Mar 26, 2022
d87855b
Merge branch 'main' into hdbscan
Micky774 Mar 26, 2022
8556478
Merge branch 'main' into hdbscan
Micky774 Mar 27, 2022
e7165ae
Removed extra properties/attributes
Micky774 Mar 27, 2022
7834bf1
Cleaned up function signatures
Micky774 Mar 27, 2022
4a4e3eb
Trimmed docstring, renamed param, removed extra parameters/attrs
Micky774 Mar 27, 2022
df65fb9
Moved single-use functions in-line
Micky774 Mar 27, 2022
4bd72e5
Trim cython file by removing functionality for old `prediction`
Micky774 Mar 27, 2022
dbd6ca5
Merge branch 'main' into hdbscan
Micky774 Apr 1, 2022
a25224f
Apply suggestions from code review
Micky774 Apr 1, 2022
95d95a1
Removed unnecessary `_prediction_utils` files
Micky774 Apr 1, 2022
cd83805
Renamed most `kwargs`-->`metric_params` for consistency
Micky774 Apr 1, 2022
dee1c46
Added clarifying comment in `_validity.py`
Micky774 Apr 1, 2022
2a81824
Added random state objects, and used `tmp_path` fixture
Micky774 Apr 1, 2022
add3617
Improved `badargs` test
Micky774 Apr 1, 2022
0bf1491
Minor wording change
Micky774 Apr 1, 2022
1f31960
Made docstrings more uniform and set default metric to `euclidean`
Micky774 Apr 1, 2022
e7291a8
Improved plotting w/ perturbation examples
Micky774 Apr 1, 2022
32f4d6e
Merge branch 'main' into hdbscan
Micky774 Apr 1, 2022
5d0489d
Merge branch 'main' into hdbscan
Micky774 Apr 2, 2022
b7aca9e
Updated clustering plots for gallery page rendering
Micky774 Apr 2, 2022
42fd546
Merge branch 'main' into hdbscan
Micky774 Apr 17, 2022
3d719d9
Improved plotting example
Micky774 Apr 17, 2022
daf1b2f
Updated User-Guide entry for new plotting example
Micky774 Apr 17, 2022
4ddaddf
Typo fix
Micky774 Apr 17, 2022
9e56fc0
Merge branch 'main' into hdbscan
Micky774 Apr 22, 2022
407a7bf
Merge branch 'main' into hdbscan
Micky774 Apr 23, 2022
fa1d30f
Applied plotting demo review feedback
Micky774 May 8, 2022
f593973
Merge branch 'main' into hdbscan
Micky774 May 8, 2022
8f7f60b
Streamlined and improved plotting demo per review feedback
Micky774 May 8, 2022
6c5f936
Removed default arg for labels
Micky774 May 8, 2022
e0daeb7
Removed `match_reference_implementation` arg
Micky774 May 8, 2022
a095bb9
Improved doc for `algorithm` and changed option `"best"`-->`"auto"`
Micky774 May 8, 2022
ca7e87f
Updated DOI reference and user guide images
Micky774 May 9, 2022
ffb7601
Merge branch 'main' into hdbscan
Micky774 May 30, 2022
bb0f768
Merge branch 'main' into hdbscan
Micky774 May 30, 2022
3b38777
Merge branch 'hdbscan' of https://github.com/Micky774/scikit-learn in…
Micky774 May 30, 2022
57ec680
Refactored parameter validation to use new API
Micky774 May 30, 2022
132c146
Adopted optics-like core_dist backend using `NearestNeighbors`
Micky774 Jun 15, 2022
cfaf597
Refactor of main hdbscan function
Micky774 Jun 15, 2022
400fcf1
Removed `approx_min_span_tree` -- defaulted to `True`
Micky774 Jun 15, 2022
44bb176
Removed unnecessary metric option
Micky774 Jun 15, 2022
6bd6146
Merge remote-tracking branch 'origin' into hdbscan
Micky774 Jun 22, 2022
ef4481e
Removed validity index, replaced w/ fowlkes-mallows score
Micky774 Jun 27, 2022
d7c449a
Minor cosmetic changes to tests
Micky774 Jun 27, 2022
3710209
Refactored boruvka cython
Micky774 Jun 27, 2022
bf571d9
Trimmed unnecessary mutual-reachability functions
Micky774 Jun 28, 2022
997b4cb
Comments and minor cosmetics
Micky774 Jun 28, 2022
6a7095c
Simplified tests wrt new validation mechanism
Micky774 Jun 28, 2022
54d71eb
Update doc/modules/clustering.rst
Micky774 Jun 29, 2022
9162f62
Improved user guide entry wording per review feedback
Micky774 Jun 29, 2022
f104ec9
Merge branch 'hdbscan' of https://github.com/Micky774/scikit-learn in…
Micky774 Jun 29, 2022
4f5f5d6
Improved testing coverage
Micky774 Jun 29, 2022
9abc237
Added initial changelog entry
Micky774 Jun 30, 2022
2d7c4c9
Added pr details in changelog entry
Micky774 Jun 30, 2022
4e37527
Merge branch 'hdbscan' of https://github.com/scikit-learn/scikit-lear…
Micky774 Jun 30, 2022
5e0bc41
Trimmed extra function and modified comments
Micky774 Jul 4, 2022
0847be5
Apply suggestions from code review
Micky774 Jul 26, 2022
1c9a76a
Applied isort (with black on top)
Micky774 Jul 26, 2022
c01a609
Stylistic improvements
Micky774 Aug 26, 2022
b7736ef
Removed boruvka algorithm
Micky774 Aug 26, 2022
84484ea
Refactored file names and setup file
Micky774 Aug 26, 2022
24c5b98
Updated test file for boruvka removal
Micky774 Aug 26, 2022
585d7bb
Added dtype specification to input array validation
Micky774 Aug 28, 2022
507f0da
Apply suggestions from code review
Micky774 Aug 31, 2022
ed6d17d
Further review feedback
Micky774 Aug 31, 2022
eefbacc
Refactored to remove `hdbscan` function -- use estimator instead
Micky774 Sep 6, 2022
3f89574
minor cleanup
Micky774 Sep 6, 2022
33f950b
Parameter simplification, and cluster_center refactor
Micky774 Sep 6, 2022
d29cc02
Minor typo corrections and reordering of user-guide entry
Micky774 Sep 6, 2022
3b86f1d
streamlined test
Micky774 Sep 6, 2022
cada149
Documentation update per review feedback
Micky774 Sep 6, 2022
45aab3c
Removed unnecessary function and made minor tweak to test
Micky774 Sep 6, 2022
67cab1a
Simplified plotting demo single-axis plots
Micky774 Sep 6, 2022
7edfd55
Refactored weighted centers
Micky774 Sep 14, 2022
d173707
Apply suggestions from code review
Micky774 Sep 14, 2022
1056cb0
Further review feedback implemented
Micky774 Sep 15, 2022
39b3e5a
Updated tests with review feedback
Micky774 Sep 15, 2022
5c42b0d
Apply suggestions from code review
Micky774 Sep 15, 2022
aa999f5
Renamed mst functions
Micky774 Sep 15, 2022
cf2c83d
Merge branch 'hdbscan' of https://github.com/Micky774/scikit-learn in…
Micky774 Sep 15, 2022
7c6c89d
Merge branch 'hdbscan' of https://github.com/scikit-learn/scikit-lear…
Micky774 Sep 15, 2022
4860a7f
Refactored _reachability.pyx
Micky774 Sep 15, 2022
2eff9cc
Adjusted documentation
Micky774 Sep 15, 2022
7a9b365
Cython cleanup for _reachability.pyx
Micky774 Sep 15, 2022
da44c83
Improved docs
Micky774 Sep 15, 2022
26dad21
Update sklearn/cluster/_hdbscan/hdbscan.py
Micky774 Sep 15, 2022
15595be
Minor cleanup
Micky774 Sep 15, 2022
a9a3c22
Merge branch 'hdbscan' of https://github.com/Micky774/scikit-learn in…
Micky774 Sep 15, 2022
0b0fa0e
Minor refactor for propogating missing data
Micky774 Sep 15, 2022
f96e8d6
Updated docs
Micky774 Sep 15, 2022
8f5c22b
Updated authorships
Micky774 Sep 15, 2022
23185f0
Updated `n_cluster` calc in `_weighted_cluster_center`
Micky774 Sep 15, 2022
8ed0869
Refactored brute algorithm and added `copy` parameter
Micky774 Sep 16, 2022
e6b9c2d
Updated tests a bit
Micky774 Sep 16, 2022
c31f463
Utilize shared `UnionFind` code
Micky774 Sep 16, 2022
f4cd003
Adjusted common test parameter
Micky774 Sep 16, 2022
52d2f09
Added one-sample error
Micky774 Sep 16, 2022
95c0705
Updated references
Micky774 Sep 19, 2022
b1446f7
Apply suggestions from code review
Micky774 Sep 26, 2022
fd5c5df
Moved test
Micky774 Sep 26, 2022
8a2be40
Updated setup.py
Micky774 Sep 26, 2022
5aca317
Apply suggestions from code review
Micky774 Sep 26, 2022
6abe276
Apply suggestions from code review
Micky774 Sep 26, 2022
e38d934
Incorporated review feedback
Micky774 Sep 27, 2022
e1436c2
Merge branch 'hdbscan' of https://github.com/Micky774/scikit-learn in…
Micky774 Sep 27, 2022
9b3d2e4
Lint
Micky774 Sep 27, 2022
841c5cd
Updated copy docstring and simplified behavior
Micky774 Oct 4, 2022
b6dd52a
Addressed feedback
Micky774 Oct 6, 2022
6b68706
Update sklearn/cluster/_hdbscan/hdbscan.py
Micky774 Oct 6, 2022
bd47ec8
Clarified comment
Micky774 Oct 6, 2022
b8e6da1
Implemented outlier encoding
Micky774 Oct 11, 2022
874f85c
Added space in txt for rendering
Micky774 Oct 11, 2022
220d1d2
blackify
glemaitre Oct 12, 2022
c720514
Merge branch 'hdbscan' into hdbscan
glemaitre Oct 12, 2022
1 change: 1 addition & 0 deletions doc/modules/classes.rst
@@ -102,6 +102,7 @@ Classes
cluster.AgglomerativeClustering
cluster.Birch
cluster.DBSCAN
cluster.HDBSCAN
cluster.FeatureAgglomeration
cluster.KMeans
cluster.BisectingKMeans
115 changes: 114 additions & 1 deletion doc/modules/clustering.rst
@@ -93,6 +93,13 @@ Overview of clustering methods
transductive
- Distances between nearest points

* - :ref:`HDBSCAN <hdbscan>`
- minimum cluster membership, minimum point neighbors
- large ``n_samples``, medium ``n_clusters``
- Non-flat geometry, uneven cluster sizes, outlier removal,
transductive, hierarchical, variable cluster density
- Distances between nearest points

* - :ref:`OPTICS <optics>`
- minimum cluster membership
- Very large ``n_samples``, large ``n_clusters``
@@ -946,6 +953,112 @@ by black points below.
Schubert, E., Sander, J., Ester, M., Kriegel, H. P., & Xu, X. (2017).
In ACM Transactions on Database Systems (TODS), 42(3), 19.

.. _hdbscan:

HDBSCAN
=======

The :class:`HDBSCAN` algorithm can be seen as an extension of :class:`DBSCAN`
and :class:`OPTICS`. Specifically, :class:`DBSCAN` assumes that the clustering
criterion (i.e. density requirement) is *globally homogeneous*.
In other words, :class:`DBSCAN` may struggle to successfully capture clusters
with different densities.
:class:`HDBSCAN` relaxes this assumption and explores all possible density
scales by building an alternative representation of the clustering problem.

.. note::

This implementation is adapted from the original implementation of HDBSCAN,
`scikit-learn-contrib/hdbscan <https://github.com/scikit-learn-contrib/hdbscan>`_.

Mutual Reachability Graph
-------------------------

HDBSCAN first defines :math:`d_c(x_p)`, the *core distance* of a sample :math:`x_p`, as the
distance to its `min_samples` th-nearest neighbor, counting itself. For example,
if `min_samples=5` and :math:`x_*` is the 5th-nearest neighbor of :math:`x_p`
then the core distance is:

.. math:: d_c(x_p)=d(x_p, x_*).

Next it defines :math:`d_m(x_p, x_q)`, the *mutual reachability distance* of two points
:math:`x_p, x_q`, as:

.. math:: d_m(x_p, x_q) = \max\{d_c(x_p), d_c(x_q), d(x_p, x_q)\}

These two notions allow us to construct the *mutual reachability graph*
:math:`G_{ms}` defined for a fixed choice of `min_samples` by associating each
sample :math:`x_p` with a vertex of the graph, and weighting the edge between
points :math:`x_p, x_q` by the mutual reachability distance :math:`d_m(x_p, x_q)`
between them. We may build subsets of this graph, denoted as
:math:`G_{ms,\varepsilon}`, by removing any edges with value greater than :math:`\varepsilon`
from the original graph. Any points whose core distance is greater than :math:`\varepsilon`
are at this stage marked as noise. The remaining points are then clustered by
finding the connected components of this trimmed graph.
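
As an illustration only, the definitions above can be written out in plain
Python. This is a naive quadratic sketch, not the code added by this PR: the
helper names (`core_distances`, `mutual_reachability`, `dbscan_star_labels`),
the toy data, and the use of `-1` as the noise label are assumptions made for
the example.

```python
import math


def core_distances(points, min_samples):
    """Distance from each point to its min_samples-th nearest neighbor,
    counting the point itself as the first neighbor."""
    result = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)  # dists[0] is 0.0 (self)
        result.append(dists[min_samples - 1])
    return result


def mutual_reachability(points, min_samples):
    """Return core distances and the dense matrix of d_m(x_p, x_q)."""
    core = core_distances(points, min_samples)
    n = len(points)
    d_m = [
        [max(core[i], core[j], math.dist(points[i], points[j])) for j in range(n)]
        for i in range(n)
    ]
    return core, d_m


def dbscan_star_labels(points, min_samples, eps):
    """Connected components of the graph trimmed at eps; noise is labeled -1."""
    core, d_m = mutual_reachability(points, min_samples)
    n = len(points)
    labels = [-1] * n
    current = 0
    for start in range(n):
        # Skip points already labeled and points whose core distance exceeds eps.
        if labels[start] != -1 or core[start] > eps:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                # d_m[i][j] <= eps implies core[j] <= eps, so noise is never reached.
                if labels[j] == -1 and d_m[i][j] <= eps:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Two tight squares and one isolated point (hypothetical toy data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan_star_labels(points, min_samples=3, eps=1.5)
```

With these values each square forms its own component, while the isolated
point has a core distance far above `eps` and stays labeled as noise.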

.. note::

Taking the connected components of a trimmed graph :math:`G_{ms,\varepsilon}` is
equivalent to running DBSCAN* with `min_samples` and :math:`\varepsilon`. DBSCAN* is a
slightly modified version of DBSCAN mentioned in [CM2013]_.

Hierarchical Clustering
-----------------------
HDBSCAN can be seen as an algorithm which performs DBSCAN* clustering across all
values of :math:`\varepsilon`. As mentioned above, this is equivalent to finding the connected
components of the mutual reachability graphs for all values of :math:`\varepsilon`. To do this
efficiently, HDBSCAN first extracts a minimum spanning tree (MST) from the
fully-connected mutual reachability graph, then greedily cuts the edges with highest
weight. An outline of the HDBSCAN algorithm is as follows:

1. Extract the MST of :math:`G_{ms}`
2. Extend the MST by adding a "self edge" for each vertex, with weight equal
to the core distance of the underlying sample.
3. Initialize a single cluster and label for the MST.
4. Remove the edge with the greatest weight from the MST (ties are
removed simultaneously).
5. Assign cluster labels to the connected components which contain the
end points of the now-removed edge. If the component does not have at least
one edge it is instead assigned a "null" label marking it as noise.
6. Repeat 4-5 until there are no more connected components.
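
The steps above can be sketched in plain Python. This is a simplified
illustration under stated assumptions, not the Cython code in this PR: the
function names, the one-dimensional toy data, and the `-1` noise label are
hypothetical. The full algorithm also condenses the hierarchy and selects
clusters from it, which is omitted here.

```python
def minimum_spanning_tree(weights):
    """Prim's algorithm on a dense symmetric weight matrix.

    Returns a list of (weight, i, j) edges."""
    n = len(weights)
    in_tree = [False] * n
    in_tree[0] = True
    # best[k] = (cheapest known edge weight linking k to the tree, tree vertex)
    best = [(weights[0][k], 0) for k in range(n)]
    edges = []
    for _ in range(n - 1):
        j = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k][0])
        w, i = best[j]
        edges.append((w, i, j))
        in_tree[j] = True
        for k in range(n):
            if not in_tree[k] and weights[j][k] < best[k][0]:
                best[k] = (weights[j][k], j)
    return edges


def components_after_cut(n, mst_edges, core, eps):
    """Labels once every MST edge and self edge heavier than eps is removed."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for w, i, j in mst_edges:
        if w <= eps:
            parent[find(i)] = find(j)

    labels, names = [], {}
    for i in range(n):
        if core[i] > eps:
            labels.append(-1)  # self edge removed: the point is noise
        else:
            labels.append(names.setdefault(find(i), len(names)))
    return labels


# Hypothetical one-dimensional samples: two groups separated by a wide gap.
positions = [0.0, 1.0, 2.0, 10.0, 11.0]
min_samples = 2
n = len(positions)
core = [sorted(abs(p - q) for q in positions)[min_samples - 1] for p in positions]
d_m = [
    [max(core[i], core[j], abs(positions[i] - positions[j])) for j in range(n)]
    for i in range(n)
]
mst = minimum_spanning_tree(d_m)

# Walk the hierarchy from the heaviest edge weight down to the lightest.
levels = sorted({w for w, _, _ in mst} | set(core), reverse=True)
hierarchy = {eps: components_after_cut(n, mst, core, eps) for eps in levels}
```

Here the tree has a single heavy edge across the gap, so the hierarchy holds
one cluster at the largest threshold and two clusters once that edge is cut.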

HDBSCAN is therefore able to obtain all possible partitions achievable by
DBSCAN* for a fixed choice of `min_samples` in a hierarchical fashion.
Indeed, this allows HDBSCAN to perform clustering across multiple densities
and as such it no longer needs :math:`\varepsilon` to be given as a hyperparameter. Instead
it relies solely on the choice of `min_samples`, which tends to be a more robust
hyperparameter.

.. |hdbscan_ground_truth| image:: ../auto_examples/cluster/images/sphx_glr_plot_hdbscan_005.png
:target: ../auto_examples/cluster/plot_hdbscan.html
:scale: 75
.. |hdbscan_results| image:: ../auto_examples/cluster/images/sphx_glr_plot_hdbscan_007.png
:target: ../auto_examples/cluster/plot_hdbscan.html
:scale: 75

.. centered:: |hdbscan_ground_truth|
.. centered:: |hdbscan_results|

HDBSCAN can be smoothed with an additional hyperparameter `min_cluster_size`
which specifies that during the hierarchical clustering, components with fewer
than `min_cluster_size` samples are considered noise. In practice, one
can set `min_cluster_size = min_samples` to couple the parameters and
simplify the hyperparameter space.
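
A minimal sketch of this smoothing rule, applied to a single flat labeling
rather than during the hierarchy construction as the estimator does. The
helper name `drop_small_components` and the `-1` noise label are assumptions
made for illustration.

```python
from collections import Counter


def drop_small_components(labels, min_cluster_size, noise_label=-1):
    """Relabel as noise every component with fewer than min_cluster_size samples."""
    sizes = Counter(label for label in labels if label != noise_label)
    return [
        label
        if label != noise_label and sizes[label] >= min_cluster_size
        else noise_label
        for label in labels
    ]


# Hypothetical labeling: components of size 4, 2 and 1, plus one noise point.
labels = [0, 0, 0, 0, 1, 1, 2, -1]
smoothed = drop_small_components(labels, min_cluster_size=3)
```

Only the component with four samples survives; the two smaller components
are absorbed into the noise label.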

.. topic:: References:

.. [CM2013] Campello, R.J.G.B., Moulavi, D., Sander, J. (2013). Density-Based Clustering
Based on Hierarchical Density Estimates. In: Pei, J., Tseng, V.S., Cao, L.,
Motoda, H., Xu, G. (eds) Advances in Knowledge Discovery and Data Mining.
PAKDD 2013. Lecture Notes in Computer Science, vol 7819. Springer, Berlin,
Heidelberg.
:doi:`Density-Based Clustering Based on Hierarchical Density Estimates <10.1007/978-3-642-37456-2_14>`

.. [LJ2017] McInnes, L., Healy, J. (2017). Accelerated Hierarchical Density Based
Clustering. In: IEEE International Conference on Data Mining Workshops (ICDMW),
2017, pp. 33-42.
:doi:`Accelerated Hierarchical Density Based Clustering <10.1109/ICDMW.2017.12>`

.. _optics:

OPTICS
@@ -1018,7 +1131,7 @@ represented as children of a larger parent cluster.
Different distance metrics can be supplied via the ``metric`` keyword.

For large datasets, similar (but not identical) results can be obtained via
`HDBSCAN <https://hdbscan.readthedocs.io>`_. The HDBSCAN implementation is
:class:`HDBSCAN`. The HDBSCAN implementation is
multithreaded, and has better algorithmic runtime complexity than OPTICS,
at the cost of worse memory scaling. For extremely large datasets that
exhaust system memory using HDBSCAN, OPTICS will maintain :math:`n` (as opposed
13 changes: 13 additions & 0 deletions doc/whats_new/v1.2.rst
@@ -150,6 +150,19 @@ Changelog
:mod:`sklearn.cluster`
......................

- |MajorFeature| Added :class:`cluster.HDBSCAN`, a modern hierarchical density-based
clustering algorithm. Similarly to :class:`cluster.OPTICS`, it can be seen as a
generalization of :class:`cluster.DBSCAN` by allowing for hierarchical instead of flat
clustering; however, it varies in its approach from :class:`cluster.OPTICS`. This
algorithm is very robust with respect to its hyperparameters' values and can
be used on a wide variety of data without much, if any, tuning.

This implementation is an adaptation from the original implementation of HDBSCAN in
`scikit-learn-contrib/hdbscan <https://github.com/scikit-learn-contrib/hdbscan>`_,
by :user:`Leland McInnes <lmcinnes>` et al.

:pr:`22616` by :user:`Meekail Zain <micky774>`

- |Enhancement| The `predict` and `fit_predict` methods of :class:`cluster.OPTICS` now
accept sparse data type for input data. :pr:`14736` by :user:`Hunt Zhan <huntzhan>`,
:pr:`20802` by :user:`Brandon Pokorny <Clickedbigfoot>`,
9 changes: 9 additions & 0 deletions examples/cluster/plot_cluster_comparison.py
@@ -79,6 +79,9 @@
"min_samples": 7,
"xi": 0.05,
"min_cluster_size": 0.1,
"allow_single_cluster": True,
"hdbscan_min_cluster_size": 15,
"hdbscan_min_samples": 3,
}

datasets = [
@@ -161,6 +164,11 @@
affinity="nearest_neighbors",
)
dbscan = cluster.DBSCAN(eps=params["eps"])
hdbscan = cluster.HDBSCAN(
min_samples=params["hdbscan_min_samples"],
min_cluster_size=params["hdbscan_min_cluster_size"],
allow_single_cluster=params["allow_single_cluster"],
)
optics = cluster.OPTICS(
min_samples=params["min_samples"],
xi=params["xi"],
@@ -188,6 +196,7 @@
("Ward", ward),
("Agglomerative\nClustering", average_linkage),
("DBSCAN", dbscan),
("HDBSCAN", hdbscan),
("OPTICS", optics),
("BIRCH", birch),
("Gaussian\nMixture", gmm),