DOC Minor updates to `OPTICS` docstring #31363

lucyleeow · 2025-05-14T04:10:18Z

Reference Issues/PRs

- Small grammar fixes
- Update the type specification of `cluster_method` to list the allowed string options instead of just `str`
- Use a reference to the `scipy.spatial.distance` module

What does this implement/fix? Explain your changes.

Any other comments?

Noticed while reviewing #31102

github-actions · 2025-05-14T04:11:43Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

_Generated for commit: e9c9410. Link to the linter CI: here_

lucyleeow

Also some questions regarding the main description.

lucyleeow · 2025-05-14T04:13:32Z

sklearn/cluster/_optics.py

    neighborhood radius. Better suited for usage on large datasets than the
-    current sklearn implementation of DBSCAN.
+    current scikit-learn implementation of DBSCAN.

    Clusters are then extracted using a DBSCAN-like method


The "then" makes it seem like we should mention the order list, e.g.,
"Clusters are then extracted from the order list using...", though I'm not sure what the right term is here.

Going into that level of detail will be hard, and it'll make this docstring very long; I'm not sure we should do that here.

What about just removing the "then" then?

"Clusters are extracted using a DBSCAN-like method ... "

Though:

"Clusters are then extracted from the order list using a DBSCAN-like method"

isn't that much longer. Or are you saying that if we use the term "order list" we need to explain it further?

For me, "then" here implies "as the next step", but sure, you're the native speaker here 😁

Yes, sorry, I meant more that we could improve the documentation by including information on what happens before. Or just remove the "then", so it's not confusing, as it's currently not clear what the "before" to the "then" is.

Sure, happy with that

Okay, I used the term 'cluster-order' as it seems to be the term used in the paper.

Maybe you forgot to push?

Yes, done now. Had to do something else in the middle 😬.
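To make the "cluster-order" term concrete: in the fitted estimator the cluster-ordered list of sample indices is exposed as `ordering_`, and the `cluster_method='dbscan'` extraction can be reproduced with `sklearn.cluster.cluster_optics_dbscan`. A minimal sketch on synthetic blobs; the data and the `min_samples`/`eps` values are arbitrary choices for illustration, not anything from this PR:

```python
import numpy as np
from sklearn.cluster import OPTICS, cluster_optics_dbscan
from sklearn.datasets import make_blobs

# Toy data: three blobs (illustrative parameters only).
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.4, random_state=0)

# Step 1: fitting builds the cluster-order plus the reachability and
# core distances attached to it.
optics = OPTICS(min_samples=5, cluster_method="dbscan", eps=0.5).fit(X)

# The cluster-order is a permutation of the sample indices.
order = optics.ordering_
print(np.array_equal(np.sort(order), np.arange(len(X))))  # True

# Reachability values read in cluster-order (what a reachability plot shows).
# The first point in the order has no predecessor, so its value is inf.
reach_in_order = optics.reachability_[order]
print(np.isinf(reach_in_order[0]))  # True

# Step 2: clusters are extracted from that order with a DBSCAN-like cut at eps.
labels = cluster_optics_dbscan(
    reachability=optics.reachability_,
    core_distances=optics.core_distances_,
    ordering=optics.ordering_,
    eps=0.5,
)
print(np.array_equal(labels, optics.labels_))  # True
```

Calling `cluster_optics_dbscan` directly lets you try several `eps` values against a single fitted ordering without refitting.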

lucyleeow · 2025-05-14T04:16:27Z

sklearn/cluster/_optics.py

    neighborhood radius. Better suited for usage on large datasets than the
-    current sklearn implementation of DBSCAN.
+    current scikit-learn implementation of DBSCAN.

    Clusters are then extracted using a DBSCAN-like method
    (cluster_method = 'dbscan') or an automatic


In the paragraph below (can't add this comment that far down), does the sentence:

" This implementation deviates from the original OPTICS by first performing
k-nearest-neighborhood searches on all points to identify core sizes, then
computing only the distances to unprocessed points when constructing the
cluster order."

mean that the original OPTICS algorithm does not compute distances only to unprocessed points, or is it saying that the KNN part differs and this is just the step that follows (which does not differ from the original OPTICS)?

The original OPTICS algorithm proposed in the paper calculates core distances and reachability distances for all objects as a first step, which we don't. It's not easy to explain really; I had a hard time converting that to code.

I guess this comment is not for the average user, but rather for somebody picky who reads the paper and then comes to look at the implementation here. You could argue it belongs in a code comment instead.
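The first step in the quoted sentence ("k-nearest-neighborhood searches on all points to identify core sizes") can be sketched with `NearestNeighbors`. This assumes the core distance is the distance to the `min_samples`-th neighbour with each point counted as its own nearest neighbour, which is what querying `X` against itself gives; the comparison against the fitted `core_distances_` is only a sanity check on toy data, not a description of the internal code path:

```python
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

X, _ = make_blobs(n_samples=200, centers=3, random_state=42)
min_samples = 5

# Up-front k-NN search for every point. Passing X explicitly to kneighbors
# includes each point itself (distance 0), so the last column is the
# distance to the min_samples-th neighbour, counting the point itself.
nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
dist, _ = nn.kneighbors(X)
core_distances = dist[:, -1]

optics = OPTICS(min_samples=min_samples).fit(X)
print(np.allclose(core_distances, optics.core_distances_))  # True
```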

Ahh I think I understand the meaning now.

Does "unprocessed point" mean the same in both implementations? Does the original OPTICS algorithm re-compute reachability distances, updating them if they are smaller? That is, is the second part of the algorithm the same here as in the original paper?

Oh, I'd have to read the paper and compare it with the implementation to answer that; I can't confidently respond to this question easily.

Yeah fair, I just looked at it and am still confused 🙃

Okay, looking at the pseudocode from the OPTICS paper, I think OPTICS re-computes reachability distances for neighbours in the 'heap', updating them if smaller, which is the same as our implementation. Note that only the original object is marked as 'processed', not its neighbours.

From the OPTICS paper (pseudocode screenshot not preserved):
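For reference, the paper's seed-list update (called `OrderSeeds::update` there) can be sketched in Python. This is a minimal sketch under stated assumptions: UNDEFINED is modelled as `math.inf`, the seed list is a `heapq` min-heap with lazy deletion (a swapped-in technique; the paper describes an ordered seed list with a decrease operation), and the toy neighbourhoods and core distances are made up rather than computed:

```python
import heapq
import math


def order_seeds_update(seeds, reachability, processed, center, neighbors,
                       core_distance, dist):
    """Paper-style reachability update for the neighbours of `center`.

    seeds: list used as a min-heap of (reachability, index) entries
    reachability: current reachability per object (math.inf == UNDEFINED)
    processed: per-object flags; only `center` was just marked processed
    """
    for o in neighbors:
        if processed[o]:
            continue  # processed objects are never updated
        new_r_dist = max(core_distance, dist(center, o))
        if new_r_dist < reachability[o]:
            # Covers both paper cases: o was UNDEFINED (insert into seeds)
            # or o is already a seed with a larger value (move it up).
            reachability[o] = new_r_dist
            # Lazy decrease-key: push a fresh entry, skip stale ones on pop.
            heapq.heappush(seeds, (new_r_dist, o))


def pop_next(seeds, reachability, processed):
    """Return the unprocessed seed with the smallest current reachability."""
    while seeds:
        r, o = heapq.heappop(seeds)
        if not processed[o] and r == reachability[o]:
            return o
    return None


# Toy 1-D data; neighbourhoods and core distances are hypothetical.
points = [0.0, 1.0, 3.0, 3.5]


def dist(i, j):
    return abs(points[i] - points[j])


reachability = [math.inf] * len(points)
processed = [False] * len(points)
seeds = []

processed[0] = True  # start the cluster-order at object 0
order_seeds_update(seeds, reachability, processed, 0, [1, 2, 3], 1.0, dist)
first_pick = pop_next(seeds, reachability, processed)  # 1

processed[first_pick] = True
order_seeds_update(seeds, reachability, processed, first_pick, [0, 2, 3], 2.0, dist)
# Objects 2 and 3 improve (3.0 -> 2.0, 3.5 -> 2.5); object 0 is skipped.
second_pick = pop_next(seeds, reachability, processed)  # 2

print(reachability)  # [inf, 1.0, 2.0, 2.5]
```

As in the comment above, only `center` is marked processed; its neighbours just get their reachability lowered when a smaller value is found.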

And our implementation also calculates the reachability of all unprocessed neighbours, updating existing values if smaller:

scikit-learn/sklearn/cluster/_optics.py, lines 712 to 714 in d077f82:

    improved = np.where(rdists < np.take(reachability_, unproc))
    reachability_[unproc[improved]] = rdists[improved]
    predecessor_[unproc[improved]] = point_index
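To see those three lines in action, here is a runnable toy version. Only the last three statements mirror the quoted excerpt; the surrounding state (`reachability_`, `predecessor_`, `processed`, the neighbour distances) is hypothetical, and computing `rdists` as `max(core distance, distance)` is an assumption that follows the paper's definition of reachability distance:

```python
import numpy as np

# Hypothetical state partway through building the cluster-order.
reachability_ = np.array([np.inf, 1.0, 3.0, 3.5, 3.0])
predecessor_ = np.array([-1, 0, 0, 0, 0])
processed = np.array([True, True, False, False, False])

point_index = 1                         # point currently being processed
core_distance = 2.0                     # hypothetical core distance of point_index
neighbors = np.array([0, 2, 3, 4])
dists = np.array([1.0, 2.0, 2.5, 4.0])  # distances from point_index to neighbors

# Keep only unprocessed neighbours and their distances.
mask = ~processed[neighbors]
unproc = neighbors[mask]
rdists = np.maximum(dists[mask], core_distance)

# Same update as the excerpt: overwrite only where the new value is smaller.
improved = np.where(rdists < np.take(reachability_, unproc))
reachability_[unproc[improved]] = rdists[improved]
predecessor_[unproc[improved]] = point_index

print(reachability_.tolist())  # [inf, 1.0, 2.0, 2.5, 3.0]
print(predecessor_.tolist())   # [-1, 0, 1, 1, 0]
```

Point 4 keeps its earlier reachability (3.0) and predecessor (0) because the new candidate (4.0) is larger, which is the "updating if smaller" behaviour discussed above.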

I've made some changes that may make it clearer, but it's complicated and I'd also be happy to leave it as is.

lucyleeow · 2025-05-15T01:58:04Z

Ping @adrinjalali, as it looks like you wrote this originally.

lucyleeow added 2 commits May 12, 2025 20:56

doc updates · 1f18373

add back sentence · ee57017

github-actions bot added module:cluster Documentation labels May 14, 2025

lucyleeow commented May 14, 2025

This comment was marked as spam.

lucyleeow added 2 commits May 21, 2025 15:24

Merge branch 'main' into doc_optics · c5d0ac3

amend docstring · e9c9410

adrinjalali approved these changes May 23, 2025


adrinjalali merged commit a2ceff3 into scikit-learn:main May 23, 2025
36 checks passed

lucyleeow deleted the doc_optics branch May 23, 2025 12:30