-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
DOC Minor updates to OPTICS
docstring
#31363
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also some questions regarding the main description.
neighborhood radius. Better suited for usage on large datasets than the | ||
current sklearn implementation of DBSCAN. | ||
current scikit-learn implementation of DBSCAN. | ||
|
||
Clusters are then extracted using a DBSCAN-like method |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "then" makes it seem like we should mention the order list e.g.,
"Clusters are then extracted from the order list using...", though not sure what the right term is here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
going into that level of details will be hard, and it'll make this docstring very long, I'm not sure if we should do that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about just removing the "then" then?
"Clusters are extracted using a DBSCAN-like method ... "
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though:
"Clusters are then extracted order list using a DBSCAN-like method"
isn't that much longer. Or are you saying that if we use the term "order list" we need to explain it further?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for me "then" here implies "as the next step", but sure, you're the native speaker here 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I agree that "then" means "as the next step", I was trying to add the info of "next step after what"?
I know above we mention "Unlike DBSCAN, it keeps cluster hierarchy for a variable neighborhood radius", but then we talk about some other stuff, and it is not obvious what step happens before cluster extraction? i.e., what is the "then" following?
neighborhood radius. Better suited for usage on large datasets than the | ||
current sklearn implementation of DBSCAN. | ||
current scikit-learn implementation of DBSCAN. | ||
|
||
Clusters are then extracted using a DBSCAN-like method | ||
(cluster_method = 'dbscan') or an automatic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the paragraph below (can't add this comment that far down), does the sentence:
" This implementation deviates from the original OPTICS by first performing
k-nearest-neighborhood searches on all points to identify core sizes, then
computing only the distances to unprocessed points when constructing the
cluster order."
mean that the original OPTICS algorithm does not only compute unprocessed points, or is it saying that the KNN part differs and this is just the step that follows (but does not differ from the original OPTICS)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the original OPTICS algorithm proposed in the paper calculates core distance and reachability distances for all objects as a first step, which we don't. It's not easy to explain really, I had a hard time converting that to code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this comment is not for the average user, and rather for somebody picky who reads the paper and then comes looks at the implementation here. You could argue it could be a comment in the code maybe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh I think I understand the meaning now.
Does "unprocessed point" mean the same in both implementations? Does the original OPTICS algorithm re-compute reachability distances, updating if they are smaller? i.e., is the second part of the algorithm the same here as in the original paper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I'd have to read the paper and compare with implementation to be able to answer that, can't really confidently respond to this question easily
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah fair, I just looked at it and am still confused 🙃
ping @adrinjalali as I looks like you wrote this originally |
Reference Issues/PRs
cluster_method
to string options allowed instead of juststr
distance
moduleWhat does this implement/fix? Explain your changes.
Any other comments?
Noticed while reviewing #31102